When we have a precisely known VLEN, we can replace runtime uses of VLENB with compile-time constants. This converts offsets involving both fixed and scalable components into fixed offsets. As a result we avoid the CSR read of vlenb, and can often fold the multiply as well.
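As an illustrative sketch (assuming VLEN=128, so VLENB=16; not actual codegen output), an offset of 2*vlenb:

    csrr a0, vlenb        # runtime read of VLENB
    slli a0, a0, 1        # offset = 2 * vlenb

can be folded to a compile-time constant:

    li   a0, 32           # 2 * 16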
Differential Revision: https://reviews.llvm.org/D137591
On x86 and AArch64, SIMD instructions encode all of their scheduling information in the instruction itself. For example, VADD.I16 q0, q1, q2 is a NEON instruction that operates on 16-bit integer elements stored in 128-bit Q registers, i.e. eight 16-bit lanes in parallel. This kind of information affects how long the instruction takes to execute and what dependencies it may create. On RISCV, however, the data that affects scheduling is encoded in CSRs such as vtype or vl, in addition to the instruction itself, yet MCA does not track or use the data in these registers. This patch fixes this problem by introducing Instruments into MCA.
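For example, an llvm-mca input can annotate a region with an LMUL instrument via a comment (a sketch of the intended usage; M2 is just an example LMUL value):

    # LLVM-MCA-RISCV-LMUL M2
    vadd.vv v4, v8, v12

so that the vadd.vv is mapped to the scheduling class for LMUL=2 rather than a single default class.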
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV: use the LMUL instrument to override the schedule class
* Fix unit tests to pass empty instruments
* Add an -ignore-im cl::opt to disable this change
A prior version of this patch was committed in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. The change with fixes
was committed in b88b8307bf but reverted once again in e8e92c8313 due to more
build errors.
This commit re-applies the prior changes and fixes the build error.
Differential Revision: https://reviews.llvm.org/D137440
When selectVOP3PMadMixModsImpl fails, it can still create a new copy instruction
via selectVOP3ModsImpl. When selectG_FMA_FMAD then gives up, the new copy
instruction remains dead but is not automatically removed, because
InstructionSelect does not check whether instructions created during selection
are dead.
Such a dead copy has no register class on its destination operand and causes a crash.
The fix is to build the copy only when operands are actually added to the selected instruction.
Differential Revision: https://reviews.llvm.org/D138044
Before D114230, indirect moves used regular MOV opcodes and were
identified by having an implicit use of M0. Since D114230 they use
dedicated opcodes instead, so remove some old code that checks for
implicit uses of M0. NFCI.
Differential Revision: https://reviews.llvm.org/D138308
get_active_lane_mask is currently lowered to a WHILELO instruction, but for
a constant range suitable for PTRUE we can issue a PTRUE instruction
instead.
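For example (illustrative only), a mask over a known constant range of 16 elements of 32 bits:

    mov     w8, #16
    whilelo p0.s, wzr, w8

can instead be emitted as:

    ptrue   p0.s, vl16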
Differential Revision: https://reviews.llvm.org/D137547
The zip/uzp (2-vector) instruction classes have incorrect
register constraints and mark the destination as also being an
input. However, the instructions are fully destructive, so I've
restructured the classes.
Differential Revision: https://reviews.llvm.org/D138288
1. In streaming mode, use the SVE ORR (mov) instruction instead of the NEON ORR
when copying a physical register in AArch64InstrInfo::copyPhysReg (see the sketch below).
2. Add test file: register-mov.ll
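A minimal sketch of the difference (register choices are illustrative):

    mov v0.16b, v1.16b    // NEON copy, invalid in streaming mode
    mov z0.d, z1.d        // SVE copy (an alias of orr z0.d, z1.d, z1.d)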
Differential Revision: https://reviews.llvm.org/D138211
The first boundary of a region wasn't updated when a sunk instruction was inserted first into the region.
Reviewed By: vangthao
Differential Revision: https://reviews.llvm.org/D138256
These instructions are always printed with the canonical mnemonic. The GNU tools
emit the canonical mnemonic for the branch pseudo-instructions as well
(e.g. "bgt" will be recognised by the assembler but never printed by
objdump).
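For example (illustrative), the assembler accepts the pseudo form, but the disassembler always prints the canonical form:

    bgt $a1, $a0, .LBB0    # accepted when assembling
    blt $a0, $a1, .LBB0    # what objdump/llvm-objdump prints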
Reviewed By: xen0n
Differential Revision: https://reviews.llvm.org/D138100
Currently, lowerVECTOR_SHUFFLEAsVSlidedown only checks whether the inputs are
EXTRACT_SUBVECTOR nodes with the same source. This commit makes the
function look through chains of EXTRACT_SUBVECTOR nodes on the inputs and
their sources until they are no longer EXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D138025
This diff splits out (from LLVMCore) IR printing passes into IRPrinter.
This structure is similar to what we already have for IRReader and
enables us to avoid circular dependencies between LLVMCore and Analysis
(this is a preparation for https://reviews.llvm.org/D137768).
The legacy interface is left unchanged; once the legacy pass manager
is removed (in the future) we will be able to clean it up further.
The bazel build configuration has been updated as well.
Test plan:
1/ Tested the following cmake configurations: static/dynamic linking * lld/gold * clang/gcc
2/ bazel build --config=generic_clang @llvm-project//...
Differential revision: https://reviews.llvm.org/D138081
This includes instruction formats, definitions, encodings, scheduling
classes, and builtins/intrinsics.
New and improved version of 76536989ba, so much so that even clang
builds with it.
This reverts commit 766536989b.
The commit caused:
clang/include/clang/Basic/BuiltinsHexagonDep.def:1896:69: error: use of undeclared identifier 'HVXV73'
TARGET_BUILTIN(__builtin_HEXAGON_V6_vadd_sf_bf, "V32iV16iV16i", "", HVXV73)
when building `clang`.
When the Linux kernel is compiled without -mretpoline, KCFI fails
ungracefully because it doesn't handle indirect calls with a memory
target operand. Since the KCFI check will need to load the target
address into a register for validating the type hash anyway, simply
unfold memory operands in indirect calls that need a KCFI check.
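A sketch of the unfolding (the register choice and the check sequence are illustrative):

    callq *8(%rbx)          # indirect call with a memory operand

is unfolded into:

    movq  8(%rbx), %r11     # load the call target into a register
    # ... KCFI type-hash check against the target ...
    callq *%r11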
Fixes #59017
A target can report whether a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it however it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access will
be not just fast, but no slower than before.
Differential Revision: https://reviews.llvm.org/D124217
isLiteralConstant and isLiteralConstantLike were similar to
!isInlineConstant with slight differences like handling isReg operands.
To avoid a profusion of similar functions with undocumented differences,
this patch removes all the isLiteralConstant* variants. Callers are responsible
for handling the isReg case.
Differential Revision: https://reviews.llvm.org/D125759
When a PTRUE of non-element size is encountered, the PTEST optimization
logic bails out since it cannot handle that type of PTRUE. Instead, it
should be treated as a generic predicate to allow later optimizations to trigger.
Differential Revision: https://reviews.llvm.org/D138116
This patch adds a transformation of fmul+fadd/fsub chains to fused multiply
instructions:
* fmul+fadd->fmadd
* fmul+fsub->fmsub/fnmsub
We will also try to combine these instructions when the fmul has more than one use
and thus cannot be deleted. Removing the dependency between fmul and fadd can
still be profitable, however, and we rely on the machine combiner's scheduling approximations.
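As an illustrative example (assuming fast-math flags that permit contraction):

    fmul.d  ft0, fa0, fa1
    fadd.d  fa0, ft0, fa2

can be combined into:

    fmadd.d fa0, fa0, fa1, fa2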
Differential Revision: https://reviews.llvm.org/D136764
This patch fixes some cases of V_ADD/SUB_U64_PSEUDO not getting converted to their SDWA forms.
We currently still get the patterns below in generated code:
v_and_b32_e32 v0, 0xff, v0
v_add_co_u32_e32 v0, vcc, v1, v0
v_addc_co_u32_e64 v1, s[0:1], 0, 0, vcc
and,
v_and_b32_e32 v2, 0xff, v2
v_add_co_u32_e32 v0, vcc, v0, v2
v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
The first and second instructions of both examples above should have been folded into an SDWA add with a BYTE_0 src operand.
The reason is that the pseudo instruction is broken down into the VOP3 instruction pair V_ADD_CO_U32_e64 and V_ADDC_U32_e64.
The SDWA pass attempts to lower them to their VOP2 forms before converting them into SDWA instructions, but V_ADDC_U32_e64
cannot be shrunk to its VOP2 form if it has a non-register src1 operand.
This change fixes that problem by only shrinking the V_ADD_CO_U32_e64 instruction.
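With that, the and+add pairs above can fold into a single SDWA add along these lines (operand order and modifier syntax are illustrative):

    v_add_co_u32_sdwa v0, vcc, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0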
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136663
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.
This is similar to what we do for (i32 (and (srl X, Y), 1)).
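For example (illustrative), with Zbs the pattern

    %sh  = lshr i32 %x, %y
    %bit = and i32 %sh, 1

can select to a single bext a0, a0, a1 instead of a shift-and-mask sequence.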
Don't emit deprecated v8-style FP compares & branches when targeting v9
processors.
For now, always use %fcc0, because currently the allocator requires allocatable
registers to also be spillable, which isn't the case with v9 FCC registers.
The work to enable allocation over the entire FCC register file will be done in
a future patch.
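For example (illustrative), a v8-style compare leaves the condition-code register implicit, while the v9 form names it:

    fcmps %f1, %f2          ! v8 form, deprecated on v9
    fcmps %fcc0, %f1, %f2   ! v9 form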
Fixes bug #17834
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135515
Do not emit deprecated v8-style branches when targeting a v9 processor.
As a side effect, this also fixes the emission of useless ba's when doing
conditional branches on 64-bit integer values.
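For example (illustrative), a conditional branch on a 64-bit integer value should use the v9 form with an explicit condition-code register:

    bne .LBB0          ! v8-style, deprecated on v9
    bne %xcc, .LBB0    ! v9-style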
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D130006
When expanding a PseudoCALL, the corresponding flags (e.g. nomerge)
need to be passed to the new instruction.
This patch also adds a test for the nomerge attribute.
The `nomerge` attribute was added during `LowerCall`, but was lost
when expanding PseudoCALL. Now add it back.
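A minimal IR example of the attribute that must survive the expansion (a sketch; the names are illustrative):

    declare void @foo()
    define void @caller() {
      call void @foo() #0   ; lowered via PseudoCALL; nomerge must be preserved
      ret void
    }
    attributes #0 = { nomerge }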
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D137888
Now matches the default SchedWriteVecIMul values used for the instruction.
NOTE: The folded variant overrides are still there, as the latency differs by 1cy.
This fixes a bug in 'allocateLazySaveBuffer' that led to the
buffer pointer being stored to the wrong address.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D137734
D134260/D138107 exposed that the MachineCombiner was not copying
pcsections metadata where it should. This patch switches the MIBuild
methods to use MIMetadata that can copy the debug loc and pcsections at
the same time.
Differential Revision: https://reviews.llvm.org/D138112
This adds some more scalar instructions that are both associative and
commutative to isAssociativeAndCommutative, allowing the machine
combiner to reassociate them to reduce critical path length.
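For example (illustrative), a serial chain of adds can be reassociated into a shallower tree:

    add x8, x0, x1        // before: depth-3 chain
    add x8, x8, x2
    add x8, x8, x3

    add x8, x0, x1        // after: depth-2 tree
    add x9, x2, x3
    add x8, x8, x9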
Differential Revision: https://reviews.llvm.org/D134260