llvm-project


Author	SHA1	Message	Date
Dmitry Preobrazhensky	09c9f4dc7d	[AMDGPU][MC] Added missing isCall/isBranch flags. Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*. Differential Revision: https://reviews.llvm.org/D106072	2021-07-16 14:59:10 +03:00
Stanislav Mekhanoshin	e5b0fe1b83	[AMDGPU] Mark more SOP instructions as rematerializable. The rest of the SOP instructions implicitly set SCC and are not suitable for rematerialization. Differential Revision: https://reviews.llvm.org/D105670	2021-07-08 16:00:45 -07:00
Jay Foad	7f3ac6714a	[AMDGPU] Set SALU, VALU and other instruction type flags on Real instructions. This does not affect codegen but might benefit llvm-mca.	2021-06-16 13:36:02 +01:00
Jay Foad	323b3e645d	[AMDGPU] Set mayLoad and mayStore on Real instructions. This does not affect codegen but might benefit llvm-mca.	2021-06-16 12:10:23 +01:00
Jay Foad	37109974af	[AMDGPU] Use defvar in SOPInstructions.td. NFC. Factor out repeated !cast<SOP*_Pseudo>(NAME) into a new "defvar ps", just to improve readability and maintainability. Differential Revision: https://reviews.llvm.org/D104306	2021-06-16 09:16:45 +01:00
Mirko Brkusanin	35ef4c940b	[AMDGPU][GlobalISel] Legalize G_ABS. Legalize and select G_ABS so that we can use the llvm.abs intrinsic. Differential Revision: https://reviews.llvm.org/D102391	2021-06-04 14:46:43 +02:00
Konstantin Zhuravlyov	844012940e	AMDGPU: Add isBranch=1 to SOPP branch instructions. Differential Revision: https://reviews.llvm.org/D99955	2021-04-06 10:59:30 -04:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions. Copy SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos, not real instructions. Add an llvm-mca test for some high-latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Jay Foad	796a60d2ea	[AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32). The expected use case is for frontends to insert this into shaders that are to be run under a debugger. The shader can then be resumed or single-stepped from the point of the call under debugger control. Differential Revision: https://reviews.llvm.org/D97670	2021-03-01 14:30:23 +00:00
Jay Foad	3ad5216ed8	[AMDGPU] Better codegen for i64 bitreverse. Differential Revision: https://reviews.llvm.org/D97547	2021-02-26 15:51:36 +00:00
Jay Foad	9f69c1bc54	[AMDGPU] Rename pseudo S_WAITCNT_IDLE to S_WAIT_IDLE. NFC.	2020-11-18 14:03:43 +00:00
Michael Liao	23c6d1501d	[amdgpu] Add `llvm.amdgcn.endpgm` support. - `llvm.amdgcn.endpgm` is added to enable "abort" support. Differential Revision: https://reviews.llvm.org/D90809	2020-11-05 19:06:50 -05:00
Jay Foad	a442fad911	[AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode. Differential Revision: https://reviews.llvm.org/D90374	2020-10-29 14:54:33 +00:00
Matt Arsenault	d61996473d	AMDGPU: Increase branch size estimate with offset bug. This will be relaxed to insert a nop if the offset hits the bad value, so overestimate branch instruction sizes.	2020-10-23 10:34:24 -04:00
Joe Nash	f6d7832f4c	[AMDGPU] Refactor SOPC & SOPP .td for extension. We use the Real vs Pseudo instruction abstraction for other types of instructions to facilitate changes in opcode between GPU generations. This patch introduces that abstraction to SOPC and SOPP. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89738 Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138	2020-10-21 12:35:52 -04:00
Stanislav Mekhanoshin	caeb13aba8	[AMDGPU] Allow SOP asm mnemonic to differ. Allows the creation of real SOP1 instructions with assembler mnemonics that differ from their pseudo-instruction mnemonics. The default behavior keeps the mnemonics matching. Corrects a subtarget label typo in a comment. Authored By: Joe_Nash Differential Revision: https://reviews.llvm.org/D88708	2020-10-01 16:00:04 -07:00
Jay Foad	2806f586dc	[AMDGPU] Make bfi patterns divergence-aware. This tends to increase code size, but more importantly it reduces vgpr usage and could avoid costly readfirstlanes if the result needs to be in an sgpr. Differential Revision: https://reviews.llvm.org/D88245	2020-09-28 10:16:51 +01:00
Jay Foad	90777e2924	[AMDGPU] Enable scheduling around FP MODE-setting instructions. Pre-gfx10, all MODE-setting instructions were S_SETREG_B32, which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands, we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom. Differential Revision: https://reviews.llvm.org/D87446	2020-09-16 16:10:47 +01:00
Matt Arsenault	40a142fa57	AMDGPU/GlobalISel: Match andn2/orn2 for more types. Unfortunately this ends up not working as expected on targets with 16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform 16-bit ops to i32. The vector case annoyingly requires switching the checked opcode, since constants for vectors aren't directly handled. I also need to think more carefully about whether this is valid for i1.	2020-08-14 13:18:03 -04:00
Piotr Sobczak	62d8b8a225	Fix 64-bit copy to SCC. Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (`0045786f14`), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
Matt Arsenault	87b2af8140	AMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns	2020-08-06 18:00:38 -04:00
Matt Arsenault	b6ebc77326	AMDGPU/GlobalISel: Fix selecting llvm.amdgcn.s.getreg. This introduces the same bug llvm.amdgcn.s.setreg has, where if the user specified an immediate outside of the valid 16-bit range, it will select into a verifier error.	2020-07-28 21:34:50 -04:00
Dmitry Preobrazhensky	2e87acac9b	[AMDGPU] Removed s_mov_regrd and mov_fed opcodes. These opcodes are not intended for public use. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81659	2020-07-17 19:52:54 +03:00
Piotr Sobczak	0045786f14	[AMDGPU] Select s_cselect. Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Re-commit D81925 with a bugfix D82370. Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370	2020-06-25 10:38:23 +02:00
Piotr Sobczak	6d9565d6d5	Revert "[AMDGPU] Select s_cselect". This caused some failures detected by the buildbot with expensive checks enabled. This reverts commit `4067de569f`.	2020-06-19 16:41:04 +02:00
Piotr Sobczak	4067de569f	[AMDGPU] Select s_cselect. Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81925	2020-06-19 16:17:46 +02:00
Matt Arsenault	779cba79ec	AMDGPU: Remove mayLoad/mayStore from some side-effecting intrinsics. These don't really modify any memory, and should not expect memory operands.	2020-06-18 14:12:19 -04:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target. Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Matt Arsenault	4b1f6cdbf9	AMDGPU: Don't run indexing mode switches with exec = 0. Add mode defs rather than special-casing this like some of the other instructions.	2020-06-02 13:47:48 -04:00
Matt Arsenault	0892a96a05	AMDGPU: Optimize s_setreg_b32 to s_denorm_mode/s_round_mode. This is a custom inserter because it was less work than teaching tablegen a way to indicate that it is sometimes OK to have a no-side-effect instruction in the output of a side-effecting pattern. The asm is needed to look like a read of the mode register to prevent it from being deleted. However, there seems to be a bug where the mode register def instructions are moved across the asm sideeffect by the post-RA scheduler. Another oddity is that the immediate is formatted differently between s_denorm_mode and s_round_mode.	2020-05-29 21:11:36 -04:00
Matt Arsenault	97f3f0bab0	AMDGPU: Add intrinsic for s_setreg. This will be more useful once fenv access is implemented.	2020-05-28 14:26:38 -04:00
Matt Arsenault	1a9e0d7092	AMDGPU: Make S_DENORM_MODE not be a scheduling boundary. Now that the mode register uses/defs should be properly modeled, we don't need to treat the FP mode switch as an arbitrary side effect.	2020-05-28 10:39:33 -04:00
Matt Arsenault	4b4496312e	AMDGPU: Start adding MODE register uses to instructions. This is the groundwork required to implement strictfp. For now, this should be NFC for regular instructions (many instructions just gain an extra use of a reserved register). Regalloc won't rematerialize instructions with reads of physical registers, but we were suffering from that anyway with the exec reads. Should add it for all the related FP uses (possibly with some extras). I did not add it to either the gpr index mode instructions (or every single VALU instruction) since it's a ridiculous feature already modeled as an arbitrary side effect. Also work towards marking instructions with FP exceptions. This doesn't actually set the bit yet since this would start to change codegen. It seems nofpexcept is currently not implied from the regular IR FP operations. Add it to some MIR tests where I think it might matter.	2020-05-27 14:47:00 -04:00
Kazuaki Ishizaki	0312b9f550	[llvm] NFC: Fix trivial typo in rst and td files. Differential Revision: https://reviews.llvm.org/D77469	2020-04-23 14:26:32 +09:00
alex-t	48a9cf9043	[AMDGPU] Enable SEXT divergence driven selection. Summary: This change enables divergence-driven selection for the SEXT DAG opcode. Reviewers: vpykhtin, rampitec Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Differential Revision: https://reviews.llvm.org/D76230	2020-03-17 17:30:11 +03:00
Matt Arsenault	209094eeb6	AMDGPU/GlobalISel: Start matching s_lshlN_add_u32 instructions. Use a hack to only enable this for GlobalISel. Technically this also works with SelectionDAG, but the divergence selection isn't reliable enough and a few cases fail, and I have no desire to spend time writing the manual expansion code for it. The DAG actually does a better job since it catches using v_add_lshl_u32 in the mixed SGPR/VGPR cases.	2020-03-09 12:36:51 -07:00
Matt Arsenault	d1b393d92c	AMDGPU/GlobalISel: Select G_CTTZ_ZERO_UNDEF. Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.	2020-02-12 16:19:46 -08:00
Matt Arsenault	045a8921d7	AMDGPU/GlobalISel: Select G_CTLZ_ZERO_UNDEF. Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.	2020-02-12 16:19:45 -08:00
alex-t	5df1ac7846	[AMDGPU] fixed divergence driven shift operations selection. Differential Revision: https://reviews.llvm.org/D73483 Reviewers: rampitec	2020-01-31 20:49:56 +03:00
Matt Arsenault	62129878a6	AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops. Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using tablegen-selected add/sub, which switch to the signed version of the opcode. This matches the current DAG behavior. We can't drop the manual selection for add/sub yet, because it's still needed both for VALU add/sub and for G_PTR_ADD.	2020-01-29 08:55:54 -08:00
Matt Arsenault	4e69df091d	Revert "AMDGPU: Temporary drop s_mul_hi_i/u32 patterns". This reverts commit `fe23ed2c68`. It was never really clear this was responsible for the performance regressions that caused this to be reverted. It's been a long time, and we need to have scalar patterns for this to get GlobalISel working.	2020-01-27 08:07:21 -08:00
Matt Arsenault	9b13b4a0e3	AMDGPU: Prepare to use scalar register indexing. Define pseudos mirroring the VGPR indexing ones, and adjust the operands in the s_movrel* instructions to avoid the result def.	2020-01-20 17:19:16 -05:00
Matt Arsenault	e699c03c9b	AMDGPU/GlobalISel: Fix import of s_abs_i32 pattern	2020-01-07 10:32:07 -05:00
Matt Arsenault	9150d6bd73	AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.vote	2020-01-07 10:15:29 -05:00
Matt Arsenault	92ff017a85	AMDGPU: Only allow regs for s_movrel_{b32|b64}. This would incorrectly allow folding immediates. These currently aren't selectable, but will be from GlobalISel soon.	2020-01-03 15:25:49 -05:00
Stanislav Mekhanoshin	4312c4afd4	[AMDGPU] deduplicate tablegen predicates. We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815	2019-11-04 12:19:17 -08:00
Matt Arsenault	eb6eb694e4	AMDGPU/GlobalISel: Allow selection of scalar min/max. I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothing has broken when I've removed this and others. llvm-svn: 372450	2019-09-21 02:37:33 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics". This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297). This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics". This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. > This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. > SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. > Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. > The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. > This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics. Encode them directly as an imm argument to G_INTRINSIC. Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00


124 Commits