llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	892ef2e3c0	[AMDGPU] More codegen patterns for v2i16/v2f16 build_vector It's simpler to do this at codegen time than to do ad-hoc constant folding of machine instructions in SIFoldOperands. Differential Revision: https://reviews.llvm.org/D88028	2020-09-22 10:41:38 +01:00
Matt Arsenault	27df165270	Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." This reverts commit `c3492a1aa1`. I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit `7d593d0d69`.	2020-09-18 09:48:33 -04:00
Michael Liao	c3492a1aa1	[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel. - Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the same register bank so that we potentially coalesc them together and save one COPY. Considering that, backend optimizations, such as CSE, won't handle them. However, the copy from SGPR to VGPR always needs materializing to a native instruction, it should be lowered into a real one before other backend optimizations. Differential Revision: https://reviews.llvm.org/D87556	2020-09-17 11:04:17 -04:00
Jay Foad	90777e2924	[AMDGPU] Enable scheduling around FP MODE-setting instructions Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom. Differential Revision: https://reviews.llvm.org/D87446	2020-09-16 16:10:47 +01:00
dfukalov	c259d3a061	[AMDGPU] Fix for folding v2.16 literals. It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost. Introduced new function to check immediate 32-bit operand can be folded. Converted condition about current op_sel flags value to fall-through. Fixes: SWDEV-247595 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87158	2020-09-10 01:39:25 +03:00
Michael Liao	bf41c4d29e	[codegen] Ensure target flags are cleared/set properly. NFC. - When an operand is changed into an immediate value or like, ensure their target flags being cleared or set properly. Differential Revision: https://reviews.llvm.org/D87109	2020-09-03 18:37:39 -04:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Matt Arsenault	da3f357de6	AMDGPU: Don't look at dbg users for foldable operands These would have always failed to fold, so checking them or adding them to the fold candidates is useless.	2020-08-17 09:53:25 -04:00
Matt Arsenault	68fab44acf	AMDGPU: Fix visiting physreg dest users when folding immediate copies This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes regression introduced by `766cb615a3`.	2020-08-10 13:46:51 -04:00
Stanislav Mekhanoshin	5b32518f96	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it is really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899	2020-07-30 10:41:59 -07:00
Matt Arsenault	766cb615a3	AMDGPU: Relax restriction on folding immediates into physregs I never completed the work on the patches referenced by `f8bf7d7f42`, but this was intended to avoid folding immediate writes into m0 which the coalescer doesn't understand very well. Relax this to allow simple SGPR immediates to fold directly into VGPR copies. This pattern shows up routinely in current GlobalISel code since nothing is smart enough to emit VGPR constants yet.	2020-07-29 14:01:53 -04:00
Matt Arsenault	b9c644ec61	AMDGPU: Fix failures from overflowing uint8_t number of operands If the operand index exceeded the limit of unsigned char, it wrapped and would point to the wrong operand. Increase the size of the operand index field to avoid this, and also don't bother trying to fold into implicit operands.	2020-07-23 15:39:33 -04:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Matt Arsenault	16ea23ff78	AMDGPU: Clear subreg when folding immediate copies This was getting reinterpreted as operand target flags, and appearing as as <unknown target flag>, resulting in unparseable MIR.	2020-07-01 13:59:13 -04:00
Graham Sellers	a19a56f6a1	[AMDGPU] Extend constant folding for logical operations This patch extends existing constant folding in logical operations to handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and V_AND_OR_B32. Also added a couple of tests for existing folds.	2020-04-07 14:37:16 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Michael Liao	07a569a053	[amdgpu] Remove unused header. NFC.	2020-01-08 11:32:09 -05:00
David Stuttard	46db606834	AMDGPU: Avoid folding 2 constant operands into an SALU operation Summary: Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2 constant operands into the same SALU operation, with neither operand able to be encoded as an inline constant. Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70896	2019-12-04 10:25:34 +00:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
Stanislav Mekhanoshin	1bfcc60828	[AMDGPU] Added assert in SIFoldOperands before ptr use. NFC.	2019-11-04 13:31:21 -08:00
Stanislav Mekhanoshin	4c0251da14	[AMDGPU] Enable SGPR copy folding That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445	2019-10-25 15:08:30 -07:00
Stanislav Mekhanoshin	c7dcacf16a	[AMDGPU] Fixed asan failure in SIFoldOperands Both tryFoldOMod() and tryFoldClamp() remove original instruction, so the check MI.modifiesRegister() may use a deleted MI. Differential Revision: https://reviews.llvm.org/D69448	2019-10-25 13:59:56 -07:00
Stanislav Mekhanoshin	d4303b3861	[AMDGPU] Fold AGPR reg_sequence initializers Differential Revision: https://reviews.llvm.org/D69413	2019-10-25 11:39:02 -07:00
Michael Liao	b2a65f0d70	[AMDGPU] Skip additional folding on the same operand. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69355	2019-10-24 11:30:22 -04:00
Stanislav Mekhanoshin	61e7a61bdc	[AMDGPU] Allow folding of sgpr to vgpr copy Potentially sgpr to sgpr copy should also be possible. That is however trickier because we may end up with a wrong register class at use because of xm0/xexec permutations. Differential Revision: https://reviews.llvm.org/D69280	2019-10-23 18:42:48 -07:00
Stanislav Mekhanoshin	48f57138be	[AMDGPU] Allow tied operand subreg folding Turns out it makes sense, contrarily to what comment said. Differential Revision: https://reviews.llvm.org/D69287	2019-10-22 11:27:36 -07:00
Matt Arsenault	8ebbf25cb1	AMDGPU: Erase redundant redefs of m0 in SIFoldOperands Only handle simple inter-block redefs of m0 to the same value. This avoids interference from redefs of m0 in SILoadStoreOptimzer. I was initially teaching that pass to ignore redefs of m0, but having them not exist beforehand is much simpler. This is in preparation for deleting the current special m0 handling in SIFixSGPRCopies to allow the register coalescer to handle the difficult cases. llvm-svn: 375449	2019-10-21 19:53:46 +00:00
Matt Arsenault	e5be543a55	AMDGPU: Increase vcc liveness scan threshold Avoids a test regression in a future patch. Also add debug printing on this case, so I waste less time debugging folds in the future. llvm-svn: 375367	2019-10-20 17:44:17 +00:00
Matt Arsenault	f8bf7d7f42	AMDGPU: Don't fold copies to physregs In a future patch, this will help cleanup m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there end up being no real copy. llvm-svn: 374257	2019-10-09 22:51:42 +00:00
Alexander Timofeev	565b1d3d46	[AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67662 llvm-svn: 373221	2019-09-30 15:31:17 +00:00
Stanislav Mekhanoshin	d3b2b97195	[AMDGPU] gfx10 v_fmac_f16 operand folding Fold immediates into v_fmac_f16. Differential Revision: https://reviews.llvm.org/D68037 llvm-svn: 372906	2019-09-25 18:40:20 +00:00
Stanislav Mekhanoshin	8fe1245a0f	[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816	2019-08-23 22:09:58 +00:00
Alexander Timofeev	78347c979e	[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532	2019-08-21 15:15:04 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Tim Renouf	10db641aab	[AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm' That change (r363670) could leave a copy from vgpr to sgpr. Fixed. Differential Revision: https://reviews.llvm.org/D66133 Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4 llvm-svn: 368736	2019-08-13 18:57:55 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
Jay Foad	27ec195f39	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910	2019-07-12 15:59:40 +00:00
Stanislav Mekhanoshin	e67cc380a8	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824	2019-07-11 21:19:33 +00:00
Nicolai Haehnle	2710171a15	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297	2019-06-25 11:52:30 +00:00
Matt Arsenault	60957cb74c	AMDGPU: Fold frame index into MUBUF This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset. This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpfu for extreme frames. llvm-svn: 364185	2019-06-24 14:53:56 +00:00
Matt Arsenault	4d000d2488	AMDGPU: Fix folding immediate into readfirstlane through reg_sequence The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. llvm-svn: 363876	2019-06-19 20:44:15 +00:00
Matt Arsenault	f39f3bd056	AMDGPU: Change API for checking for exec modification Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. llvm-svn: 363675	2019-06-18 12:48:36 +00:00
Matt Arsenault	bcb5ea0042	AMDGPU: Fold readlane from copy of SGPR or imm These may be inserted to assert uniformity somewhere. llvm-svn: 363670	2019-06-18 12:23:46 +00:00
Matt Arsenault	e75e197ad8	AMDGPU: Remove unnecessary check for virtual register The copy was found by searching the uses of a virtual register, so it's already known to be virtual. llvm-svn: 363669	2019-06-18 12:23:45 +00:00
Matt Arsenault	cfd0ca38b0	AMDGPU: Support shrinking add with FI in SIFoldOperands Avoids test regression in a future patch llvm-svn: 359898	2019-05-03 15:21:53 +00:00
Matt Arsenault	0446fbe45e	AMDGPU: Replace shrunk instruction with dummy implicit_def This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds. Ideally the pass would be structured more like what PeepholeOptimizer does to avoid this hack to avoid breaking instruction iterators. llvm-svn: 359891	2019-05-03 14:40:10 +00:00
Matt Arsenault	2c8936fd26	AMDGPU: Fix incorrect commute with sub when folding immediates When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode. llvm-svn: 359883	2019-05-03 13:42:56 +00:00
Stanislav Mekhanoshin	5cf8167735	[AMDGPU] gfx1010 allows VOP3 to have a literal Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756	2019-05-02 04:01:39 +00:00
Michael Liao	389d5a3474	[AMDGPU] Fix an issue in `op_sel_hi` skipping. Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even an instruction is `packed`, it may have operand requiring non-packed literal, such as `v_dot2_f32_f16`. Reviewers: rampitec, arsenm, kzhuravl Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60978 llvm-svn: 358922	2019-04-22 22:05:49 +00:00

1 2 3

125 Commits