llvm-project

Commit Graph

Author	SHA1	Message	Date
Qiu Chaofan	573531fb1f	Fix typo of colon to semicolon in lit tests	2021-10-09 10:03:50 +08:00
David Stuttard	69f7d81d0a	[AMDGPU] Set number vgprs used in PS shaders based on input registers actually used For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR registers Calculate the number of VGPR registers used as input VGPRs based on these registers rather than the arguments passed in (this conservatively always allocates the maximum). Differential Revision: https://reviews.llvm.org/D101633 Change-Id: Idf7c060cbbd5f7e3300102c55ecee3c07f209de6	2021-10-08 14:24:35 +01:00
Mirko Brkusanin	d20840c937	[GlobalISel] Combine for eliminating redundant operand negations Differential Revision: https://reviews.llvm.org/D111319	2021-10-08 14:29:22 +02:00
Amara Emerson	08b3c0d995	[GlobalISel] Combine (G_UMULH x, (1 << c)) -> x >> (bitwidth - c) In order to not generate an unnecessary G_CTLZ, I extended the constant folder in the CSEMIRBuilder to handle G_CTLZ. I also added some extra handling of vector constants. It seems we don't have any support for doing constant folding of vector constants, so the tests show some other useless G_SUB instructions too. Differential Revision: https://reviews.llvm.org/D111036	2021-10-07 23:51:37 -07:00
Jay Foad	e996cf7dce	[AMDGPU] Preserve MachineDominatorTree in SILowerControlFlow Updating the MachineDominatorTree is easy since SILowerControlFlow only splits and removes basic blocks. This should save a bit of compile time because previously we would recompute the dominator tree from scratch after this pass. Another reason for doing this is that SILowerControlFlow preserves LiveIntervals which transitively requires MachineDominatorTree. I think that means that SILowerControlFlow is obliged to preserve MachineDominatorTree too as explained here: https://lists.llvm.org/pipermail/llvm-dev/2020-November/146923.html although it does not seem to have caused any problems in practice yet. Differential Revision: https://reviews.llvm.org/D111313	2021-10-07 21:30:26 +01:00
Amara Emerson	8bfc0e06dc	[GlobalISel] Port the udiv -> mul by constant combine. This is a straight port from the equivalent DAG combine. Differential Revision: https://reviews.llvm.org/D110890	2021-10-07 11:37:17 -07:00
Carl Ritson	b5d6ad20e1	[MachineCopyPropagation] Handle propagation of undef copies When propagating undefined copies the undef flag must also be propagated. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D111219	2021-10-07 20:34:27 +09:00
Philip Reames	d652724c0b	[test] refresh a couple of autogen tests	2021-10-05 18:41:24 -07:00
Carl Ritson	adf7043a9f	[AMDGPU] Only remove branches in SIInstrInfo::removeBranch Without this change _term instructions can be removed during critical edge splitting. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D111126	2021-10-06 10:34:26 +09:00
kpyzhov	095c48fdf3	[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration. The current way to detect hostcalls, by looking for the "ockl_hostcall_internal()" function in the module, does not seem to be reliable enough. LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", causing the MetadataStreamer pass to fail to detect hostcalls, so it does not set the "hidden_hostcall_buffer" kernel argument. This change adds a new module flag, hostcall, that can be used to detect whether GPU functions use host calls for printf. Differential revision: https://reviews.llvm.org/D110337	2021-10-05 09:56:04 -04:00
Mirko Brkusanin	40e00063bc	[GlobalISel] Combine fabs(fneg(x)) to fabs(x) Differential Revision: https://reviews.llvm.org/D110943	2021-10-05 13:43:39 +02:00
Jay Foad	9ce4f37206	[AMDGPU][GlobalISel] Fix legalization of G_UMULH Scalarize before narrowing because the narrowing implementation does not work on vectors. This matches what we do for regular G_MUL. Differential Revision: https://reviews.llvm.org/D111129	2021-10-05 10:56:02 +01:00
Carl Ritson	e86d45ec00	[AMDGPU] Pre-commit test for D111126 (NFC)	2021-10-05 18:13:54 +09:00
Amara Emerson	8bde5e58c0	Delay outgoing register assignments to last. The delayed stack protector feature which is currently used for SDAG (and thus allows for more commonly generating tail calls) depends on being able to extract the tail call into a separate return block. To do this it also has to extract the vreg->physreg copies that set up the call's arguments, since if it doesn't then the call inst ends up using undefined physregs in its new spliced block. SelectionDAG implementations can do this because they delay emitting register copies until after the stack arguments are set up. GISel however just processes and emits the arguments in IR order, so stack arguments always end up last, and thus this breaks the code that looks for any register arg copies that precede the call instruction. This patch adds a thunk argument to the assignValueToReg() and custom assignment hooks. For outgoing arguments, register assignments use this return param to return a thunk that does the actual generating of the copies. We collect these until all the outgoing stack assignments have been done and then execute them, so that the copies (and perhaps some artifacts like G_SEXTs) are placed after any stores. Differential Revision: https://reviews.llvm.org/D110610	2021-10-04 12:33:20 -07:00
Jay Foad	24688f8fdf	Revert "[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul" This reverts commit `90da0b9a5a`. It was causing an LLVM_ENABLE_EXPENSIVE_CHECKS buildbot failure.	2021-10-04 20:26:30 +01:00
Amara Emerson	dafcbfdaa0	[GlobalISel] Widen G_EXTRACT_VECTOR_ELT using anyext instead of sext. G_SEXT seems to be unnecessary here, anyext will do. Differential Revision: https://reviews.llvm.org/D110469	2021-10-04 12:19:19 -07:00
Jay Foad	90da0b9a5a	[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul Also remove some redundancy because the source and result types of any multiply are always the same. Differential Revision: https://reviews.llvm.org/D110926	2021-10-04 19:33:38 +01:00
Jay Foad	dff3454bda	[TwoAddressInstruction] Tweak constraining of tied operands In collectTiedOperands, when handling an undef use that is tied to a def, constrain the dst reg with the actual register class of the src reg, instead of with the register class from the instruction's MCInstrDesc. This makes a difference in some AMDGPU test cases like this, before: %16:sgpr_96 = INSERT_SUBREG undef %15:sgpr_96_with_sub0_sub1(tied-def 0), killed %11:sreg_64_xexec, %subreg.sub0_sub1 After, without this patch: undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec This fails machine verification if you force it to run after TwoAddressInstruction (currently it is disabled) with: * Bad machine code: Invalid register class for subregister index * - function: s_load_constant_v3i32_align4 - basic block: %bb.0 (0xa011a88) - instruction: undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec - operand 0: undef %16.sub0_sub1:sgpr_96 Register class SGPR_96 does not fully support subreg index 4 After, with this patch: undef %16.sub0_sub1:sgpr_96_with_sub0_sub1 = COPY killed %11:sreg_64_xexec See also svn r159120 which introduced the code to handle tied undef uses. Differential Revision: https://reviews.llvm.org/D110944	2021-10-01 20:57:58 +01:00
Jay Foad	61ecfc6f9d	[TwoAddressInstruction] Pre-commit a test case for D110944	2021-10-01 20:57:57 +01:00
Jay Foad	156d7d2df7	[LiveIntervals] Remove unused subreg ranges in repairIntervalsInRange If the old instructions mentioned a subreg that the new instructions do not, remove the subrange for that subreg. For example, in TwoAddressInstructionPass::eliminateRegSequence, if a use operand in the REG_SEQUENCE has the undef flag then we don't generate a copy for it so after the elimination there should be no live interval at all for the corresponding subreg of the def. This is a small step towards switching TwoAddressInstructionPass over from LiveVariables to LiveIntervals. Currently this path is only tested if you explicitly enable -early-live-intervals. Differential Revision: https://reviews.llvm.org/D110542	2021-09-30 09:15:10 +01:00
Ruiling Song	52785989e9	AMDGPU: Broadcast scalar boolean to vector boolean explicitly This is used to fix wrong code generation of s_add_co_select_user in test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll s_addc_u32 s4, s6, 0 s_cselect_b64 vcc, 1, 0 <-- vcc set as 0x1 if SCC==1 v_mov_b32_e32 v1, s4 s_cmp_gt_u32 s6, 31 v_cndmask_b32_e32 v1, 0, v1, vcc If the s_addc_u32 set SCC, then we will get value 0x1 in VCC. The v_cndmask will do per thread selection with VCC as condition register. As VCC only gets the first bit being set, only the first thread/lane in destination register can get correct result if the very first lane is active. In fact, we should broadcast the value to all active lanes of the final register. The idea here is doing this broadcast to vector boolean explicitly instead of lowering it into a COPY from SCC which would be interpreted as selecting between 0/1. This is used to replace D109754. Reviewed-by: foad, alex-t Differential Revision: https://reviews.llvm.org/D109889	2021-09-30 10:15:01 +08:00
Praveen Velliengiri	e90b512c4d	[AMDGPU] Change ASAN init/fini kernels linkage to external. The HSA runtime fails to find the symbols for the Init and Fini kernels because they are marked with internal linkage. Change the linkage to external to fix those errors. Differential Revision: https://reviews.llvm.org/D110054	2021-09-27 11:50:37 -06:00
Sebastian Neubauer	bf980930e5	[AMDGPU] Ignore KILLs when forming clauses KILL instructions are sometimes present and prevented hard clauses from being formed. Fix this by ignoring all meta instructions in clauses. Differential Revision: https://reviews.llvm.org/D106042	2021-09-27 16:33:52 +02:00
Amara Emerson	acd13994d1	[GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour.	2021-09-26 17:25:38 -07:00
Amara Emerson	f4cfda03d6	[AArch64][AMDGPU] Re-generate some tests with CHECK-NEXT to prepare for a patch.	2021-09-24 18:26:08 -07:00
Stanislav Mekhanoshin	cf74ef134c	[AMDGPU] Limit promote alloca max size in functions Non-entry functions have 32 caller saved VGPRs available. If we promote alloca to consume more registers we will have to spill CSRs. There is no reason to eliminate scratch access to get another scratch access instead. Differential Revision: https://reviews.llvm.org/D110372	2021-09-24 13:38:39 -07:00
Stanislav Mekhanoshin	08d7eec06e	Revert "Allow rematerialization of virtual reg uses" Reverted due to two distinct performance regression reports. This reverts commit `92c1fd19ab`.	2021-09-24 10:26:11 -07:00
Stanislav Mekhanoshin	082e22f3d7	[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in memory violation. This is not needed since GFX10 with architected flat scratch though since special SGPRs are not carving space from normal SGPRs. Differential Revision: https://reviews.llvm.org/D110376	2021-09-24 09:46:31 -07:00
Jay Foad	e4e95f14f1	[LiveIntervals] Repair live intervals that gain subranges In repairIntervalsInRange, if the new instructions refer to subregs but the old instructions did not, make sure any existing live interval for the superreg is updated to have subranges. Also skip repairing any range that we have recalculated from scratch, partly for efficiency but also to avoid some cases that repairOldRegInRange can't handle. The existing test/CodeGen/AMDGPU/twoaddr-regsequence.mir provides some test coverage for this change: when TwoAddressInstructionPass converts REG_SEQUENCE into subreg copies, the live intervals will now get subranges and MachineVerifier will verify that the subranges are correct. Unfortunately MachineVerifier does not complain if the subranges are not present, so the test also passed before this patch. This patch also fixes ~800 of the ~1500 failures in the whole CodeGen lit test suite when -early-live-intervals is forced on. Differential Revision: https://reviews.llvm.org/D110328	2021-09-24 11:58:08 +01:00
Jay Foad	7863cc6c1c	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238 Unrevert with some changes to the tests: - Add -verify-machineinstrs to check for remaining problems in live interval support in TwoAddressInstructionPass. - Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from some of those remaining problems.	2021-09-24 11:44:49 +01:00
Christudasan Devadasan	7a62a5b56d	[AMDGPU] Legalize initialized LDS variables We don't allow an initializer for LDS variables and there is an early abort during instruction selection. This patch legalizes them by ignoring the init values. During assembly emission, proper error reporting already exists for such instances. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109901	2021-09-23 22:53:20 -04:00
Vang Thao	1443ba6163	[AMDGPU] Propagate defining src reg for AGPR to AGPR Copys On targets that do not support AGPR to AGPR copying directly, try to find the defining accvgpr_write and propagate its source vgpr register to the copies before register allocation so the source vgpr register does not get clobbered. The postrapseudos pass also attempts to propagate the defining accvgpr_write but if the register to propagate is clobbered, it will give up and create new temporary vgpr registers instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108830	2021-09-23 15:17:53 -07:00
Jay Foad	deb2ca566a	Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases" This reverts commit `8229cb7412`. It was failing on buildbots with expensive checks enabled.	2021-09-23 17:55:05 +01:00
Jay Foad	8229cb7412	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238	2021-09-23 17:16:14 +01:00
Jay Foad	0205806d0f	[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3- only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156	2021-09-22 09:36:34 +01:00
Jay Foad	3828ea6181	[AMDGPU] Divergence-driven instruction selection for mul i32 Differential Revision: https://reviews.llvm.org/D109881	2021-09-22 09:36:34 +01:00
Matt Arsenault	4c2ee57148	AMDGPU: Fix test relying on incompatible attributes This combination of amdgpu-waves-per-eu and amdgpu-flat-work-group-size cannot be satisfied at the same time, so this was using the default.	2021-09-21 22:44:35 -04:00
Arthur Eubanks	e42234383e	Make DiagnosticInfoResourceLimit's limit param required And always print it. This makes some LLVM diagnostics match up better with Clang's diagnostics. Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print better diagnostics for those. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D110204	2021-09-21 15:27:58 -07:00
alex-t	1a33294652	[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa. VALU operations only process the active lanes of the VGPR and ignore inactive ones. Active lanes correspond to 1 bit in the EXEC mask register. SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively. SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit. To return back to the SALU context we need to do the opposite. To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D109900	2021-09-21 21:19:31 +03:00
Brendon Cahoon	cbdf624bb8	[AMDGPU] Correctly merge alias.scope and noalias metadata for memops When adding alias.scope and noalias metadata to a memcpy function, the alias.scope and noalias metadata from the operands are merged. The rule for merging alias.scope is to take the intersection of the domains and the union of the scopes within those domains. The rule for merging noalias is to take the intersection. The bug is that AMDGPULowerModuleLDS was using concatenation for both alias.scope and noalias. For example, suppose f1 and f2 are added to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)). Then concatenation creates noalias metadata for the memcpy that includes both {f1, f2}. That means that the memcpy is assumed not to alias a prior load of f2, which enables the optimizer to remove a load of f2 that occurs after the memcpy. The function MDNode::getMostGenericAliasScope defines the semantics for alias.scope. There is a function, combineMetadata in Local.cpp, that uses intersect for noalias. Differential Revision: https://reviews.llvm.org/D110049	2021-09-21 13:02:01 -05:00
Aleksandr Bezzubikov	624e4d087e	[GlobalISel] Support ConstantAsMetadata in IRTranslator When using instructions which have a MetadataAsValue argument (e.g. some target-specific intrinsics) MD canonicalization strips internal MDNodes with a single ConstantAsMetadata child. That prevented IRTranslator from properly translating such calls.	2021-09-21 11:24:56 -04:00
Petar Avramovic	f3366983f0	AMDGPU/GlobalISel: Restore run line erased in D109154 by mistake	2021-09-21 17:03:46 +02:00
Jay Foad	598bebeaa6	[AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac if there are no source modifiers, just like we do for other mad/mac and fma/fmac cases. Differential Revision: https://reviews.llvm.org/D110074	2021-09-21 11:57:45 +01:00
Jay Foad	86dcb59206	[AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used v_fmac with source modifiers forces VOP3 encoding, but it is strictly better to use the VOP3-only v_fma instead, because $dst and $src2 are not tied so it gives the register allocator more freedom and avoids a copy in some cases. This is the same strategy we already use for v_mad vs v_mac and v_fma_legacy vs v_fmac_legacy. Differential Revision: https://reviews.llvm.org/D110070	2021-09-21 11:57:45 +01:00
Amara Emerson	cc65e08fe7	[GlobalISel][Legalizer] Use ArtifactValueFinder first for unmerge combines before trying others. This is motivated by a pathological compile time issue during unmerge combining. We should be able to use the AVF to do simplification. However AMDGPU has a lot of codegen changes which I'm not sure how to evaluate. Differential Revision: https://reviews.llvm.org/D109748	2021-09-21 00:02:15 -07:00
Amara Emerson	7091a7f781	[GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts. For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't seem to have debug users of defs. However, in the legalizer we're always calling MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive. In some rare cases, this contributes significantly to unreasonably long compile times when we have lots of artifact combiner activity. To verify this, I added asserts to that function when it actually replaced a debug use operand with undef for these artifacts. On CTMark with both -O0 and -Os and debug info enabled, I didn't see a single case where it triggered. In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0 for AArch64 with this change. Differential Revision: https://reviews.llvm.org/D109750	2021-09-20 23:34:42 -07:00
Jay Foad	680592b5d0	[AMDGPU] Regenerate checks	2021-09-20 14:48:23 +01:00
Petar Avramovic	e4c46ddd91	[GlobalISel] Improve elimination of dead instructions in legalizer Add eraseInstr(s) utility functions. Before deleting an instruction collects its use instructions. After deletion deletes use instructions that became trivially dead. This patch clears all dead instructions in existing legalizer mir tests. Differential Revision: https://reviews.llvm.org/D109154	2021-09-20 13:00:58 +02:00
Nikita Popov	80110aafa0	[Tests] Fix incorrect noalias metadata Mostly this fixes cases where !noalias or !alias.scope were passed a scope rather than a scope list. In some cases I opted to drop the metadata entirely instead, because it is not really relevant to the test.	2021-09-18 20:51:00 +02:00
Vang Thao	106959acc1	[AMDGPU] Inline non-kernel functions using extern lds In https://reviews.llvm.org/D100481, forceful inline of all non-kernel functions using lds was disabled since AMDGPULowerModuleLDS pass now handles static lds. However that pass does not handle extern lds so non-kernel functions using extern lds must still be inlined. Reviewed By: hsmhsm, arsenm Differential Revision: https://reviews.llvm.org/D109773	2021-09-16 10:58:51 -07:00


4894 Commits