llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	74cd4dee20	[AMDGPU] Preserve deadness of vcc when shrinking instructions This doesn't have any effect on codegen now, but it might do in the future if we shrink instructions before post-RA scheduling, which is sensitive to live vs dead defs. Differential Revision: https://reviews.llvm.org/D112305	2021-10-22 14:22:24 +01:00
Simon Pilgrim	a750332d77	AMDGPULibCalls - constify some FuncInfo& arguments. NFCI.	2021-10-22 12:10:58 +01:00
Simon Pilgrim	99a64cc9da	AMDGPULibCalls::parseFunctionName - use reference instead of pointer. NFCI. parseFunctionName allowed a default null pointer, despite it being dereferenced immediately to be used as a reference and despite all callers taking the address of an existing reference. Fixes a static analyzer warning about potential null dereferences.	2021-10-22 11:45:25 +01:00
Stanislav Mekhanoshin	ca0c92d6a1	[AMDGPU] Allow using the whole register file on gfx90a for VGPRs In a kernel which does not have calls or AGPR usage we can allocate the whole vector register budget for VGPRs and have no AGPRs as long as VGPRs stay addressable (i.e. below 256). Differential Revision: https://reviews.llvm.org/D111764	2021-10-21 18:24:34 -07:00
Stanislav Mekhanoshin	6185835656	[AMDGPU] Allow rematerialization of SOP with virtual registers D106408 did this for all targets, but it was reverted due to a couple of performance regressions on some targets. The difference for AMDGPU is the ability to rematerialize SOP instructions with virtual register uses, as we already do for VOP. Differential Revision: https://reviews.llvm.org/D110743	2021-10-20 11:46:50 -07:00
Anshil Gandhi	0567f03331	[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols By default, clang emits complete constructors as aliases of base constructors if they are the same. The backend is supposed to emit symbols for the alias; otherwise, undefined symbols result. @yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true` and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had to be extended to support aliases to functions. inline-calls.ll was corrected appropriately. Reviewed By: yaxunl, #amdgpu Differential Revision: https://reviews.llvm.org/D109707	2021-10-18 16:53:15 -06:00
Kazu Hirata	8568ca789e	Use llvm::erase_if (NFC)	2021-10-18 09:33:42 -07:00
Jay Foad	d55db4b033	[AMDGPU] Remove unused VirtRegMap analysis. NFC.	2021-10-18 11:55:40 +01:00
Jay Foad	a129932b0d	[AMDGPU] Add link to bug	2021-10-18 10:33:42 +01:00
Jay Foad	012248b0bc	Remove the verifyAfter mechanism that was replaced by D111397 Differential Revision: https://reviews.llvm.org/D111872	2021-10-18 10:26:46 +01:00
Jay Foad	36deb9a670	Add new MachineFunction property FailsVerification TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets you skip machine verification after a particular pass. Unfortunately this is used in generic code in TargetPassConfig itself to skip verification after a generic pass, only because some previous target-specific pass damaged the MIR on that specific target. This is bad because problems in one target cause lack of verification for all targets. This patch replaces that mechanism with a new MachineFunction property called "FailsVerification" which can be set by (usually target-specific) passes that are known to introduce problems. Later passes can reset it again if they are known to clean up the previous problems. Differential Revision: https://reviews.llvm.org/D111397	2021-10-18 10:26:46 +01:00
Piotr Sobczak	d869921004	[AMDGPU] Add patterns for i8/i16 local atomic load/store Add patterns for i8/i16 local atomic load/store. Added tests for new patterns. Copied atomic_[store/load]_local.ll to GlobalISel directory. Differential Revision: https://reviews.llvm.org/D111869	2021-10-18 11:23:10 +02:00
Stanislav Mekhanoshin	7cdb1df8c7	[AMDGPU] Divergence driven selection for fused bitlogic The change adds divergence predicates for fused logical operations. The problem with selecting a scalar fused op such as S_NOR_B32 is that it does not have a VALU counterpart and will be split in moveToVALU. At the same time it prevents selection of a better opcode on the VALU side (such as V_OR3_B32) which does not have a counterpart on the SALU side. XNOR opcodes are left as is and selected as scalar to take advantage of the SIInstrInfo::lowerScalarXnor() code, which can commute operations to keep one of two opcodes on SALU if possible. See the xnor.ll test for this. Differential Revision: https://reviews.llvm.org/D111907	2021-10-18 01:44:25 -07:00
Anshil Gandhi	1830ec94ac	Revert "[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols" This reverts commit `03375a3fb3`.	2021-10-15 16:16:18 -06:00
Anshil Gandhi	03375a3fb3	[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols By default, clang emits complete constructors as aliases of base constructors if they are the same. The backend is supposed to emit symbols for the alias; otherwise, undefined symbols result. @yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true` and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had to be extended to support aliases to functions. inline-calls.ll was corrected appropriately. Reviewed By: yaxunl, #amdgpu Differential Revision: https://reviews.llvm.org/D109707	2021-10-15 11:39:15 -06:00
Michael Liao	bacddf47a8	[amdgpu] Fix a crash case when preserving MDT in SILowerControlFlow - When a redundant MBB is being erased from MDT, check whether its single successor is dominated by it. If yes, update that successor's idom before erasing MBB; otherwise, it implies MBB is a leaf node and could be erased directly. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D111831	2021-10-15 13:21:53 -04:00
Abinav Puthan Purayil	de3038400b	[AMDGPU] Avoid redundant calls to numBits in AMDGPUCodeGenPrepare::replaceMulWithMul24(). The isU24() and isI24() helpers call numBits to make their decision. This change replaces them with direct numBits calls so that we can reuse the result for the > 32-bit width cases. Differential Revision: https://reviews.llvm.org/D111864	2021-10-15 19:49:44 +05:30
Abinav Puthan Purayil	0379263f23	[AMDGPU] Fix width check for signed mul24 generation. This change fixes a case in which the highest set bit of the original result is at bit 31 and sign-extending the mul24 for it would make the result negative. Differential Revision: https://reviews.llvm.org/D111823	2021-10-15 18:53:41 +05:30
Abinav Puthan Purayil	b3c9d84e5a	[AMDGPU] Fix 24-bit mul intrinsic generation for > 32-bit result. The 24-bit mul intrinsics yield the low-order 32 bits. We should only do the transformation if the operands are known to be not wider than 24 bits and the result is known to be not wider than 32 bits. Differential Revision: https://reviews.llvm.org/D111523	2021-10-14 09:00:19 +05:30
Joe Nash	b44eac1b85	[AMDGPU] Remove unneeded emit literal check NFC. This check does not verify any functional property since size 8 was added. Remove it for simplicity. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D111737 Change-Id: Ifd7cbd324a137f939d8dc04acb8fbd54c9527a42	2021-10-13 12:46:22 -04:00
Jay Foad	c885857e9d	[AMDGPU] Enable load clustering in the post-RA scheduler This has a couple of benefits: 1. It can sometimes fix clusters that got broken apart when the register allocator inserted a copy. 2. Post-RA scheduling does not have to worry about increasing register pressure, which in some cases gives it more freedom to reorder instructions. Testing on a collection of 10,000 graphics shaders compiled for gfx1010 showed: - The average length of each run of one or more load instructions increased by about 1%. - The number of runs of two or more load instructions increased by about 4%. Differential Revision: https://reviews.llvm.org/D111646	2021-10-13 17:12:26 +01:00
Stanislav Mekhanoshin	9cf995be6b	[AMDGPU] Promote generic pointer kernel arguments into global The new pass walks the kernel's pointer arguments, then the loads from them. If a loaded value is a pointer and the loaded pointer is unmodified in the kernel before the load, then the loaded pointer is promoted to global. The pass then continues recursively. Differential Revision: https://reviews.llvm.org/D111464	2021-10-12 10:07:33 -07:00
Jay Foad	66ce1015af	Revert "[AMDGPU] Enable load clustering in the post-RA scheduler" This reverts commit `66e13c7f43`. It was committed by accident.	2021-10-12 16:19:35 +01:00
Jay Foad	66e13c7f43	[AMDGPU] Enable load clustering in the post-RA scheduler This has a couple of benefits: 1. It can sometimes fix clusters that got broken apart when the register allocator inserted a copy. 2. Post-RA scheduling does not have to worry about increasing register pressure, which in some cases gives it more freedom to reorder instructions. Testing on a collection of 10,000 graphics shaders compiled for gfx1010 showed: - The average length of each run of one or more load instructions increased by about 1%. - The number of runs of two or more load instructions increased by about 4%.	2021-10-12 16:09:04 +01:00
hsmahesha	52cb3af08c	[AMDGPU] Remove dead frame indices after sgpr spill. All those frame indices which are dead after sgpr spill should be removed from the function frame. Otherwise, there is a side effect such as re-mapping of free frame index ids by later pass(es) like "stack slot coloring", which in turn could mess up the bookkeeping of "frame index to VGPR lane". Reviewed By: cdevadas Differential Revision: https://reviews.llvm.org/D111150	2021-10-12 09:58:49 +05:30
Roman Lebedev	684cbae89a	[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places	2021-10-11 23:36:06 +03:00
Jay Foad	2e1ad93201	[AMDGPU] Fix copying a machine operand Without this I get: * Bad machine code: Instruction has operand with wrong parent set * - function: available_externally_test - basic block: %bb.0 (0x7dad598) - instruction: %0:r600_treg32_x = MOV 1, 0, 0, 0, $alu_literal_x, 0, 0, 0, -1, 1, $pred_sel_off, @available_externally, 0 Differential Revision: https://reviews.llvm.org/D111549	2021-10-11 20:22:47 +01:00
Joe Nash	b4b7e605a6	[AMDGPU] Support shared literals in FMAMK/FMAAK These instructions should allow src0 to be a literal with the same value as the mandatory other literal. Enable it by introducing an operand that defers adding its value to the MI when decoding till the mandatory literal is parsed. Reviewed By: dp, foad Differential Revision: https://reviews.llvm.org/D111067 Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7	2021-10-11 13:09:54 -04:00
Reid Kleckner	b3a6d096d7	Fix shlib builds for all lib/Target/*/TargetInfo libs They all must depend on MC now that the target registry is in MC. Also fix llvm-cxxdump	2021-10-08 15:21:13 -07:00
Reid Kleckner	89b57061f7	Move TargetRegistry.(h|cpp) from Support to MC This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support. This allows us to ensure that Support doesn't have includes from MC/*. Differential Revision: https://reviews.llvm.org/D111454	2021-10-08 14:51:48 -07:00
David Stuttard	69f7d81d0a	[AMDGPU] Set number of VGPRs used in PS shaders based on input registers actually used For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR registers. Calculate the number of VGPRs used as input VGPRs based on these registers rather than on the arguments passed in (which conservatively always allocates the maximum). Differential Revision: https://reviews.llvm.org/D101633 Change-Id: Idf7c060cbbd5f7e3300102c55ecee3c07f209de6	2021-10-08 14:24:35 +01:00
Jay Foad	e996cf7dce	[AMDGPU] Preserve MachineDominatorTree in SILowerControlFlow Updating the MachineDominatorTree is easy since SILowerControlFlow only splits and removes basic blocks. This should save a bit of compile time because previously we would recompute the dominator tree from scratch after this pass. Another reason for doing this is that SILowerControlFlow preserves LiveIntervals which transitively requires MachineDominatorTree. I think that means that SILowerControlFlow is obliged to preserve MachineDominatorTree too as explained here: https://lists.llvm.org/pipermail/llvm-dev/2020-November/146923.html although it does not seem to have caused any problems in practice yet. Differential Revision: https://reviews.llvm.org/D111313	2021-10-07 21:30:26 +01:00
Jack Andersen	bd4dad87f4	[MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand Based on the reasoning of D53903, register operands of DBG_VALUE are invariably treated as RegState::Debug operands. This change enforces this invariant as part of MachineInstr::addOperand so that all passes emit this flag consistently. RegState::Debug is inconsistently set on DBG_VALUE registers throughout LLVM. This runs the risk of a filtering iterator like MachineRegisterInfo::reg_nodbg_iterator processing these operands erroneously when they are not parsed from MIR sources. This issue was observed in the development of the llvm-mos fork which adds a backend that relies on physical register operands much more than existing targets. Physical RegUnit 0 has the same numeric encoding as $noreg (indicating an undef for DBG_VALUE). Allowing debug operands into the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision of register numbers with different zero semantics). Eventually, this causes an assert where DBG_VALUE instructions are prohibited from participating in live register ranges. Reviewed By: MatzeB, StephenTozer Differential Revision: https://reviews.llvm.org/D110105	2021-10-07 16:08:52 +01:00
Simon Pilgrim	21661607ca	[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine) As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.	2021-10-06 12:04:30 +01:00
Carl Ritson	adf7043a9f	[AMDGPU] Only remove branches in SIInstrInfo::removeBranch Without this change _term instructions can be removed during critical edge splitting. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D111126	2021-10-06 10:34:26 +09:00
kpyzhov	095c48fdf3	[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration. The current way to detect hostcalls by looking for the "ockl_hostcall_internal()" function in the module does not seem to be reliable enough. LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", which causes the MetadataStreamer pass to fail to detect hostcalls, so it does not set the "hidden_hostcall_buffer" kernel argument. This change adds a new module flag, "hostcall", that can be used to detect whether GPU functions use host calls for printf. Differential revision: https://reviews.llvm.org/D110337	2021-10-05 09:56:04 -04:00
Jay Foad	9ce4f37206	[AMDGPU][GlobalISel] Fix legalization of G_UMULH Scalarize before narrowing because the narrowing implementation does not work on vectors. This matches what we do for regular G_MUL. Differential Revision: https://reviews.llvm.org/D111129	2021-10-05 10:56:02 +01:00
Amara Emerson	8bde5e58c0	Delay outgoing register assignments to last. The delayed stack protector feature which is currently used for SDAG (and thus allows for more commonly generating tail calls) depends on being able to extract the tail call into a separate return block. To do this it also has to extract the vreg->physreg copies that set up the call's arguments, since if it doesn't then the call inst ends up using undefined physregs in its new spliced block. SelectionDAG implementations can do this because they delay emitting register copies until after the stack arguments are set up. GISel, however, just processes and emits the arguments in IR order, so stack arguments always end up last, and thus this breaks the code that looks for any register arg copies that precede the call instruction. This patch adds a thunk argument to the assignValueToReg() and custom assignment hooks. For outgoing arguments, register assignments use this return param to return a thunk that does the actual generating of the copies. We collect these until all the outgoing stack assignments have been done and then execute them, so that the copies (and perhaps some artifacts like G_SEXTs) are placed after any stores. Differential Revision: https://reviews.llvm.org/D110610	2021-10-04 12:33:20 -07:00
Dávid Bolvanský	fb84aa2a8f	Fixed warnings in target/parser code produced by -Wbitwise-instead-of-logical	2021-10-03 15:04:01 +02:00
Kazu Hirata	c1e32b3fc0	[Target] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-02 12:06:29 -07:00
Daniil Fukalov	47d6274d4c	[NFC][AMDGPU] Reduce includes dependencies, part 2 1. Split out some parts of the R600 target into separate modules/headers. 2. Reduced some include lists in headers. 3. Minor forward declarations, redundant includes and flags in GCNSubtarget cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D109351	2021-10-01 17:50:20 +03:00
Stanislav Mekhanoshin	244aa7f735	[AMDGPU] move hasAGPRs/hasVGPRs into header It is now very simple and can go right into the header, allowing the optimizer to combine callers, such as isVGPRClass and similar. It does not need anything from the TRI itself anymore, so make it a static class member along with the callers. Differential Revision: https://reviews.llvm.org/D110762	2021-09-30 10:02:02 -07:00
Kazu Hirata	f631173d80	[llvm] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-09-30 08:51:21 -07:00
Ruiling Song	52785989e9	AMDGPU: Broadcast scalar boolean to vector boolean explicitly This fixes wrong code generation of s_add_co_select_user in test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll: s_addc_u32 s4, s6, 0 s_cselect_b64 vcc, 1, 0 <-- vcc set as 0x1 if SCC==1 v_mov_b32_e32 v1, s4 s_cmp_gt_u32 s6, 31 v_cndmask_b32_e32 v1, 0, v1, vcc If the s_addc_u32 set SCC, then we will get value 0x1 in VCC. The v_cndmask will do per-thread selection with VCC as the condition register. Because only the first bit of VCC is set, only the first thread/lane of the destination register can get the correct result, and only if that very first lane is active. In fact, we should broadcast the value to all active lanes of the final register. The idea here is to do this broadcast to vector boolean explicitly instead of lowering it into a COPY from SCC, which would be interpreted as selecting between 0/1. This replaces D109754. Reviewed-by: foad, alex-t Differential Revision: https://reviews.llvm.org/D109889	2021-09-30 10:15:01 +08:00
Jay Foad	f9b68304a2	[AMDGPU] Enable machine verification after AMDGPUISelDAGToDAG This was introduced in D32628 but it does not seem to be required any more. At least it does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110692	2021-09-29 18:47:19 +01:00
hsmahesha	c0735cb9f1	[AMDGPU] Do not internalize ASan device library functions. ASan device library functions (those starting with the prefix __asan_) are currently undergoing undesired optimizations due to internalization. Hence, in order to avoid such undesired optimizations on ASan device library functions, do not internalize them in the first place. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D110468	2021-09-29 07:19:02 +05:30
Praveen Velliengiri	e90b512c4d	[AMDGPU] Change ASAN init/fini kernels linkage to external. The HSA runtime fails to find the symbols for the Init and Fini kernels because they are marked with internal linkage; change the linkage to external to fix those errors. Differential Revision: https://reviews.llvm.org/D110054	2021-09-27 11:50:37 -06:00
Sebastian Neubauer	bf980930e5	[AMDGPU] Ignore KILLs when forming clauses KILL instructions are sometimes present and prevent hard clauses from being formed. Fix this by ignoring all meta instructions in clauses. Differential Revision: https://reviews.llvm.org/D106042	2021-09-27 16:33:52 +02:00
Stanislav Mekhanoshin	cf74ef134c	[AMDGPU] Limit promote alloca max size in functions Non-entry functions have 32 caller saved VGPRs available. If we promote alloca to consume more registers we will have to spill CSRs. There is no reason to eliminate scratch access to get another scratch access instead. Differential Revision: https://reviews.llvm.org/D110372	2021-09-24 13:38:39 -07:00
Stanislav Mekhanoshin	082e22f3d7	[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch With architected flat scratch it becomes readonly. We must always reserve an SGPR pair for it even if we do not use scratch at all, since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in a memory violation. This is not needed from GFX10 onward with architected flat scratch, though, since special SGPRs there do not carve space out of the normal SGPRs. Differential Revision: https://reviews.llvm.org/D110376	2021-09-24 09:46:31 -07:00
Christudasan Devadasan	7a62a5b56d	[AMDGPU] Legalize initialized LDS variables We don't allow an initializer for LDS variables and there is an early abort during instruction selection. This patch legalizes them by ignoring the init values. During assembly emission, proper error reporting already exists for such instances. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109901	2021-09-23 22:53:20 -04:00
Vang Thao	1443ba6163	[AMDGPU] Propagate defining src reg for AGPR to AGPR Copies On targets that do not support AGPR to AGPR copying directly, try to find the defining accvgpr_write and propagate its source vgpr register to the copies before register allocation so the source vgpr register does not get clobbered. The postrapseudos pass also attempts to propagate the defining accvgpr_write, but if the register to propagate is clobbered, it will give up and create new temporary vgpr registers instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108830	2021-09-23 15:17:53 -07:00
Piotr Sobczak	2ac53fffae	[AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders The pass amdgpu-propagate-attributes ("Early/Late propagate attributes from kernels to functions") is currently run also for shaders, where it does nothing. Modify the check so the pass only processes functions for kernels. Differential Revision: https://reviews.llvm.org/D109961	2021-09-23 16:46:56 +02:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
Mikael Holmen	e7b169a8ae	[AMDGPU] Fix gcc warnings about unused variables [NFC]	2021-09-23 08:08:00 +02:00
Simon Pilgrim	b1f38a27f0	[Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible. This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up. Differential Revision: https://reviews.llvm.org/D110242	2021-09-22 15:28:08 +01:00
Jay Foad	0205806d0f	[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3-only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156	2021-09-22 09:36:34 +01:00
Jay Foad	3828ea6181	[AMDGPU] Divergence-driven instruction selection for mul i32 Differential Revision: https://reviews.llvm.org/D109881	2021-09-22 09:36:34 +01:00
Matt Arsenault	ec55dcedce	AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.	2021-09-21 22:57:17 -04:00
Arthur Eubanks	e42234383e	Make DiagnosticInfoResourceLimit's limit param required And always print it. This makes some LLVM diagnostics match up better with Clang's diagnostics. Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print better diagnostics for those. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D110204	2021-09-21 15:27:58 -07:00
alex-t	1a33294652	[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to preserve correctness when passing an i1 from the VALU to the SALU context and vice versa. VALU operations only process the active lanes of the VGPR and ignore inactive ones. Active lanes correspond to 1 bits in the EXEC mask register. SALU represents i1 as just one bit but VALU as 64 bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively. SALU uses the one-bit condition flag SCC, while VALU uses VCC, which is a pair of 32-bit SGPRs. To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64-bit value. To return back to the SALU context we need to do the opposite. To correctly convert a 64-bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D109900	2021-09-21 21:19:31 +03:00
Brendon Cahoon	cbdf624bb8	[AMDGPU] Correctly merge alias.scope and noalias metadata for memops When adding alias.scope and noalias metadata to a memcpy function, the alias.scope and noalias metadata from the operands are merged. The rule for merging alias.scope is to take the intersection of the domains and the union of the scopes within those domains. The rule for merging noalias is to take the intersection. The bug is that AMDGPULowerModuleLDS was using concatenation for both alias.scope and noalias. For example, suppose f1 and f2 are added to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)). Concatenation then creates noalias metadata for the memcpy that includes both {f1, f2}. That means that the memcpy is assumed not to alias a prior load of f2, which enables the optimizer to remove a load of f2 that occurs after the memcpy. The function MDNode::getMostGenericAliasScope defines the semantics for alias.scope. There is a function, combineMetadata in Local.cpp, that uses intersect for noalias. Differential Revision: https://reviews.llvm.org/D110049	2021-09-21 13:02:01 -05:00
Dmitry Preobrazhensky	3500e7d2b0	[AMDGPU][MC][GFX7][GFX10] Corrected image_atomic_fcmpswap Differential Revision: https://reviews.llvm.org/D109616	2021-09-21 18:06:02 +03:00
Dmitry Preobrazhensky	b8e7f53208	[AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics Differential Revision: https://reviews.llvm.org/D109614	2021-09-21 16:23:20 +03:00
Jay Foad	598bebeaa6	[AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac if there are no source modifiers, just like we do for other mad/mac and fma/fmac cases. Differential Revision: https://reviews.llvm.org/D110074	2021-09-21 11:57:45 +01:00
Jay Foad	86dcb59206	[AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used v_fmac with source modifiers forces VOP3 encoding, but it is strictly better to use the VOP3-only v_fma instead, because $dst and $src2 are not tied so it gives the register allocator more freedom and avoids a copy in some cases. This is the same strategy we already use for v_mad vs v_mac and v_fma_legacy vs v_fmac_legacy. Differential Revision: https://reviews.llvm.org/D110070	2021-09-21 11:57:45 +01:00
Jacob Lambert	dc6e8dfdfe	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.	2021-09-20 14:48:50 -07:00
Petar Avramovic	e4c46ddd91	[GlobalISel] Improve elimination of dead instructions in legalizer Add eraseInstr(s) utility functions. Before deleting an instruction, they collect its use instructions. After deletion, they delete use instructions that became trivially dead. This patch clears all dead instructions in existing legalizer mir tests. Differential Revision: https://reviews.llvm.org/D109154	2021-09-20 13:00:58 +02:00
Petar Avramovic	d477a7c2e7	GlobalISel/Utils: Refactor integer/float constant match functions Rework getConstantVRegValWithLookThrough in order to make it clear if we are matching integer/float constant only or any constant (default). Add helper functions that get DefVReg and APInt/APFloat from a constant instr: getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT; getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT; getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT. Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal and getIConstantVRegSExtVal. These now only match G_CONSTANT as described in comment. Relevant matchers now return both DefVReg and APInt/APFloat. Replace existing uses of getConstantVRegValWithLookThrough and getConstantVRegVal with new helper functions. Any constant match is only required in: ConstantFoldBinOp: for constant argument that was bit-cast of float to int; getAArch64VectorSplat: AArch64::G_DUP operands can be any constant; amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant. In other places use integer only constant match. Differential Revision: https://reviews.llvm.org/D104409	2021-09-17 11:22:13 +02:00
Jacob Lambert	4c1023b4b7	[AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files. Testing workflow as a new contributor. Differential Revision: https://reviews.llvm.org/D109733	2021-09-16 13:03:09 -07:00
Vang Thao	106959acc1	[AMDGPU] Inline non-kernel functions using extern lds In https://reviews.llvm.org/D100481, forceful inlining of all non-kernel functions using lds was disabled since the AMDGPULowerModuleLDS pass now handles static lds. However, that pass does not handle extern lds, so non-kernel functions using extern lds must still be inlined. Reviewed By: hsmhsm, arsenm Differential Revision: https://reviews.llvm.org/D109773	2021-09-16 10:58:51 -07:00
Jay Foad	128a49727a	[AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC. The warning is implemented by D109359 which is still in review. Differential Revision: https://reviews.llvm.org/D109826	2021-09-16 09:07:18 +01:00
Matt Arsenault	f12174204c	AMDGPU: Rename attributor class for uniform-work-group-size This isn't really an AMDGPU specific attribute and could be moved to generic code. It's also important to include the word uniform in the name.	2021-09-14 19:49:08 -04:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Matt Arsenault	c305513cc2	AMDGPU: Fix assert with indirect call with known required inputs The attributor can determine that some indirect calls do not require special inputs. The special inputs will still be present in the ABI, so we need to allocate the registers and pass undefs.	2021-09-13 22:54:11 -04:00
Jay Foad	477b9bc9f7	[AMDGPU] Minor cleanup after D109483. NFC.	2021-09-13 10:27:15 +01:00
Johannes Doerfert	c09fbbdcfb	Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reapplies commit `7dbba3376f`, or, put differently, this reverts commit `d9a8d20827`. The test now explicitly requires the amdgpu and nvptx backends, as it won't work properly without them.	2021-09-10 15:22:56 -05:00
Kazu Hirata	c9fca53af1	[CodeGen, Target] Use pred_empty and succ_empty (NFC)	2021-09-10 11:11:31 -07:00
Johannes Doerfert	d9a8d20827	Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reverts commit `7dbba3376f`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:23:08 -05:00
Johannes Doerfert	7dbba3376f	[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals Not all address spaces support initializers for globals, so we cannot set them without first checking whether they are allowed. This patch adds a hook into TTI to check if an AS allows non-undef initializers. By default it is disabled for all but address space 0; the NVPTX and AMDGPU targets allow all but address space 3. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109337	2021-09-10 12:08:50 -05:00
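The address-space rule in the commit above can be pictured as a single predicate. This is a minimal sketch, not LLVM's TargetTransformInfo interface: the `Target` enum and `allowsNonUndefInitializer` name are hypothetical, and only the numbers (address space 0 by default, everything except 3 for NVPTX and AMDGPU) come from the commit message.

```cpp
#include <cassert>

// Hypothetical target kinds for this sketch; not an LLVM type.
enum class Target { Generic, NVPTX, AMDGPU };

// Models the rule stated in the commit message: by default only address
// space 0 may hold a non-undef initializer, while NVPTX and AMDGPU allow
// every address space except 3.
bool allowsNonUndefInitializer(Target T, unsigned AddrSpace) {
  switch (T) {
  case Target::NVPTX:
  case Target::AMDGPU:
    return AddrSpace != 3;
  case Target::Generic:
    break;
  }
  return AddrSpace == 0;
}
```

A GlobalOpt-style transform would consult such a predicate before storing a folded, non-undef initializer into a global, and leave the global alone when it returns false.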
hsmahesha	0c28814015	Revert "[AMDGPU] Split entry basic block after alloca instructions." This reverts commit `98f4713122`. Without any (theoretical/practical) guarantee that all the allocas within entry basic block are clustered together at the beginning of the block, this patch is doomed to fail. Hence reverting it.	2021-09-10 10:23:51 +05:30
Matt Arsenault	0197cd0bd4	AMDGPU: Optimize amdgpu-no-* attributes This allows clobbering a few extra registers in the fixed ABI, and avoids some workitem ID packing instructions.	2021-09-09 18:24:28 -04:00
Matt Arsenault	db4963d080	AMDGPU: Use attributor to propagate uniform-work-group-size Drop the legacy version in AMDGPUAnnotateKernelFeatures. This has the side effect of now respecting the linkage, and not changing externally visible functions.	2021-09-09 18:24:28 -04:00
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit inputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute, which is to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
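The inverted scheme described above can be modelled as a bitmask that starts full and is only narrowed by negative attributes. This is a sketch under assumptions: the bit encoding and the exact set of inputs are hypothetical, and the attribute spellings merely follow the `amdgpu-no-*` pattern mentioned elsewhere in this log rather than being a complete list.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Illustrative subset of implicit inputs, one bit each (hypothetical encoding).
enum ImplicitInput : uint32_t {
  WorkItemIdX = 1u << 0,
  WorkItemIdY = 1u << 1,
  WorkItemIdZ = 1u << 2,
  DispatchPtr = 1u << 3,
  QueuePtr = 1u << 4,
};
constexpr uint32_t AllInputs =
    WorkItemIdX | WorkItemIdY | WorkItemIdZ | DispatchPtr | QueuePtr;

// Negative attributes, each proving that one input is not needed.
struct NoInputAttr {
  const char *Name;
  uint32_t Bit;
};
const NoInputAttr NoInputAttrs[] = {
    {"amdgpu-no-workitem-id-x", WorkItemIdX},
    {"amdgpu-no-workitem-id-y", WorkItemIdY},
    {"amdgpu-no-workitem-id-z", WorkItemIdZ},
    {"amdgpu-no-dispatch-ptr", DispatchPtr},
    {"amdgpu-no-queue-ptr", QueuePtr},
};

// Inverted scheme: an undecorated function (for example an unknown external
// or indirect callee) needs every input; each negative attribute clears one bit.
uint32_t requiredInputs(const std::set<std::string> &FnAttrs) {
  uint32_t Needed = AllInputs;
  for (const NoInputAttr &A : NoInputAttrs)
    if (FnAttrs.count(A.Name))
      Needed &= ~A.Bit;
  return Needed;
}
```

The design point is that a missing attribute now errs on the safe side: forgetting to annotate a function costs some registers but no longer produces wrong code.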
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecate isNullValue/isAllOnesValue and update in-tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on key constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Kazu Hirata	5648f7170e	[Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC)	2021-09-07 09:19:33 -07:00
Peter Smith	e63455d5e0	[MC] Use local MCSubtargetInfo in writeNops On some architectures such as Arm and X86 the encoding for a nop may change depending on the subtarget in operation at the time of encoding. This change replaces the per module MCSubtargetInfo retained by the target's AsmBackend in favour of passing through the local MCSubtargetInfo in operation at the time. On Arm using the architectural NOP instruction can have a performance benefit on some implementations. For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to limit the chances of this causing problems in the future. I've not done this for other targets such as X86 as there is more frequent use of the MCSubtargetInfo and it looks to be for stable properties that we would not expect to vary per function. This change required threading STI through MCNopsFragment and MCBoundaryAlignFragment. I've attempted to take into account the in-tree experimental backends. Differential Revision: https://reviews.llvm.org/D45962	2021-09-07 15:46:19 +01:00
Michael Liao	640beb38e7	[amdgpu] Enable selection of `s_cselect_b64`. Differential Revision: https://reviews.llvm.org/D109159	2021-09-07 10:45:07 -04:00
Mirko Brkusanin	6c4b634da6	[AMDGPU][GlobalISel] Legalize G_MUL for non-standard types Legalizing G_MUL for non-standard types (like i33) generated an error. Use minScalar and maxScalar instead of clampScalar. Also use a new rule: instead of widening to the next power of 2, widen to the next multiple of the passed argument (32 in this case), so i65 is widened to i96 instead of i128. Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D109228	2021-09-07 16:33:24 +02:00
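The difference between the two widening rules mentioned above is plain integer rounding. A minimal sketch, with hypothetical helper names that are not the GlobalISel LegalizeRuleSet API, operating on bit widths:

```cpp
#include <cassert>

// Old behaviour: round the bit width up to the next power of two.
unsigned widenToPow2(unsigned Bits) {
  assert(Bits > 0);
  unsigned P = 1;
  while (P < Bits)
    P <<= 1;
  return P;
}

// New rule: round the bit width up to the next multiple of Step (32 here).
unsigned widenToMultipleOf(unsigned Bits, unsigned Step) {
  assert(Bits > 0 && Step > 0);
  return (Bits + Step - 1) / Step * Step;
}
```

For i65, `widenToPow2(65)` gives 128 while `widenToMultipleOf(65, 32)` gives 96, which is why the multiply needs fewer 32-bit pieces under the new rule.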
Mirko Brkusanin	5263bf583a	[AMDGPU][GlobalISel] Legalization of G_ROTL and G_ROTR Add implementation for the legalization of G_ROTL and G_ROTR machine instructions. They are very similar to funnel shift instructions, the only difference is funnel shifts have 3 operands, whereas rotate instructions have two operands, the first being the register that is being rotated and the second being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering them into funnel shift instructions if they are legal. Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D105347	2021-09-07 16:33:24 +02:00
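The lowering described above relies on the identity rotl(x, n) = fshl(x, x, n) and rotr(x, n) = fshr(x, x, n). A minimal sketch on 32-bit values, following the semantics of LLVM's `llvm.fshl`/`llvm.fshr` intrinsics (shift amount taken modulo the bit width); the function names are hypothetical, not generic-opcode builders:

```cpp
#include <cassert>
#include <cstdint>

// fshl: concatenate Hi:Lo, shift left by Amt mod 32, keep the high 32 bits.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi;  // avoid an undefined shift by 32
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// fshr: concatenate Hi:Lo, shift right by Amt mod 32, keep the low 32 bits.
uint32_t fshr32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Lo;
  return (Hi << (32 - Amt)) | (Lo >> Amt);
}

// A rotate is a funnel shift whose two inputs are the same register.
uint32_t rotl32(uint32_t X, uint32_t Amt) { return fshl32(X, X, Amt); }
uint32_t rotr32(uint32_t X, uint32_t Amt) { return fshr32(X, X, Amt); }
```

For example, `rotl32(0x80000001u, 1)` yields 3: the top bit wraps around to bit 0.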
Mirko Brkusanin	36527cbe02	[AMDGPU][GlobalISel] Legalize memcpy family of intrinsics Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE. Corresponding intrinsics are replaced by a loop that uses loads/stores in the AMDGPULowerIntrinsics pass unless their length is a constant lower than MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that reaches the legalizer should have a constant length argument and should be expanded into the appropriate number of loads + stores. Differential Revision: https://reviews.llvm.org/D108357	2021-09-07 12:24:07 +02:00
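The split of responsibilities described above can be sketched as a small decision function plus an illustrative expansion of a constant length. Only the 1024 default comes from the commit message; the enum, function names, and the greedy 16/8/4/2/1-byte access widths are assumptions for illustration, since the real legalizer picks access types per target.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

enum class MemOpLowering { LoopInIR, LoadsAndStores };

// Default size threshold quoted in the commit message.
constexpr uint64_t DefaultExpandThreshold = 1024;

// A constant length below the threshold is left for the legalizer to expand
// into loads and stores; anything else is turned into a loop in IR first.
MemOpLowering chooseLowering(std::optional<uint64_t> ConstLen,
                             uint64_t Threshold = DefaultExpandThreshold) {
  if (ConstLen && *ConstLen < Threshold)
    return MemOpLowering::LoadsAndStores;
  return MemOpLowering::LoopInIR;
}

// Illustrative split of a constant length into access sizes, greedily taking
// the widest of 16/8/4/2/1 bytes (assumed widths, not a target's real choice).
std::vector<uint64_t> splitIntoAccesses(uint64_t Len) {
  std::vector<uint64_t> Sizes;
  for (uint64_t Width : {16u, 8u, 4u, 2u, 1u})
    while (Len >= Width) {
      Sizes.push_back(Width);
      Len -= Width;
    }
  return Sizes;
}
```

Under these assumptions a 23-byte copy becomes accesses of 16, 4, 2 and 1 bytes, while an unknown or 1024-byte length takes the loop path.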
Stanislav Mekhanoshin	d0c064715c	[AMDGPU] Small cleanup in optimizeCompareInstr. NFC.	2021-09-03 11:31:40 -07:00
Matt Arsenault	79bcd4a7db	AMDGPU: Remove FeatureLocalMemorySize0 There's no reason to make this an explicit feature, since it's implied by the lack of a feature with a size.	2021-09-02 22:43:01 -04:00
Stanislav Mekhanoshin	78fbd1aa3d	[AMDGPU] Process any power of 2 in optimizeCompareInstr Differential Revision: https://reviews.llvm.org/D109201	2021-09-02 17:39:17 -07:00
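One way to picture the generalization to any power of 2: a mask of the form 1 << K selects a single bit, so an AND with it followed by a compare only needs the bit index K, which is what a bit-compare such as `s_bitcmp1` takes. This is a hedged model of the idea, not the code in `optimizeCompareInstr`; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns K when Mask == 1 << K, or nothing for zero / multi-bit masks.
std::optional<unsigned> singleBitIndex(uint64_t Mask) {
  if (Mask == 0 || (Mask & (Mask - 1)) != 0)
    return std::nullopt;
  unsigned K = 0;
  while ((Mask >> K) != 1)
    ++K;
  return K;
}

// SCC of a bit-compare-for-one on bit K: set when bit K of Src is 1.
bool bitcmp1(uint64_t Src, unsigned K) { return ((Src >> K) & 1) != 0; }
```

With a single-bit mask, `(Src & Mask) == Mask` holds exactly when bit K of `Src` is set, so the compare against the mask and the bit test agree.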
Stanislav Mekhanoshin	2cfda6a691	[AMDGPU] Fold immediates in the optimizeCompareInstr Peephole works before the first SIFoldOperands so most of the immediates are in registers. Differential Revision: https://reviews.llvm.org/D109186	2021-09-02 17:23:26 -07:00
Stanislav Mekhanoshin	832c87b4fb	[AMDGPU] Use S_BITCMP0_* to replace AND in optimizeCompareInstr These can be used for reversed conditions if result of the AND is unused except in the compare: s_cmp_eq_u32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0 s_cmp_eq_i32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0 s_cmp_eq_u64 (s_and_b64 $src, 1), 0 => s_bitcmp0_b64 $src, 0 s_cmp_lg_u32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0 s_cmp_lg_i32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0 s_cmp_lg_u64 (s_and_b64 $src, 1), 1 => s_bitcmp0_b64 $src, 0 Differential Revision: https://reviews.llvm.org/D109099	2021-09-02 09:38:01 -07:00
Piotr Sobczak	30d6c39bca	[AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM Extend SILoadStoreOptimizer to merge into DWORDX8 variant of S_BUFFER_LOAD. Merging into DWORDX2 and DWORDX4 variants is handled already. Differential Revision: https://reviews.llvm.org/D108909	2021-09-02 16:26:25 +02:00
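A simplified model of when two scalar buffer loads can be combined: they must be contiguous and the combined width must have an opcode, which after this change includes DWORDX8. The struct and `tryMerge` are hypothetical; the sketch assumes both loads already share the same buffer descriptor and ignores the other legality checks the real SILoadStoreOptimizer performs.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical model of one S_BUFFER_LOAD: offset and width in dwords,
// both loads assumed to use the same buffer descriptor.
struct SBufLoad {
  unsigned OffsetDw;
  unsigned WidthDw;
};

// Merge two loads when they are contiguous and the combined width has a
// matching opcode: DWORDX2, DWORDX4 or, with this change, DWORDX8.
std::optional<SBufLoad> tryMerge(SBufLoad A, SBufLoad B) {
  if (B.OffsetDw < A.OffsetDw)
    std::swap(A, B);
  if (A.OffsetDw + A.WidthDw != B.OffsetDw)
    return std::nullopt;  // gap or overlap between the two loads
  unsigned Width = A.WidthDw + B.WidthDw;
  if (Width != 2 && Width != 4 && Width != 8)
    return std::nullopt;  // no S_BUFFER_LOAD_DWORDXn of this size
  return SBufLoad{A.OffsetDw, Width};
}
```

In this model two adjacent DWORDX4 loads at dword offsets 0 and 4 now combine into one DWORDX8 load at offset 0.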
Stanislav Mekhanoshin	f3645c792a	[AMDGPU] Use S_BITCMP1_* to replace AND in optimizeCompareInstr Differential Revision: https://reviews.llvm.org/D109082	2021-09-01 15:59:12 -07:00
Stanislav Mekhanoshin	bf77b11277	[AMDGPU] Introduce optimizeCompareInstr The following patterns are currently handled: s_cmp_eq_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_eq_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_eq_u64 (s_and_b64 $src, 1), 1 => s_and_b64 $src, 1 s_cmp_ge_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_ge_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_lg_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 s_cmp_lg_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 s_cmp_lg_u64 (s_and_b64 $src, 1), 0 => s_and_b64 $src, 1 s_cmp_gt_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 s_cmp_gt_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 Differential Revision: https://reviews.llvm.org/D109031	2021-09-01 15:57:05 -07:00


6384 Commits