llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikolay Haustov	8e3f099497	[AMDGPU] Assembler: Fix s_setpc_b64 s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to. Differential Revision: http://reviews.llvm.org/D17888 llvm-svn: 263005	2016-03-09 10:56:19 +00:00
Matt Arsenault	c89f2919a4	AMDGPU: Match more med3 integer patterns llvm-svn: 262864	2016-03-07 21:54:48 +00:00
Matt Arsenault	56356c8a9c	AMDGPU: Remove a fixme for ptrrtoint handling llvm-svn: 262854	2016-03-07 21:12:46 +00:00
Matt Arsenault	81d06015c6	AMDGPU: Move function only used by R600 llvm-svn: 262853	2016-03-07 21:10:13 +00:00
Valery Pykhtin	dc11054f20	[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler. Engages code from r262804. Differential Revision: http://reviews.llvm.org/D17151 llvm-svn: 262808	2016-03-06 20:25:36 +00:00
Valery Pykhtin	50cd3c4ec7	fix sanitizer-ppc64be-linux failure for r262804 error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move] http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/930 llvm-svn: 262805	2016-03-06 15:13:54 +00:00
Valery Pykhtin	499a5c6323	[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields Differential Revision: http://reviews.llvm.org/D17150 llvm-svn: 262804	2016-03-06 13:27:13 +00:00
Valery Pykhtin	0c6293da68	[AMDGPU] SOPxx instructions operand naming fixed in td files. dst -> sdst ssrcN -> srcN Differential Revision: http://reviews.llvm.org/D17646 llvm-svn: 262801	2016-03-06 10:31:44 +00:00
Tom Stellard	649b5db557	AMDGPU/SI: Add support for spiling SGPRs to scratch buffer Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732	2016-03-04 18:31:18 +00:00
Tom Stellard	ebef6f9771	AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter Summary: This allows us to use virtual registers when we need extra registers for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex(). Once all the frame indices have been eliminated, the PrologEpilogueInserter does an extra pass over the program to replace all virtual registers with physical ones. This allows us to make more efficient use of our emergency spill slots, so we only need to create one. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17591 llvm-svn: 262728	2016-03-04 18:02:01 +00:00
Sam Kolton	f51f4b8370	Test commit access llvm-svn: 262714	2016-03-04 12:29:14 +00:00
Valery Pykhtin	824e804bf6	test commit llvm-svn: 262709	2016-03-04 10:59:50 +00:00
Nikolay Haustov	5bf46ac150	AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. Initial change by Nicolai H.hnle Differential Revision: http://reviews.llvm.org/D17401 llvm-svn: 262701	2016-03-04 10:39:50 +00:00
Tom Stellard	cc7067a668	AMDGPU: Insert two S_NOP instructions for every high level source statement. Patch by: Konstantin Zhuravlyov Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input. Reviewers: tstellarAMD, arsenm Subscribers: echristo, dblaikie, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17454 llvm-svn: 262579	2016-03-03 03:53:29 +00:00
Tom Stellard	600ca6fd39	AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs Summary: When there were no free SGPRs, we were trying to move this value into some of the reserved registers which was causing a segmentation fault. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17590 llvm-svn: 262577	2016-03-03 03:45:09 +00:00
Matt Arsenault	8226fc4829	AMDGPU: Simplify boolean conditional return statements Patch by Richard Thomson llvm-svn: 262536	2016-03-02 23:00:21 +00:00
Nikolay Haustov	f2fbabe9c1	Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields" Build failure with clang. llvm-svn: 262477	2016-03-02 11:16:56 +00:00
Nikolay Haustov	f0f24628cb	Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler." Build failure with clang. llvm-svn: 262475	2016-03-02 10:54:21 +00:00
Nikolay Haustov	73447a9714	[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler. complementary patch to table-driven amd_kernel_code_t field parser/printer utility. lit tests passed. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17151 llvm-svn: 262474	2016-03-02 10:36:30 +00:00
Nikolay Haustov	6c8c74969a	[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields This is going to be used in .hsatext disassembler and can be used in current assembler parser (lit tests passed on parsing). Code using this helpers isn't included in this patch. Benefits: unified approach fast field name lookup on parsing Later I would like to enhance some of the field naming/syntax using this code. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17150 llvm-svn: 262473	2016-03-02 10:36:25 +00:00
Matt Arsenault	f2dcb4737b	AMDGPU: Fix bug 26659. Fix checking the same instruction twice instead of the second branch that uses vccz. I don't think this matters currently because s_branch_vccnz is always used currently. llvm-svn: 262457	2016-03-02 04:12:39 +00:00
Matt Arsenault	a266bd8760	AMDGPU: Cleanup suggested in bug 23960 llvm-svn: 262456	2016-03-02 04:05:14 +00:00
Matt Arsenault	5de68cbc4c	Bug 20810: Use report_fatal_error instead of unreachable llvm-svn: 262455	2016-03-02 03:33:55 +00:00
Matthias Braun	17cb57995e	TableGen: Check scheduling models for completeness TableGen checks at compiletime that for scheduling models with "CompleteModel = 1" one of the following holds: - Is marked with the hasNoSchedulingInfo flag - The instruction is a subclass of Sched - There are InstRW definitions in the scheduling model Typical steps necessary to complete a model: - Ensure all pseudo instructions that are expanded before machine scheduling (usually everything handled with EmitYYY() functions in XXXTargetLowering). - If a CPU does not support some instructions mark the corresponding resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }". - Add missing scheduling information. Differential Revision: http://reviews.llvm.org/D17747 llvm-svn: 262384	2016-03-01 20:03:21 +00:00
Changpeng Fang	24f035af32	AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics Summary: This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 llvm-svn: 262356	2016-03-01 17:51:23 +00:00
Nikolay Haustov	e309e1415d	[AMDGPU] Remove unused disassembler code. llvm-svn: 262346	2016-03-01 16:02:40 +00:00
Nikolay Haustov	47a115cd41	[AMDGPU] Fix build warnings. llvm-svn: 262338	2016-03-01 14:50:59 +00:00
Nikolay Haustov	ac106add0f	[AMDGPU] Disassembler code refactored + error messages. Idea behind this change is to make code shorter and as much common for all targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out. 64bit instructions decoding added. Error\warning messages on unrecognized instructions operands added, InstPrinter allowed to print invalid operands helping to find invalid/unsupported code. The change is massive and hard to compare with previous version, so it makes sense just to take a look on the new version. As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles >300K binary source of some blas kernel. Previous TODOs were saved whenever possible. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17720 llvm-svn: 262332	2016-03-01 13:57:29 +00:00
Nikolay Haustov	ea8febde04	[TableGen] AsmMatcher: Skip optional operands in the midle of instruction if it is not present Previosy, if actual instruction have one of optional operands then other optional operands listed before this also should be presented. For example instruction v_fract_f32 v0, v1, mul:2 have one optional operand - OMod and do not have optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString: string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod"; Making this work required some hacks (both OMod and Clamp match classes have same PredicateMethod). Now, if MatchInstructionImpl meets formal optional operand that is not presented in actual instruction it skips this formal operand and tries to match current actual operand with next formal. Patch by: Sam Kolton Review: http://reviews.llvm.org/D17568 [AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods With this change you should place optional operands in order specified by asm string: clamp -> omod offset -> glc -> slc -> tfe Fixes for several tests. Depends on D17568 Patch by: Sam Kolton Review: http://reviews.llvm.org/D17644 llvm-svn: 262314	2016-03-01 08:34:43 +00:00
Matt Arsenault	d275fcabcb	AMDGPU: Don't emit build_pair during udivrem legalization Technically you aren't supposed to emit these after type legalization for some reason, and we use vector extracts of bitcasted integers as the canonical way to do this. llvm-svn: 262298	2016-03-01 05:06:05 +00:00
Matt Arsenault	f4dfc1a027	AMDGPU: Don't use estimated stack size when we know the real stack size llvm-svn: 262297	2016-03-01 04:58:20 +00:00
Matt Arsenault	59b8b77405	AMDGPU: Set HasExtractBitInsn This currently does not have the control over the bitwidth, and there are missing optimizations to reduce the integer to 32-bit if it can be. But in most situations we do want the sinking to occur. llvm-svn: 262296	2016-03-01 04:58:17 +00:00
Matt Arsenault	3a61985b2f	AMDGPU: More bits of frame index are known to be zero The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still. llvm-svn: 262153	2016-02-27 20:26:57 +00:00
Duncan P. N. Exon Smith	be8f8c4478	CodeGen: Update LiveIntervalAnalysis API to use MachineInstr&, NFC These parameters aren't expected to be null, so take them by reference. llvm-svn: 262151	2016-02-27 20:14:29 +00:00
Duncan P. N. Exon Smith	5702287809	CodeGen: Update DFAPacketizer API to take MachineInstr&, NFC In all but one case, change the DFAPacketizer API to take MachineInstr& instead of MachineInstr*. In DFAPacketizer::endPacket(), take MachineBasicBlock::iterator. Besides cleaning up the API, this is in search of PR26753. llvm-svn: 262142	2016-02-27 19:09:00 +00:00
Matt Arsenault	9d82ee7526	AMDGPU: Split vi-insts subtarget feature This will be more useful for marking builtins acceptable for which subtargets. llvm-svn: 262121	2016-02-27 08:53:55 +00:00
Matt Arsenault	274d34e725	AMDGPU: Add s_sleep intrinsic llvm-svn: 262120	2016-02-27 08:53:52 +00:00
Matt Arsenault	61738cbcb6	AMDGPU: Implement readcyclecounter This matches the behavior of the HSAIL clock instruction. s_realmemtime is used if the subtarget supports it, and falls back to s_memtime if not. Also introduces new intrinsics for each of s_memtime / s_memrealtime. llvm-svn: 262119	2016-02-27 08:53:46 +00:00
Duncan P. N. Exon Smith	3ac9cc6156	CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC Take MachineInstr by reference instead of by pointer in SlotIndexes and the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are never null, so this cleans up the API a bit. It also incidentally removes a few implicit conversions from MachineInstrBundleIterator to MachineInstr* (see PR26753). At a couple of call sites it was convenient to convert to a range-based for loop over MachineBasicBlock::instr_begin/instr_end, so I added MachineBasicBlock::instrs. llvm-svn: 262115	2016-02-27 06:40:41 +00:00
Nikolay Haustov	2f684f1347	[AMDGPU] Assembler: Basic support for MIMG Add parsing and printing of image operands. Matches legacy sp3 assembler. Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last. Update SITargetLowering for new order. Add basic MC test. Update CodeGen tests. Review: http://reviews.llvm.org/D17574 llvm-svn: 261995	2016-02-26 09:51:05 +00:00
Nikolay Haustov	161a158e5c	[AMDGPU] Disassembler: Support for all VOP1 instructions. Support all instructions with VOP1 encoding with 32 or 64-bit operands for VI subtarget: VGPR_32 and VReg_64 operand register classes VS_32 and VS_64 operand register classes with inline and literal constants Tests for VOP1 instructions. Patch by: skolton Reviewers: arsenm, tstellarAMD Review: http://reviews.llvm.org/D17194 llvm-svn: 261878	2016-02-25 16:09:14 +00:00
Nikolay Haustov	2e4c72977c	[AMDGPU] Assembler: Simplify handling of optional operands Resubmit with index problem fixed. Verified with valgrind. Prepare to support DPP encodings. For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead. Mark more operands as IsOptional = 1 in .td files. Do not add default values for optional operands to OperandVector in AMDGPUAsmParser. Add default values for optional operands during conversion using new helper addOptionalImmOperand. Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one. Separate cvtFlat and cvtFlatAtomic. Fix CNDMASK_B32 definition to have no modifiers. Review: http://reviews.llvm.org/D17445 llvm-svn: 261856	2016-02-25 10:58:54 +00:00
NAKAMURA Takumi	3d3d0f4151	Revert r261742, "[AMDGPU] Assembler: Simplify handling of optional operands" It brought undefined behavior. llvm-svn: 261839	2016-02-25 08:35:27 +00:00
Nikolay Haustov	4f073ca7fa	[AMDGPU] Assembler: Simplify handling of optional operands Prepare to support DPP encodings. For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead. Mark more operands as IsOptional = 1 in .td files. Do not add default values for optional operands to OperandVector in AMDGPUAsmParser. Add default values for optional operands during conversion using new helper addOptionalImmOperand. Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one. Separate cvtFlat and cvtFlatAtomic. Fix CNDMASK_B32 definition to have no modifiers. Review: http://reviews.llvm.org/D17445 Reviewers: tstellarAMD llvm-svn: 261742	2016-02-24 14:22:47 +00:00
Nikolay Haustov	3c728e43aa	[AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields) lit tests passed before and after because it doesn't test the binary representation of amd_kernel_code_t. Patch by: Valery Pykhtin (Valery.Pykhtin@amd.com) Reviewers: arsenm llvm-svn: 261732	2016-02-24 10:54:25 +00:00
Matt Arsenault	cd09961fb3	AMDGPU: Check cheaper condition before SignBitIsZero Don't do an expensive computeKnownBits call when we can do the cheap check for legal offsets first. llvm-svn: 261720	2016-02-24 04:55:29 +00:00
Nikolay Haustov	2a62b3c244	[AMDGPU] Fix operands of S_BFE_U64 and S_BFM_B64 src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64). src0 and src1 of s_bfm_b64 are 32-bit. Update tests. Review: http://reviews.llvm.org/D17480 Reviewers: arsenm llvm-svn: 261621	2016-02-23 09:19:14 +00:00
Duncan P. N. Exon Smith	6307eb5518	CodeGen: TII: Take MachineInstr& in predicate API, NFC Change TargetInstrInfo API to take `MachineInstr&` instead of `MachineInstr*` in the functions related to predicated instructions (I'll try to come back later and get some of the rest). All of these functions require non-null parameters already, so references are more clear. As a bonus, this happens to factor away a host of implicit iterator => pointer conversions. No functionality change intended. llvm-svn: 261605	2016-02-23 02:46:52 +00:00
Duncan P. N. Exon Smith	d84f600653	CodeGen: Bring back MachineBasicBlock::iterator::getInstrIterator()... This is a little embarrassing. When I reverted r261504 (getIterator() => getInstrIterator()) in r261567, I did a `git grep` to see if there were new calls to `getInstrIterator()` that I needed to migrate. There were 10-20 hits, and I blindly did a `sed ...` before calling `ninja check`. However, these were `MachineInstrBundleIterator::getInstrIterator()`, which predated r261567. Perhaps coincidentally, these had an identical name and return type. This commit undoes my careless sed and restores `MachineBasicBlock::iterator::getInstrIterator()`. llvm-svn: 261577	2016-02-22 21:30:15 +00:00
Matt Arsenault	fa67bdbde0	AMDGPU/R600: Implement allowsMisalignedMemoryAccess This avoids some test regressions in a future commit when unaligned operations are expanded when they have custom lowering. llvm-svn: 261570	2016-02-22 21:04:16 +00:00
Duncan P. N. Exon Smith	c5b668deb8	Revert "CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC" This reverts commit r261504, since it's not obvious the new name is better: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160222/334298.html I'll recommit if we get consensus that it's the right direction. llvm-svn: 261567	2016-02-22 20:49:58 +00:00
Tom Stellard	d93a34f714	[AMDGPU][llvm-mc] Support for 32-bit inline literals Patch by: Artem Tamazov Summary: Note: Support for 64-bit inline literals TBD Added: Support of abs/neg modifiers for literals (incomplete; parsing TBD). Added: Some TODO comments. Reworked/clarity: rename isInlineImm() to isInlinableImm() Reworked/robustness: disallow BitsToFloat() with undefined value in isInlinableImm() Reworked/reuse: isSSrc32/64(), isVSrc32/64() Tests added. Reviewers: tstellarAMD, arsenm Subscribers: vpykhtin, nhaustov, SamWot, arsenm Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D17204 llvm-svn: 261559	2016-02-22 19:17:56 +00:00
Tom Stellard	82fc153baa	[AMDGPU] [llvm-mc] [VI] Fix encoding of LDS/GDS instructions. Patch by: Artem Tamazov Summary: Tests added. Reviewers: tstellarAMD, arsenm Subscribers: vpykhtin, SamWot, #llvm-amdgpu-spb Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D17271 llvm-svn: 261558	2016-02-22 19:17:53 +00:00
Duncan P. N. Exon Smith	dc0848c029	CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC Delete MachineInstr::getIterator(), since the term "iterator" is overloaded when talking about MachineInstr. - Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so that ilist_node::getIterator() is still available. - Add it back as MachineInstr::getInstrIterator(). This matches the naming in MachineBasicBlock. - Add MachineInstr::getBundleIterator(). This is explicitly called "bundle" (not matching MachineBasicBlock) to disintinguish it clearly from ilist_node::getIterator(). - Update all calls. Some of these I switched to `auto` to remove boiler-plate, since the new name is clear about the type. There was one call I updated that looked fishy, but it wasn't clear what the right answer was. This was in X86FrameLowering::inlineStackProbe(), added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to leave the behaviour unchanged, but I'll reply to the original commit on the list in a moment. llvm-svn: 261504	2016-02-21 22:58:35 +00:00
Tom Stellard	467b5b9024	AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385	2016-02-20 00:37:25 +00:00
Tom Stellard	2d26fe7aa6	AMDGPU/SI: Fix s_waitcnt insertion for flat instructions Summary: This was broken in r260694 which swapped the address and data operands for flat store instructions. The code in SIInsertWaits assumes that the data operand always comes before the address operand, so we need to add a special case for flat. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17366 llvm-svn: 261330	2016-02-19 15:33:13 +00:00
Nicolai Haehnle	f2c64db55a	AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics Summary: These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory. Differential Revision: http://reviews.llvm.org/D17276 llvm-svn: 261224	2016-02-18 16:44:18 +00:00
Nikolay Haustov	10813a4efa	Test commit access. llvm-svn: 261199	2016-02-18 10:02:12 +00:00
Tom Stellard	e1818af8c5	[AMDGPU] Disassembler: Added basic disassembler for AMDGPU target Changes: - Added disassembler project - Fixed all decoding conflicts in .td files - Added DecoderMethod=“NONE” option to Target.td that allows to disable decoder generation for an instruction. - Created decoding functions for VS_32 and VReg_32 register classes. - Added stubs for decoding all register classes. - Added several tests for disassembler Disassembler only supports: - VI subtarget - VOP1 instruction encoding - 32-bit register operands and inline constants [Valery] One of the point that requires to pay attention to is how decoder conflicts were resolved: - Groups of target instructions were separated by using different DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate approach. - There were conflicts in IMAGE_<> instructions caused by two different reasons: 1. dmask wasn’t specified for the output (fixed) 2. There are image instructions that differ only by the number of the address components but have the same encoding by the HW spec. The actual number of address components is determined by the HW at runtime using image resource descriptor starting from the VGPR encoded in an IMAGE instruction. This means that we should choose only one instruction from conflicting group to be the rule for decoder. I didn’t find the way to disable decoder generation for an arbitrary instruction and therefore made a onelinear fix to tablegen generator that would suppress decoder generation when DecoderMethod is set to “NONE”. This is a change that should be reviewed and submitted first. Otherwise I would need to specify different DecoderNamespace for every instruction in the conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not used in other targets. 3. IMAGE_GATHER decoder generation is for now disabled and to be done later. [/Valery] Patch By: Sam Kolton Differential Revision: http://reviews.llvm.org/D16723 llvm-svn: 261185	2016-02-18 03:42:32 +00:00
Tom Stellard	cc4c8718ed	[AMDGPU] Rename $dst operand to $vdst for VOP instructions. Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler. Reviewers: vpykhtin, tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, nhaustov Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D16920 llvm-svn: 260986	2016-02-16 18:14:56 +00:00
Matt Arsenault	f2ddbf00ed	AMDGPU: Prepare for reducing private element size. Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804	2016-02-13 04:18:53 +00:00
Tom Stellard	4409051d00	AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792	2016-02-13 02:09:49 +00:00
Matt Arsenault	d2759212b8	AMDGPU: Cleanup includes and random macros llvm-svn: 260784	2016-02-13 01:24:08 +00:00
Matt Arsenault	ce56a0ef54	AMDGPU: Add intrinsics for sin/cos These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires. llvm-svn: 260782	2016-02-13 01:19:56 +00:00
Matt Arsenault	79963e80b8	AMDGPU: Rename intrinsic to better match instruction name Also fixes missing f32 test. llvm-svn: 260780	2016-02-13 01:03:00 +00:00
Tom Stellard	607051640c	AMDGPU/SI: Add instruction defs for VOP1 DPP instructions Reviewers: nhaustov, cfang, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17159 llvm-svn: 260774	2016-02-13 00:51:31 +00:00
Matt Arsenault	16f48d7c55	AMDGPU: Fix broken condition causing warning llvm-svn: 260773	2016-02-13 00:36:10 +00:00
Tom Stellard	bc4497b13c	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765	2016-02-12 23:45:29 +00:00
Tom Stellard	46937ca4e7	[AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler Historically, AMD internal sp3 assembler has flat_store* addr, data format. To match existing code and to enable reuse, change LLVM definitions to match. Also update MC and CodeGen tests. Differential Revision: http://reviews.llvm.org/D16927 Patch by: Nikolay Haustov llvm-svn: 260694	2016-02-12 17:57:54 +00:00
Changpeng Fang	e07f1aa8fa	AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass. Summary: It is possible that the loop condition can be a boolean constant (infinite loop, for example). So we sould handle constant condition in annotating a loop. This patch adds this functionality to support annotating constant condition. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D15093 llvm-svn: 260692	2016-02-12 17:11:04 +00:00
Benjamin Kramer	ac5e36f52e	Fix uninitialized memory read. Found by msan. llvm-svn: 260676	2016-02-12 12:37:21 +00:00
Matt Arsenault	296b849163	AMDGPU: Set flat_scratch from flat_scratch_init reg This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658	2016-02-12 06:31:30 +00:00
Matt Arsenault	24ee0785dd	AMDGPU: Set element_size in private resource descriptor Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651	2016-02-12 02:40:47 +00:00
Matt Arsenault	16f7bcb661	AMDGPU: Fix mishandling alignment when scalarizing vector loads/stores I don't think this was causing any real problems, so I'm not sure how to test for this. llvm-svn: 260646	2016-02-12 02:22:21 +00:00
Matt Arsenault	55d49cfe2d	AMDGPU: Initialize SILowerControlFlow llvm-svn: 260645	2016-02-12 02:16:10 +00:00
Matt Arsenault	806dd0a532	AMDGPU: Remove trailing whitespace llvm-svn: 260644	2016-02-12 02:16:07 +00:00
Tom Stellard	1397d49ef5	AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599	2016-02-11 21:45:07 +00:00
Tom Stellard	fa8f204c5b	AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklist Summary: When we split SMRD instructions into two MUBUFs we were adding the users of the newly created MUBUFs to the VALU worklist. However, the only users these instructions had was the REG_SEQUENCE that was inserted by splitSMRD when the original SMRD instruction was split. We need to make sure to add the users of the original SMRD to the VALU worklist before it is split. I have a test case, but it requires one other bug fix, so it will be added in a later commt. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17101 llvm-svn: 260588	2016-02-11 21:14:34 +00:00
Tom Stellard	e993451f5c	[AMDGPU] Fix for "v_div_scale_f64 reg, vcc, ..." parsing Summary: Added support for "VOP3Only" attribute in VOP3bInst encoding. Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns. Added support for multi-dest instructions in AMDGPUAs::cvt*(). Added lit test for "V_DIV_SCALE_F64\|F32 vreg,vcc\|sreg,vreg,vreg,vreg". Reviewers: tstellarAMD, arsenm Subscribers: arsenm, SamWot, nhaustov, vpykhtin Differential Revision: http://reviews.llvm.org/D16995 Patch By: Artem Tamazov llvm-svn: 260560	2016-02-11 18:25:26 +00:00
Matt Arsenault	fcb345f172	AMDGPU: Fix constant bus use check with subregisters If the two operands to an instruction were both subregisters of the same super register, it would incorrectly think this counted as the same constant bus use. This fixes the verifier error in fmin_legacy.ll which was missing -verify-machineinstrs. llvm-svn: 260495	2016-02-11 06:15:39 +00:00
Matt Arsenault	427c548925	AMDGPU: Fix passes depending on dominator tree for no reason llvm-svn: 260494	2016-02-11 06:15:34 +00:00
Matt Arsenault	fe26def35c	AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis llvm-svn: 260491	2016-02-11 05:32:51 +00:00
Matt Arsenault	9524566314	AMDGPU: Split R600 and SI store lowering These were only sharing some somewhat incorrect logic for when to scalarize or split vectors. llvm-svn: 260490	2016-02-11 05:32:46 +00:00
Tom Stellard	a90b9526df	[AMDGPU] Assembler: Fix VOP3 only instructions Separate methods to convert parsed instructions to MCInst: - VOP3 only instructions (always create modifiers as operands in MCInst) - VOP2 instrunctions with modifiers (create modifiers as operands in MCInst when e64 encoding is forced or modifiers are parsed) - VOP2 instructions without modifiers (do not create modifiers as operands in MCInst) - Add VOP3Only flag. Pass HasMods flag to VOP3Common. - Simplify code that deals with modifiers (-1 is now same as 0). This is no longer needed. - Add few tests (more will be added separately). Update error message now correct. Patch By: Nikolay Haustov Differential Revision: http://reviews.llvm.org/D16778 llvm-svn: 260483	2016-02-11 03:28:15 +00:00
Nicolai Haehnle	d791bd07c7	AMDGPU: Release the scavenged offset register during VGPR spill Summary: This fixes a crash where subsequent spills would be unable to scavenge a register. In particular, it fixes a crash in piglit's spec@glsl-1.50@execution@geometry@max-input-components (the test still has a shader that fails to compile because of too many SGPR spills, but at least it doesn't crash any more). This is a candidate for the release branch. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, arsenm Differential Revision: http://reviews.llvm.org/D16558 llvm-svn: 260427	2016-02-10 20:13:58 +00:00
Matt Arsenault	a14364115e	AMDGPU: Fix indentation and variable names llvm-svn: 260399	2016-02-10 18:21:45 +00:00
Matt Arsenault	6dfda9625d	AMDGPU: Split R600 and SI load lowering These weren't actually sharing anything in the common LowerLOAD. llvm-svn: 260398	2016-02-10 18:21:39 +00:00
Ahmed Bougacha	f8dfb47c02	[CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI. llvm-svn: 260316	2016-02-09 22:54:12 +00:00
Tom Stellard	309617645d	AMDGPU/SI: Implement a work-around for smrd corrupting vccz bit Summary: We will hit this once we have enabled uniform branches. The smrd-vccz-bug.ll test will be added with the uniform branch commit. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16725 llvm-svn: 260137	2016-02-08 19:49:20 +00:00
Matt Arsenault	92edab2df9	AMDGPU: Remove bfi and bfm intrinsics Nothing is using them. llvm-svn: 260123	2016-02-08 19:06:01 +00:00
Matt Arsenault	7f83397d72	AMDGPU: Account for LDS alignment The current situation isn't great, because the amount of padding requires is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912	2016-02-05 19:47:29 +00:00
Matt Arsenault	cf84e26fb6	AMDGPU: Preserve alignments on new created globals Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911	2016-02-05 19:47:23 +00:00
Tom Stellard	1242ce9695	AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900	2016-02-05 18:44:57 +00:00
Tom Stellard	5dde1d2eb3	AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16863 llvm-svn: 259897	2016-02-05 18:29:17 +00:00
Tom Stellard	6e1967ef66	AMDGPU/SI: Correctly initialize SIInsertWaits pass Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16724 llvm-svn: 259894	2016-02-05 17:42:38 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	7e747f1a38	AMDGPU: Handle promoting memmove Also add missing tests for the others. llvm-svn: 259558	2016-02-02 20:28:10 +00:00
Matt Arsenault	8b175672cb	AMDGPU: Skip promote alloca with no optimizations llvm-svn: 259551	2016-02-02 19:32:42 +00:00
Matt Arsenault	fb8cdbae0c	AMDGPU: Minor cleanups for AMDGPUPromoteAlloca Mostly convert to use range loops. llvm-svn: 259550	2016-02-02 19:32:35 +00:00
Matt Arsenault	e5737f7cac	AMDGPU: Report AMDGPUPromoteAlloca changed the function llvm-svn: 259547	2016-02-02 19:18:57 +00:00
Matt Arsenault	ad1348459f	AMDGPU: Whitelist handled intrinsics We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546	2016-02-02 19:18:53 +00:00
Matt Arsenault	853a1fc6d9	AMDGPU: Use inbounds when calculating workitem offset When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545	2016-02-02 19:18:48 +00:00
Oliver Stannard	7e7d983a87	Refactor backend diagnostics for unsupported features Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498	2016-02-02 13:52:43 +00:00
Matt Arsenault	e013246462	AMDGPU: Fix emitting invalid workitem intrinsics for HSA The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297	2016-01-30 05:19:45 +00:00
Matt Arsenault	d0799df707	AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr Only the dispatch.ptr intrinsic is supposed to be used now to get the workgroup size, and the read.local.size intrinsics do not work correctly. llvm-svn: 259296	2016-01-30 05:10:59 +00:00
Matt Arsenault	43976df0da	AMDGPU: Add new amdgcn workitem intrinsics These use the correct prefix and follow the HSA naming convention rather than the config register option names. llvm-svn: 259293	2016-01-30 04:25:19 +00:00
Matt Arsenault	295875efda	AMDGPU: Remove 24-bit intrinsics The known bit matching code seems to work reasonably well, so these shouldn't really be needed. llvm-svn: 259180	2016-01-29 10:05:16 +00:00
Matt Arsenault	5b39b34ca5	AMDGPU: Match fmed3 patterns with legacy fmin/fmax llvm-svn: 259090	2016-01-28 20:53:48 +00:00
Matt Arsenault	f639c32739	AMDGPU: Match some med3 patterns llvm-svn: 259089	2016-01-28 20:53:42 +00:00
Matt Arsenault	7293f9895e	AMDGPU: Set DX10Clamp bit llvm-svn: 259088	2016-01-28 20:53:35 +00:00
Tom Stellard	3d2c852958	AMDGPU: waitcnt operand fixes Summary: Allow lgkmcnt up to 0xF (hardware allows that). Fix mask for ExpCnt in AMDGPUInstPrinter. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16314 Patch by: Nikolay Haustov llvm-svn: 259059	2016-01-28 17:13:44 +00:00
Tom Stellard	2ff726272a	AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp Summary: Also delete all the stub functions that are identical to the implementations in TargetInstrInfo.cpp. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16609 llvm-svn: 259054	2016-01-28 16:04:37 +00:00
Oliver Stannard	02fa1c80c4	Revert r259035, it introduces a cyclic library dependency llvm-svn: 259045	2016-01-28 13:19:47 +00:00
Oliver Stannard	b4b092ea1b	Add backend dignostic printer for unsupported features Re-commit of r258951 after fixing layering violation. The related LLVM patch adds a backend diagnostic type for reporting unsupported features, this adds a printer for them to clang. In the case where debug location information is not available, I've changed the printer to report the location as the first line of the function, rather than the closing brace, as the latter does not give the user any information. This also affects optimisation remarks. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 259035	2016-01-28 10:07:27 +00:00
NAKAMURA Takumi	628a7a0aef	Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features" It broke layering violation in LLVMIR. clang r258950 "Add backend dignostic printer for unsupported features" llvm r258951 "Refactor backend diagnostics for unsupported features" llvm-svn: 259016	2016-01-28 04:41:32 +00:00
Oliver Stannard	1e67a9f196	Refactor backend diagnostics for unsupported features The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. The implementation of DiagnosticInfoUnsupported::print must be in lib/Codegen rather than in the existing file in lib/IR/ to avoid introducing a dependency from IR to CodeGen. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 258951	2016-01-27 17:30:33 +00:00
Tom Stellard	6e3b14de62	AMDGPU/SI: Fix commuting of 32-bit VOPC instructions Summary: We didn't have entries in the commuting table for the 32-bit instructions. I don't think we hit this problem now, but we will once uniform branching is enabled. Tests will come in a later commit. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16600 llvm-svn: 258936	2016-01-27 15:53:52 +00:00
Marek Olsak	e86f252209	AMDGPU/SI: Stoney has only 16 LDS banks Summary: This is a candidate for stable, along with all patches that add the "stoney" processor. Reviewers: tstellarAMD Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16485 llvm-svn: 258922	2016-01-27 11:19:45 +00:00
Benjamin Kramer	b3e8a6d2b8	Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs. llvm-svn: 258917	2016-01-27 10:01:28 +00:00
Matt Arsenault	b22828f2fb	AMDGPU: Fix default device handling When no device name is specified, default to kaveri for HSA since SI is not supported and it woud fail. Default to "tahiti" instead of "SI" since these are effectively the same, and tahiti is an actual device. Move default device handling to the TargetMachine rather than the AMDGPUSubtarget. The module ISA version is computed from the device name provided with the target machine, so the attributes printed by the AsmPrinter were inconsistent with those computed in the subtarget. Also remove DevName field from subtarget since it's redundant with getCPU() in the superclass. llvm-svn: 258901	2016-01-27 02:17:49 +00:00
Reid Kleckner	5b4637141e	[llvm-tblgen] Avoid StringMatcher for GCC and MS builtin names This brings the compile time of Function.cpp from ~40s down to ~4s for me locally. It also shaves off about 400KB of object file size in a release+asserts build. I also realized that the AMDGPU backend does not have any GCC builtin names to match, so the extra lookup was a no-op. I removed it to silence a zero-length string table array warning. There should be no functional change here. This change really ends the story of PR11951. llvm-svn: 258897	2016-01-27 01:43:12 +00:00
Reid Kleckner	1c93b4cd7b	[llvm-tblgen] Stop emitting the intrinsic name matching code The AMDGPU backend was the last user of the old StringMatcher recognition code. Move it over to the new lookupLLVMIntrinsicName funciton, which is now improved to handle all of the interesting edge cases exposed by AMDGPU intrinsic names. llvm-svn: 258875	2016-01-26 23:01:21 +00:00
Chris Bieneman	e49730d4ba	Remove autoconf support Summary: This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html "I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened." - Obi Wan Kenobi Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16471 llvm-svn: 258861	2016-01-26 21:29:08 +00:00
Matt Arsenault	bee7575e1a	AMDGPU: Move AMDGPU intrinsics only used by R600 llvm-svn: 258790	2016-01-26 04:49:24 +00:00
Matt Arsenault	382d945d16	AMDGPU: Tidy minor td file issues Make comments and indentation more consistent. Rearrange a few things to be in a more consistent order, such as organizing subtarget features from those describing an actual device property, and those used as options. llvm-svn: 258789	2016-01-26 04:49:22 +00:00
Matt Arsenault	c5f6152911	AMDGPU: Make v32i8/v64i8 illegal types Old intrinsics were forcing these, but they have now all been removed. This fixes large i8 vector operations generally being broken. llvm-svn: 258788	2016-01-26 04:43:48 +00:00
Matt Arsenault	018179fc46	AMDGPU: Remove old sample intrinsics I did my best to try to update all the uses in tests that just happened to use the old ones to the newer intrinsics. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern but I don't think it really matters. llvm-svn: 258787	2016-01-26 04:38:08 +00:00
Matt Arsenault	051d6f9fde	AMDGPU: Add new amdgcn intrinsics for cube instructions More cleanup to try to get all intrinsics using the correct amdgcn prefix that are as close to the instruction as possible. llvm-svn: 258786	2016-01-26 04:29:56 +00:00
Matt Arsenault	9a10cea7fb	AMDGPU: Implement read_register and write_register intrinsics Some of the special intrinsics now that now correspond to a instruction also have special setting of some registers, e.g. llvm.SI.sendmsg sets m0 as well as use s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. For this I'm not sure this is entirely correct because we would want this to be convergent, although it's possible this is already treated sufficently conservatively. llvm-svn: 258785	2016-01-26 04:29:24 +00:00
Matt Arsenault	0c3e2338fe	AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now Also move into backend intrinsics to discourage use of the old name. llvm-svn: 258783	2016-01-26 04:14:16 +00:00
Matt Arsenault	7713162c32	AMDGPU: Remove more unused intrinsics Replace tests with lrp with basic IR expansion llvm-svn: 258612	2016-01-23 05:42:38 +00:00
Matt Arsenault	f75257aaa6	AMDGPU: Move amdgcn intrinsic handling into SITargetLowering llvm-svn: 258608	2016-01-23 05:32:20 +00:00
Matt Arsenault	f1341406bf	AMDGPU: Remove IntrNoMem from llvm.SI.sendmsg This has side effects. llvm-svn: 258607	2016-01-23 05:32:18 +00:00
Matt Arsenault	2a93bb6365	AMDGPU: Remove Feature64BitPtr This is a leftover from AMDIL that doesn't do anything and doesn't belong here. llvm-svn: 258606	2016-01-23 05:32:14 +00:00
Matt Arsenault	10ca39ca8b	AMDGPU: Add new name for barrier intrinsic llvm-svn: 258558	2016-01-22 21:30:43 +00:00
Matt Arsenault	bef34e21c7	AMDGPU: Rename intrinsics to use amdgcn prefix The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatability for now. llvm-svn: 258557	2016-01-22 21:30:34 +00:00
Matt Arsenault	0b783ef076	AMDGPU: Fix crash with invariant markers The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking. llvm-svn: 258537	2016-01-22 19:47:54 +00:00
Matt Arsenault	59bd3014f2	AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix These ones aren't directly emitted by mesa and inserted by a pass. llvm-svn: 258523	2016-01-22 19:00:09 +00:00
Matt Arsenault	bb4ff5f5b6	AMDGPU: Remove unused R600 intrinsics llvm-svn: 258522	2016-01-22 18:52:14 +00:00
Matt Arsenault	7898b90ee1	AMDGPU: Change control flow intrinsics to use amdgcn prefix These aren't supposed to be used outside of the backend, so there aren't any users to worry about. llvm-svn: 258516	2016-01-22 18:42:55 +00:00
Matt Arsenault	8d903029e8	AMDGPU: Don't use separate mulhu/mulhs Pats llvm-svn: 258515	2016-01-22 18:42:49 +00:00
Matt Arsenault	ee0930821a	AMDGPU: Remove random TGSI intrinsic I don't think this was ever used. llvm-svn: 258514	2016-01-22 18:42:44 +00:00
Matt Arsenault	0cbaa1762b	AMDGPU: Remove AMDGPU.fract intrinsic Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x) llvm-svn: 258513	2016-01-22 18:42:38 +00:00
Tom Stellard	de008d338c	AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute Summary: Currently the SI scheduler can be selected via command line option, but it turned out it would be better if it was selectable via a Target Attribute. This patch adds "si-scheduler" attribute to the backend. Reviewers: tstellarAMD, echristo Subscribers: echristo, arsenm Differential Revision: http://reviews.llvm.org/D16192 llvm-svn: 258386	2016-01-21 04:28:34 +00:00
Tom Stellard	d1efda8e9e	AMDGPU/SI: Promote i1 SETCC operations Summary: While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16233 llvm-svn: 258352	2016-01-20 21:48:24 +00:00
Matt Arsenault	7836f895fe	AMDGPU: Fix old comments that mention AMDIL llvm-svn: 258350	2016-01-20 21:22:21 +00:00
Matt Arsenault	7ba334a7d9	AMDGPU: Remove AMDGPU.trunc intrinsic llvm-svn: 258348	2016-01-20 21:05:53 +00:00
Matt Arsenault	15fbe49daf	AMDGPU: Remove AMDIL.fraction intrinsic llvm-svn: 258347	2016-01-20 21:05:49 +00:00
Matt Arsenault	7cccd2672e	AMDGPU: Remove AMDIL.round.nearest intrinsic llvm-svn: 258346	2016-01-20 21:05:40 +00:00
Matt Arsenault	1c9e4ef0df	AMDGPU: Remove abs intrinsic llvm-svn: 258343	2016-01-20 20:58:29 +00:00
Matt Arsenault	f7e6e89718	AMDGPU: Remove min/max intrinsics This removes support for mesa 11.0.x llvm-svn: 258342	2016-01-20 20:50:19 +00:00
Tom Stellard	77a177722f	Correctly initialize SIAnnotateControlFlow Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16304 llvm-svn: 258319	2016-01-20 15:48:27 +00:00
Matthias Braun	5d458617aa	RegisterPressure: Make liveness tracking subregister aware Differential Revision: http://reviews.llvm.org/D14968 llvm-svn: 258258	2016-01-20 00:23:26 +00:00
Tom Stellard	2e045bbc5f	AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256	2016-01-20 00:13:22 +00:00
Matt Arsenault	33e3ecee0c	AMDGPU: Reduce 64-bit SRAs llvm-svn: 258096	2016-01-18 22:09:04 +00:00
Matt Arsenault	6e3a45193a	AMDGPU: Split 64-bit and of constant up This breaks the tests that were meant for testing 64-bit inline immediates, so move those to shl where they won't be broken up. This should be repeated for the other related bit ops. llvm-svn: 258095	2016-01-18 22:01:13 +00:00
Matt Arsenault	3cbbc10488	AMDGPU: Generalize shl combine Reduce 64-bit shl with constant > 32. We already special cased this for the == 32 case, but this also works for any >= 32 constant. llvm-svn: 258092	2016-01-18 21:55:14 +00:00
Matt Arsenault	80edab99ff	AMDGPU: Reduce 64-bit lshr by constant to 32-bit 64-bit shifts are very slow on some subtargets. llvm-svn: 258090	2016-01-18 21:43:36 +00:00
Matt Arsenault	e83690c1cc	AMDGPU: Add subtarget feature for instruction rates llvm-svn: 258085	2016-01-18 21:13:50 +00:00
Manuel Jacob	5f6eaac611	GlobalValue: use getValueType() instead of getType()->getPointerElementType(). Reviewers: mjacob Subscribers: jholewinski, arsenm, dsanders, dblaikie Patch by Eduard Burtescu. Differential Revision: http://reviews.llvm.org/D16260 llvm-svn: 257999	2016-01-16 20:30:46 +00:00
Rui Ueyama	da00f2fdf4	Update to use new name alignTo(). llvm-svn: 257804	2016-01-14 21:06:47 +00:00
Rafael Espindola	8340f94df1	Convert a few assert failures into proper errors. Fixes PR25944. llvm-svn: 257697	2016-01-13 22:56:57 +00:00
Changpeng Fang	c16be00313	AMDGPU/SI: Update ISA version for FIJI llvm-svn: 257666	2016-01-13 20:39:25 +00:00
Hans Wennborg	81efb6b418	Fix struct/class mismatch for MachineSchedContext llvm-svn: 257648	2016-01-13 18:59:45 +00:00
Marek Olsak	46dadbfab2	AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 llvm-svn: 257625	2016-01-13 17:23:20 +00:00
Marek Olsak	3c0ebc71f1	AMDGPU/SI: Remove ending s_endpgm from non-void functions Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16035 llvm-svn: 257623	2016-01-13 17:23:12 +00:00
Marek Olsak	8e9cc63bfb	AMDGPU/SI: Add s_waitcnt at the end of non-void functions Summary: v2: Make ReturnsVoid private, so that I can another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 llvm-svn: 257622	2016-01-13 17:23:09 +00:00
Marek Olsak	8a0f335ad6	AMDGPU/SI: Add support for non-void functions Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored. v2: don't do this for compute shaders Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16033 llvm-svn: 257621	2016-01-13 17:23:04 +00:00
Nicolai Haehnle	02c3291566	AMDGPU/SI: Add SI Machine Scheduler Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609	2016-01-13 16:10:10 +00:00
Marek Olsak	4e99b6ec01	AMDGPU/SI: Allow more shader inputs Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16032 llvm-svn: 257593	2016-01-13 11:46:48 +00:00
Marek Olsak	b6c8c3d165	AMDGPU/SI: Allow any number of PS inputs Summary: With the ability to concatenate shader binaries, the limit of 15 no longer applies. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16031 llvm-svn: 257592	2016-01-13 11:46:10 +00:00
Marek Olsak	fccabaf57e	AMDGPU/SI: Add new target attribute InitialPSInputAddr Summary: This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM. The register assigns VGPR locations to PS inputs, while the ENA register determines whether or not they are loaded. Mesa needs to set some inputs as not-movable, so that a pixel shader prolog binary appended at the beginning can assume where some inputs are. v2: Make PSInputAddr private, because there is never enough silly getters and setters for people to read. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16030 llvm-svn: 257591	2016-01-13 11:45:36 +00:00
Marek Olsak	926c56f50c	AMDGPU/SI: Fix a bug in SIFoldOperands Summary: ret.ll will contain a test for this Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16029 llvm-svn: 257590	2016-01-13 11:44:29 +00:00
Tom Stellard	f421837250	AMDGPU: Emit note directive for HSA even if there are no functions Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16010 llvm-svn: 257488	2016-01-12 17:18:17 +00:00
Matt Arsenault	5e0bdb8b95	AMDGPU: Implement {{s\|u}}int_to_fp i64 -> f32 The old lowering for uint_to_fp failed opencl conformance. It might be OK for fast math mode, but I'm not sure. llvm-svn: 257393	2016-01-11 22:01:48 +00:00
Matt Arsenault	800fecf9de	AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA target It might be better to let this be a select failure instead. llvm-svn: 257386	2016-01-11 21:18:33 +00:00
Matt Arsenault	5319b0add5	AMDGPU: Fix ctlz combine for sub 32-bit types llvm-svn: 257353	2016-01-11 17:02:06 +00:00
Matt Arsenault	de5fbe9c60	AMDGPU: Pattern match ffbh pattern to instruction. The hardware instruction's output on 0 is -1 rather than 32. Eliminate a test and select to -1. This removes an extra instruction from the compatability function with HSAIL's firstbit instruction. llvm-svn: 257352	2016-01-11 17:02:00 +00:00
Matt Arsenault	f058d67643	AMDGPU: Custom lower i64 ctlz llvm-svn: 257348	2016-01-11 16:50:29 +00:00
Matt Arsenault	5ca3c72c5a	LegalizeDAG: Expand ctlz with ctlz_zero_undef if legal llvm-svn: 257345	2016-01-11 16:37:46 +00:00
Matt Arsenault	02d45dfeda	AMDGPU: Remove dead target dag combine llvm-svn: 257344	2016-01-11 16:37:40 +00:00
Tom Stellard	4c4c72db48	AMDGPU/SI: Emit global variable sizes when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15952 llvm-svn: 257173	2016-01-08 14:50:28 +00:00
Tom Stellard	ad8f5e8111	AMDGPU: Emit functions sizes Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15951 llvm-svn: 257172	2016-01-08 14:50:23 +00:00
Nicolai Haehnle	82fc962c20	AMDGPU/SI: Fold operands with sub-registers Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074	2016-01-07 17:10:29 +00:00
Nicolai Haehnle	3c05d6d3b5	AMDGPU/SI: xnack_mask is always reserved on VI Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073	2016-01-07 17:10:20 +00:00
Nicolai Haehnle	a61e5a8d4e	AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader Summary: This is admittedly something that you could only run into by manually playing around with shader assembly because the SITypeWriter pass is skipped for compute. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15902 llvm-svn: 256980	2016-01-06 22:01:04 +00:00
Nicolai Haehnle	6035504ab3	AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland Due to the SGPR init bug, every program claims to use the same number of SGPRs anyway, so there's no point in trying to shift those registers down from their initial spot of reservation. Add a test that uses VGPR spilling and blocks most SGPRs from being used for the scratch resource register. Previously, this would run into an assertion. Differential Revision: http://reviews.llvm.org/D15724 llvm-svn: 256870	2016-01-05 20:42:49 +00:00
Matt Arsenault	905042774d	AMDGPU: Remove redundant let mayLoad = 1 This is already set on the SMRD format class. llvm-svn: 256813	2016-01-05 04:50:28 +00:00
Tom Stellard	5cd09ade38	AMDGPU/SI: Select non-uniform constant addrspace loads to flat instructions for HSA Summary: This fixes a regression caused by r256282. Reviewers: arsenm, cfang Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15736 llvm-svn: 256810	2016-01-05 03:40:16 +00:00
Tom Stellard	2c82ee60c3	AMDGPU/SI: Consolidate FLAT patterns Summary: We had to sets of identical FLAT patterns one inside the HasFlatAddressSpace predicate and one inside the useFlatForGloabl predicate. This patch merges these sets into a single pattern under the isCIVI predicate. The reason we can remove the predicates is that when MUBUF instructions are legal, the instruction selector will prefer selecting those over FLAT instructions because MUBUF patterns have a higher complexity score. So, in this case having patterns for FLAT instructions will have no effect. This change also simplifies the process for forcing global address space loads to use FLAT instructions, since we no only have to disable the MUBUF patterns instead of having to disable the MUBUF patterns and enable the FLAT patterns. Reviewers: arsenm, cfang Subscribers: llvm-commits llvm-svn: 256807	2016-01-05 02:26:37 +00:00
Nicolai Haehnle	5b50497617	AMDGPU: add +xnack feature Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794	2016-01-04 23:35:53 +00:00
Tom Stellard	3da5672755	AMDGPU/SI: Move VI SMEM pattern back into VIInstructions.td Summary: This was accidently moved to CIInstructions.td in r256282 Reviewers: cfang, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15763 llvm-svn: 256775	2016-01-04 20:23:10 +00:00
Nicolai Haehnle	e705aadd67	AMDGPU: Avoid assertions after SGPR spilling failed Summary: The comment explains it: emitError does not necessarily exit the compilation process, and then using NoRegister leads to assertions later on. This generates incorrect code, of course, but the user should know to not use the result when an error has been emitted. It would be nice to have a test-case for this inside the LLVM repository, but llc exits on error. shader-db tests trigger the underlying issue at least on Tonga. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15826 llvm-svn: 256757	2016-01-04 15:50:01 +00:00
Craig Topper	daf2e3ff7a	Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC llvm-svn: 256427	2015-12-25 22:10:01 +00:00
Matt Arsenault	4339b3ff35	AMDGPU: Fix getRegisterBitWidth for vectors llvm-svn: 256362	2015-12-24 05:14:55 +00:00
Tom Stellard	5ebdfbe562	AMDGPU/SI: Fix encoding of flat instructions on VI Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15735 llvm-svn: 256360	2015-12-24 03:18:18 +00:00
Tom Stellard	668f793049	AMDGPU/SI: Remove non-existent flat instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15734 llvm-svn: 256357	2015-12-24 02:41:55 +00:00
Changpeng Fang	b41574a961	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason doing executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282	2015-12-22 20:55:23 +00:00
Rafael Espindola	4b0d24c00a	Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA" This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275	2015-12-22 19:46:44 +00:00
Changpeng Fang	9b8a9be058	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason doing executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273	2015-12-22 19:32:28 +00:00

... 2 3 4 5 6 ...

683 Commits