llvm-project

Commit Graph

Author	SHA1	Message	Date
Tom Stellard	ef33c4b3f2	AMDGPU/SI: Emit fixups for long branches Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25366 llvm-svn: 283570	2016-10-07 16:01:18 +00:00
Artem Tamazov	73f1ab28cd	[AMDGPU][mc] Add support for buffer_load_dwordx3, buffer_store_dwordx3. Partially fixes Bug 28232. Lit tests added. Differential Revision: https://reviews.llvm.org/D25367 llvm-svn: 283567	2016-10-07 15:53:16 +00:00
Sam Kolton	a3ec5c10e2	[AMDGPU] Assembler: support v_mac_f32 DPP and SDWA. Move getNamedOperandIdx to AMDGPUBaseInfo.h Reviewers: artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25084 llvm-svn: 283560	2016-10-07 14:46:06 +00:00
Konstantin Zhuravlyov	c09e2d7e46	[AMDGPU] AMDGPUCodeGenPrepare: remove extra ';' llvm-svn: 283558	2016-10-07 14:39:53 +00:00
Konstantin Zhuravlyov	f74fc60a7d	[AMDGPU] Promote uniform (i1, i16] operations to i32 Differential Revision: https://reviews.llvm.org/D25302 llvm-svn: 283555	2016-10-07 14:22:58 +00:00
Nicolai Haehnle	87bc4c218b	AMDGPU: Fix use-after-free in SIOptimizeExecMasking Summary: There was a bug with sequences like s_mov_b64 s[0:1], exec s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill> ... s_mov_b64_term exec, s[2:3] because s[2:3] was defined and used in the same instruction, ending up with SaveExecInst inside OtherUseInsts. Note that the test case also exposes an unrelated bug. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028 Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25306 llvm-svn: 283528	2016-10-07 08:40:14 +00:00
Peter Collingbourne	2261d78cd2	Target: Remove unused patterns and transforms. NFC. llvm-svn: 283515	2016-10-07 00:30:49 +00:00
Matt Arsenault	5e63a04e46	AMDGPU: Don't fold undef uses or copies with implicit uses llvm-svn: 283476	2016-10-06 18:12:13 +00:00
Matt Arsenault	c59a92387e	AMDGPU: Remove scheduling info from si_mask_branch llvm-svn: 283475	2016-10-06 18:12:07 +00:00
Matt Arsenault	c2ee42cd16	AMDGPU: Remove leftover implicit operands when folding immediates When constant folding an operation to a copy or an immediate mov, the implicit uses/defs of the old instruction were left behind, e.g. replacing v_or_b32 left the implicit exec use on the new copy. llvm-svn: 283471	2016-10-06 17:54:30 +00:00
Matt Arsenault	11f7402075	Reapply "AMDGPU: Support using tablegened MC pseudo expansions" Fix bad merge llvm-svn: 283470	2016-10-06 17:19:11 +00:00
Matt Arsenault	cbc879ee2f	Revert "AMDGPU: Support using tablegened MC pseudo expansions" llvm-svn: 283469	2016-10-06 17:08:01 +00:00
Matt Arsenault	d20a2dd7ac	AMDGPU: Support using tablegened MC pseudo expansions Make the necessary refactorings to make use of PseudoInstExpansion llvm-svn: 283467	2016-10-06 16:56:41 +00:00
Matt Arsenault	6bc43d8627	BranchRelaxation: Support expanding unconditional branches AMDGPU needs to expand unconditional branches in a new block with an indirect branch. llvm-svn: 283464	2016-10-06 16:20:41 +00:00
Sam Kolton	3381d7a216	[AMDGPU] Disassembler: print label names in branch instructions Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table. Initialize MCObjectFileInfo with some default values. Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D24802 llvm-svn: 283450	2016-10-06 13:46:08 +00:00
Matt Arsenault	10c17ca6c6	AMDGPU: Partially fix reported code size for some instructions These ones need to have the size on the pseudo instruction set for getInstSizeInBytes to work correctly. These also have a statically known size. llvm-svn: 283437	2016-10-06 10:13:23 +00:00
Konstantin Zhuravlyov	b4eb5d5049	[AMDGPU] Promote uniform i16 bitreverse intrinsic to i32 Differential Revision: https://reviews.llvm.org/D25121 llvm-svn: 283415	2016-10-06 02:20:46 +00:00
Matthias Braun	0a6916f303	AMDGPU: Do not re-use tmpreg in spill/restore lowering The register scavenging code does not support multiple definitions of the same vreg. Differential Revision: https://reviews.llvm.org/D25220 llvm-svn: 283369	2016-10-05 20:02:51 +00:00
Matt Arsenault	dcf0cfca4c	AMDGPU: Refactor indirect vector lowering Allow inserting multiple instructions in the expanded loop. llvm-svn: 283177	2016-10-04 01:41:05 +00:00
Matt Arsenault	283fbc24f6	AMDGPU: Factor SGPR spilling into separate functions llvm-svn: 283175	2016-10-04 01:14:56 +00:00
Konstantin Zhuravlyov	60a8373778	[AMDGPU] Pass optimization level to SelectionDAGISel llvm-svn: 283133	2016-10-03 18:47:26 +00:00
Konstantin Zhuravlyov	691e2e020b	[AMDGPU] Sign extend AShr when promoting (instead of zero extending) llvm-svn: 283130	2016-10-03 18:29:01 +00:00
Matt Arsenault	4a048a44e5	AMDGPU: Fix typo llvm-svn: 283108	2016-10-03 13:06:58 +00:00
Volkan Keles	1c38681ae6	Add new target hooks for LoadStoreVectorizer Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D24727 llvm-svn: 283099	2016-10-03 10:31:34 +00:00
Konstantin Zhuravlyov	c9786f0d8a	[AMDGPU] Remove unused variables from SIOptimizeExecMasking Differential Revision: https://reviews.llvm.org/D25110 llvm-svn: 283087	2016-10-03 04:43:22 +00:00
Mehdi Amini	117296c0a0	Use StringRef in Pass/PassManager APIs (NFC) llvm-svn: 283004	2016-10-01 02:56:57 +00:00
Mehdi Amini	86eeda8e20	Revert "AMDGPU: Don't use offen if it is 0" This reverts commit r282999. Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038 llvm-svn: 283003	2016-10-01 02:35:24 +00:00
Matt Arsenault	3070fdf798	AMDGPU: Don't use offen if it is 0 This removes many re-initializations of a base register to 0. llvm-svn: 282999	2016-10-01 01:37:15 +00:00
Konstantin Zhuravlyov	836cbff695	[AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa version Differential Revision: https://reviews.llvm.org/D24973 llvm-svn: 282877	2016-09-30 17:01:40 +00:00
Konstantin Zhuravlyov	d7bdf24f32	[AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction Differential Revision: https://reviews.llvm.org/D24985 llvm-svn: 282875	2016-09-30 16:50:36 +00:00
Konstantin Zhuravlyov	4658e5f7b0	[AMDGPU] Do not run scalar optimization passes at "-O0" Differential Revision: https://reviews.llvm.org/D25055 llvm-svn: 282873	2016-09-30 16:39:24 +00:00
Matt Arsenault	5d8eb25e78	AMDGPU: Use unsigned compare for eq/ne For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832	2016-09-30 01:50:20 +00:00
Matt Arsenault	e6740754f0	AMDGPU: Partially fix control flow at -O0 Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. llvm-svn: 282667	2016-09-29 01:44:16 +00:00
Konstantin Zhuravlyov	e14df4b236	[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions Differential Revision: https://reviews.llvm.org/D24125 llvm-svn: 282624	2016-09-28 20:05:39 +00:00
Nirav Dave	e524f50882	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r282600 due to test failues with MCJIT llvm-svn: 282604	2016-09-28 16:37:50 +00:00
Nirav Dave	e17e055b75	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 282600	2016-09-28 15:50:43 +00:00
Konstantin Zhuravlyov	da4687c531	[AMDGPU] Enable changing instprinter's behavior based on the per-function subtarget This is a prerequisite for coming waitcnt changes Differential Revision: https://reviews.llvm.org/D24939 llvm-svn: 282489	2016-09-27 14:42:48 +00:00
Tom Stellard	1b9748c6a2	AMDGPU/SI: Don't crash on anonymous GlobalValues Summary: We need to call AsmPrinter::getNameWithPrefix() in order to handle anonymous GlobalValues (e.g. @0, @1). Reviewers: arsenm, b-sumner Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D24865 llvm-svn: 282420	2016-09-26 17:29:25 +00:00
Sam Kolton	984461062f	Revert "[AMDGPU] Disassembler: print label names in branch instructions" This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550. llvm-svn: 282396	2016-09-26 11:29:03 +00:00
Sam Kolton	1559f76257	[AMDGPU] Disassembler: print label names in branch instructions Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table. Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D24802 llvm-svn: 282394	2016-09-26 10:05:50 +00:00
Valery Pykhtin	fbf2d93f73	[AMDGPU] Fix for bz30427: wrong MTBUF encoding on VI Differential revision: https://reviews.llvm.org/D24875 llvm-svn: 282296	2016-09-23 21:21:21 +00:00
Valery Pykhtin	355103f6c0	[AMDGPU] Refactor VOP1 and VOP2 instruction TD definitions Differential revision: https://reviews.llvm.org/D24738 llvm-svn: 282234	2016-09-23 09:08:07 +00:00
Tom Stellard	e88bbc34c6	AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D24835 llvm-svn: 282223	2016-09-23 01:33:26 +00:00
Artem Tamazov	2146a0a90e	[AMDGPU][mc] Add support for absolute expressions in DPP modifiers. Also added range checking for DPP attributes. Assembler tests added as well. Differential Revision: https://reviews.llvm.org/D24755 llvm-svn: 282145	2016-09-22 11:47:21 +00:00
Artem Tamazov	2e217b87cb	[AMDGPU][mc] Add support for ds_add_[rtn_]f32. Lit tests added. Resolves https://github.com/RadeonOpenCompute/hcc/issues/122. Differential Revision: https://reviews.llvm.org/D24765 llvm-svn: 282086	2016-09-21 16:35:44 +00:00
Tim Northover	862758ec14	GlobalISel: pass Function to lowerFormalArguments directly (NFC). The only implementation that exists immediately looks it up anyway, and the information is needed to handle various parameter attributes (stored on the function itself). llvm-svn: 282068	2016-09-21 12:57:35 +00:00
Sam Kolton	12b633beda	[AMDGPU] Assembler: remove unused AMDGPUMCObjectWriter. Summary: It is replaced by AMDGPUELFObjectWriter Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl Differential Revision: https://reviews.llvm.org/D24654 llvm-svn: 282065	2016-09-21 10:33:32 +00:00
Valery Pykhtin	e330cfa294	[AMDGPU] Refactor VOP3 instruction TD definitions Differential revision: https://reviews.llvm.org/D24664 llvm-svn: 281965	2016-09-20 10:41:16 +00:00
Valery Pykhtin	2828b9be1e	[AMDGPU] Refactor VOPC instruction TD definitions Differential Revision: https://reviews.llvm.org/D24546 llvm-svn: 281903	2016-09-19 14:39:49 +00:00
Sam Kolton	be7ffb90bf	[AMDGPU] Fix s_branch with -1 offset Summary: In case s_branch instruction target is itself backend should emit offset -1 but instead it emit 0. ''' label: s_branch label // should emit [0xff,0xff,0x82,0xbf] ''' Tom, Matt: why are we adjusting fixup values in applyFixup() method instead of processFixup()? processFixup() is calling adjustFixupValue() but does nothing with its result. Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl Differential Revision: https://reviews.llvm.org/D24671 llvm-svn: 281896	2016-09-19 10:20:55 +00:00
Matt Arsenault	ac0fc849cf	AMDGPU: Fix broken FrameIndex handling We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory. llvm-svn: 281824	2016-09-17 16:09:55 +00:00
Matt Arsenault	bcfd94c298	AMDGPU: Rename spill operands to match real instruction llvm-svn: 281823	2016-09-17 15:52:37 +00:00
Matt Arsenault	d99ef1144b	AMDGPU: Push bitcasts through build_vector This reduces the number of copies and reg_sequences when using fp constant vectors. This significantly reduces the code size in local-stack-alloc-bug.ll llvm-svn: 281822	2016-09-17 15:44:16 +00:00
Matt Arsenault	7b1dc2c983	AMDGPU: Use i64 scalar compare instructions VI added eq/ne for i64, so use them. llvm-svn: 281800	2016-09-17 02:02:19 +00:00
Tom Stellard	7998db634c	AMDGPU/SI: Fix kernel argument ABI for HSA Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24621 llvm-svn: 281789	2016-09-16 22:20:24 +00:00
Matt Arsenault	6408c9135c	AMDGPU: Allow some control flow intrinsics to be CSEd These clean up some unnecessary or instructions in cases with complex loops. In the original testcase I noticed this, the same or with exec was repeated 5 or 6 times in a row. With this only one is emitted or sometimes a copy. llvm-svn: 281786	2016-09-16 22:11:18 +00:00
Tom Stellard	bbeb45aff6	AMDGPU: Refactor kernel argument lowering Summary: The main challenge in lowering kernel arguments for AMDGPU is determing the memory type of the argument. The generic calling convention code assumes that only legal register types can be stored in memory, but this is not the case for AMDGPU. This consolidates all the logic AMDGPU uses for deducing memory types into a single function. This will make it much easier to support different ABIs in the future. Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24614 llvm-svn: 281781	2016-09-16 21:53:00 +00:00
Matt Arsenault	7ccf6cd104	AMDGPU: Use SOPK compare instructions llvm-svn: 281780	2016-09-16 21:41:16 +00:00
Tom Stellard	0b76fc4c77	AMDGPU/SI: Add support for triples with the mesa3d operating system Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 llvm-svn: 281779	2016-09-16 21:34:26 +00:00
Eric Christopher	4367c7fb9a	Move the Mangler from the AsmPrinter down to TLOF and clean up the TLOF API accordingly. llvm-svn: 281708	2016-09-16 07:33:15 +00:00
Matt Arsenault	1b9fc8ed65	Finish renaming remaining analyzeBranch functions llvm-svn: 281535	2016-09-14 20:43:16 +00:00
Matt Arsenault	f40b70fa75	Revert "AMDGPU: Use SOPK compare instructions" Accidentally committed llvm-svn: 281514	2016-09-14 18:04:42 +00:00
Matt Arsenault	f757c87959	AMDGPU: Use SOPK compare instructions llvm-svn: 281513	2016-09-14 18:03:53 +00:00
Matt Arsenault	e8e0f5cac6	Make analyzeBranch family of instruction names consistent analyzeBranch was renamed to use lowercase first, rename the related set to match. llvm-svn: 281506	2016-09-14 17:24:15 +00:00
Matt Arsenault	a2b036e88b	AArch64: Use TTI branch functions in branch relaxation The main change is to return the code size from InsertBranch/RemoveBranch. Patch mostly by Tim Northover llvm-svn: 281505	2016-09-14 17:23:48 +00:00
Matt Arsenault	2bc198a333	AMDGPU: Support folding FrameIndex operands This avoids test regressions in a future commit. llvm-svn: 281491	2016-09-14 15:51:33 +00:00
Matt Arsenault	fa5f767a38	AMDGPU: Improve splitting 64-bit bit ops by constants This addresses a TODO to handle operations besides and. This also starts eliminating no-op operations with a constant that can emerge later. llvm-svn: 281488	2016-09-14 15:19:03 +00:00
Matt Arsenault	a992f71bef	AMDGPU: Remove code I think is dead As far as I can tell, resolveFrameIndex is supposed to be called with a legal offset, so inserting an add shouldn't be necessary. llvm-svn: 281372	2016-09-13 19:15:25 +00:00
Matt Arsenault	25dba30017	AMDGPU: Support commuting a FrameIndex operand llvm-svn: 281369	2016-09-13 19:03:12 +00:00
Nicolai Haehnle	e58e0e3fe3	AMDGPU: Do not clobber SCC in SIWholeQuadMode Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D22198 llvm-svn: 281230	2016-09-12 16:25:20 +00:00
Sam Kolton	fb0d9d9c13	[AMDGPU] Assembler: Move disabled SDWA and DPP instruction into Disable asm variant Summary: This removes disabled instructions from match tables so we will not match them at all. Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: wdng, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D24452 llvm-svn: 281216	2016-09-12 14:42:43 +00:00
Duncan P. N. Exon Smith	1872096f1e	CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI Now that MachineBasicBlock::reverse_instr_iterator knows when it's at the end (since r281168 and r281170), implement MachineBasicBlock::reverse_iterator directly on top of an ilist::reverse_iterator by adding an IsReverse template parameter to MachineInstrBundleIterator. This replaces another hard-to-reason-about use of std::reverse_iterator on list iterators, matching the changes for ilist::reverse_iterator from r280032 (see the "out of scope" section at the end of that commit message). MachineBasicBlock::reverse_iterator now has a handle to the current node and has obvious invalidation semantics. r280032 has a more detailed explanation of how list-style reverse iterators (invalidated when the pointed-at node is deleted) are different from vector-style reverse iterators like std::reverse_iterator (invalidated on every operation). A great motivating example is this commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp. Note: If your out-of-tree backend deletes instructions while iterating on a MachineBasicBlock::reverse_iterator or converts between MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator, you'll need to update your code in similar ways to r280032. The following table might help: [Old] ==> [New] delete &RI, RE = end() delete &RI++ RI->erase(), RE = end() RI++->erase() reverse_iterator(I) std::prev(I).getReverse() reverse_iterator(I) ++I.getReverse() --reverse_iterator(I) I.getReverse() reverse_iterator(std::next(I)) I.getReverse() RI.base() std::prev(RI).getReverse() RI.base() ++RI.getReverse() --RI.base() RI.getReverse() std::next(RI).base() RI.getReverse() (For more details, have a look at r280032.) llvm-svn: 281172	2016-09-11 18:51:28 +00:00
Justin Lebar	adbf09e8cf	[CodeGen] Split out the notions of MI invariance and MI dereferenceability. Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23371 llvm-svn: 281151	2016-09-11 01:38:58 +00:00
Valery Pykhtin	b66e5eb612	[AMDGPU] Refactor MUBUF/MTBUF instructions Differential revision: https://reviews.llvm.org/D24295 llvm-svn: 281137	2016-09-10 13:09:16 +00:00
Matt Arsenault	3354f42ae7	AMDGPU: Implement is{LoadFrom\|StoreTo}FrameIndex llvm-svn: 281128	2016-09-10 01:20:33 +00:00
Matt Arsenault	7348a7eadd	AMDGPU: Fix scheduling info for spill pseudos These defaulted to Write32Bit. I don't think this actually matters since these don't exist during scheduling. llvm-svn: 281127	2016-09-10 01:20:28 +00:00
Matt Arsenault	124384f08d	AMDGPU: Fix immediate folding logic when shrinking instructions If the literal is being folded into src0, it doesn't matter if it's an SGPR because it's being replaced with the literal. Also fixes initially selecting 32-bit versions of some instructions which also confused commuting. llvm-svn: 281117	2016-09-09 23:32:53 +00:00
Matt Arsenault	0efdd06b22	AMDGPU: Run LoadStoreVectorizer pass by default llvm-svn: 281112	2016-09-09 22:29:28 +00:00
Wei Ding	06f8d39424	AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type. Differential Revision: http://reviews.llvm.org/D23700 llvm-svn: 281081	2016-09-09 19:31:51 +00:00
Tom Stellard	b2869eb6e9	AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D24405 llvm-svn: 281080	2016-09-09 19:28:00 +00:00
Sam Kolton	1eeb11bfd4	AMDGPU] Assembler: better support for immediate literals in assembler. Summary: Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least. With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction). Here are rules how we convert literals: - We parsed fp literal: - Instruction expects 64-bit operand: - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5) - then we do nothing this literal - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5) - report error - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant - If instruction expect fp operand type (f64) - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5) - If so then do nothing - Else (e.g. v_fract_f64 v[0:1], 3.1415) - report warning that low 32 bits will be set to zeroes and precision will be lost - set low 32 bits of literal to zeroes - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5) - report error as it is unclear how to encode this literal - Instruction expects 32-bit operand: - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5) - do nothing - Else report error - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0) - Parsed binary literal: - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35) - do nothing - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35) - report error - Else, literal is not-inlinable and we are not required to inline it - Are high 32 bit of literal zeroes or same as sign bit (32 bit) - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef) - Else - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0) For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types: ''' enum OperandType { OPERAND_REG_IMM32_INT, OPERAND_REG_IMM32_FP, OPERAND_REG_INLINE_C_INT, OPERAND_REG_INLINE_C_FP, } ''' This is not working yet: - Several tests are failing - Problems with predicate methods for inline immediates - LLVM generated assembler parts try to select e64 encoding before e32. More changes are required for several AsmOperands. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, artem.tamazov Differential Revision: https://reviews.llvm.org/D22922 llvm-svn: 281050	2016-09-09 14:44:04 +00:00
Sam Kolton	a2e5c88baf	[AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec Summary: Also removed duplicate code from AMDGPUTargetAsmStreamer. This change only change how amd_kernel_code_t is parsed and printed. No variable names are changed. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D24296 llvm-svn: 281028	2016-09-09 10:08:02 +00:00
Sam Kolton	d63d8a7c05	[AMDGPU] Assembler: match e32 VOP instructions before e64. Summary: Split assembler match table in 4 tables with assembler variants: Default - all instructions except VOP3, SDWA and DPP - VOP3 - SDWA - DPP First match Default table then VOP3, SDWA and DPP. Reviewers: tstellarAMD, artem.tamazov, vpykhtin Subscribers: arsenm, wdng, nhaehnle, AMDGPU Differential Revision: https://reviews.llvm.org/D24252 llvm-svn: 281023	2016-09-09 09:37:51 +00:00
Matt Arsenault	d745c28945	AMDGPU: Sign extend constants when splitting them This will confuse later passes which try to look at the immediate value and don't truncate first. llvm-svn: 280974	2016-09-08 17:44:36 +00:00
Matt Arsenault	be90f70d3a	AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32 llvm-svn: 280972	2016-09-08 17:35:41 +00:00
Matt Arsenault	bbb47da8a1	AMDGPU: Support commuting with immediate in src0 llvm-svn: 280970	2016-09-08 17:19:29 +00:00
Yaxun Liu	90658fff1b	AMDGPU: Remove a useless variable which caused build failure for lld. llvm-svn: 280841	2016-09-07 18:31:11 +00:00
Yaxun Liu	638914009a	AMDGPU: Add hidden kernel arguments to runtime metadata OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 llvm-svn: 280829	2016-09-07 17:44:00 +00:00
Matt Arsenault	479ba3aac0	AMDGPU: Make some scalar instructions commutable llvm-svn: 280784	2016-09-07 06:25:55 +00:00
Matt Arsenault	6cda10c950	Remove unnecessary call to getAllocatableRegClass This reapplies r252565 and r252674, effectively reverting r252956. This allows VS_32/VS_64 to be unallocatable like they should be. llvm-svn: 280783	2016-09-07 06:16:45 +00:00
Konstantin Zhuravlyov	1d65026ca6	[AMDGPU] Wave and register controls - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747	2016-09-06 20:22:28 +00:00
Tom Stellard	2add8a1140	AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies Summary: I put this code here, because I want to re-use it in a few other places. This supersedes some of the immediate folding code we have in SIFoldOperands. I think the peephole optimizers is probably a better place for folding immediates into copies, since it does some register coalescing in the same time. This will also make it easier to transition SIFoldOperands into a smarter pass, where it looks at all uses of instruction at once to determine the optimal way to fold operands. Right now, the pass just considers one operand at a time. Reviewers: arsenm Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23402 llvm-svn: 280744	2016-09-06 20:00:26 +00:00
Wei Ding	5e832e866e	AMDGPU : Add XNACK feature to GPUs that support it. Differential Revision: http://reviews.llvm.org/D24276 llvm-svn: 280742	2016-09-06 19:55:17 +00:00
Valery Pykhtin	8bc659637c	[AMDGPU] Refactor FLAT TD instructions Differential revision: https://reviews.llvm.org/D24072 llvm-svn: 280655	2016-09-05 11:22:51 +00:00
Matt Arsenault	ac42ba8633	AMDGPU: Set sizes of spill pseudos llvm-svn: 280595	2016-09-03 17:25:44 +00:00
Matt Arsenault	5ffe3e1d93	AMDGPU: Fix adding duplicate implicit exec uses I'm not sure if this should be considered a bug in copyImplicitOps or not, but implicit operands that are part of the static instruction definition should not be copied. llvm-svn: 280594	2016-09-03 17:25:39 +00:00
Nicolai Haehnle	3bba6a8438	AMDGPU: Reduce the duration of whole-quad-mode Summary: This contains two changes that reduce the time spent in WQM, with the intention of reducing bandwidth required by VMEM loads: 1. Sampling instructions by themselves don't need to run in WQM, only their coordinate inputs need it (unless of course there is a dependent sampling instruction). The initial scanInstructions step is modified accordingly. 2. When switching back from WQM to Exact, switch back as soon as possible. This affects the logic in processBlock. This should always be a win or at best neutral. There are also some cleanups (e.g. remove unused ExecExports) and some new debugging output. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D22092 llvm-svn: 280590	2016-09-03 12:26:38 +00:00
Nicolai Haehnle	a246dccc26	AMDGPU: Fix an interaction between WQM and polygon stippling Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders. The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling. Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23131 llvm-svn: 280589	2016-09-03 12:26:32 +00:00
Matt Arsenault	2510a31677	AMDGPU: Fix spilling of m0 readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR. llvm-svn: 280584	2016-09-03 06:57:55 +00:00
Jan Vesely	ea45746d5a	AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements. Fixes R600 piglit regressions since r280298 Differential Revision: https://reviews.llvm.org/D24174 llvm-svn: 280535	2016-09-02 20:13:19 +00:00
Jan Vesely	00864886f4	AMDGPU/R600: Expand unaligned writes to local and global AS LOCAL and GLOBAL AS only PRIVATE needs special treatment Differential Revision: https://reviews.llvm.org/D23971 llvm-svn: 280526	2016-09-02 19:07:06 +00:00
Yaxun Liu	add05a8d95	AMDGPU: Add runtime metadata for pointee alignment of argument. Add runtime metdata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer. Differential Revision: https://reviews.llvm.org/D24145 llvm-svn: 280399	2016-09-01 18:46:49 +00:00
Changpeng Fang	b28fe0307f	AMDGPU/SI: MIMG TD Refactoring. Summary: Created a new td file MIMGInstructions.td which contains all definitions of MIMG related instructions. Reviewed by: kzhuravl, vpykhtin Differential Revision: http://reviews.llvm.org/D24106 llvm-svn: 280385	2016-09-01 17:54:54 +00:00
Valery Pykhtin	1b13886b5f	[AMDGPU] Scalar Memory instructions TD refactoring Differential revision: https://reviews.llvm.org/D23996 llvm-svn: 280349	2016-09-01 09:56:47 +00:00
Matt Arsenault	b50eb8dc2b	AMDGPU: Fix introducing stack access on unaligned v16i8 llvm-svn: 280298	2016-08-31 21:52:27 +00:00
Matt Arsenault	1d2151781b	AMDGPU: Use copy instead of mov during frame lowering This occurs before RA pseudos are expanded. It's less code to emit the copy. llvm-svn: 280297	2016-08-31 21:52:25 +00:00
Matt Arsenault	57bc4324f8	AMDGPU: Refactor frame lowering This will make future changes easier. llvm-svn: 280296	2016-08-31 21:52:21 +00:00
Tom Stellard	ba5730884b	AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned Summary: This fixes some OpenCV tests that were broken by libclc commit r276443. Reviewers: arsenm, jvesely Subscribers: arsenm, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24051 llvm-svn: 280274	2016-08-31 18:46:07 +00:00
Nikolay Haustov	eba808957e	AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass Summary: Simply replace usage of aliases to functions with aliasee. This came up when bitcode linking to builtin library and calls to aliases not being resolved. Also made minor improvements to existing test. Reviewers: tstellarAMD, alex-t, vpykhtin Subscribers: arsenm, wdng, rampitec Differential Revision: https://reviews.llvm.org/D24023 llvm-svn: 280221	2016-08-31 11:18:33 +00:00
Matt Arsenault	a609e2d5ce	AMDGPU: Relax SGPR asm constraint register class s should be SReg_32 to be as general as possible. This can avoid a copy from m0. llvm-svn: 280154	2016-08-30 20:50:08 +00:00
Valery Pykhtin	a34fb49f8f	[AMDGPU] Refactor SOP instructions TD files. Differential revision: https://reviews.llvm.org/D23617 llvm-svn: 280101	2016-08-30 15:20:31 +00:00
NAKAMURA Takumi	9720f57a17	SILoadStoreOptimizer.cpp: Fix a warning in r279991. [-Wunused-variable] llvm-svn: 280075	2016-08-30 11:50:21 +00:00
Jan Vesely	89876673cd	AMDGPU/R600: Cleanup DAGCombine Move SDLoc initialization to comon place. fall back to AMDGPU version in one place Differential Revision: https://reviews.llvm.org/D23900 llvm-svn: 280030	2016-08-29 23:21:46 +00:00
Jan Vesely	77ed6af416	AMDGPU/R600: Remove MergeVectorStores from legalization This is handled by DAGCombiner in a more generic way Differential Revision: https://reviews.llvm.org/D23970 llvm-svn: 280019	2016-08-29 22:05:06 +00:00
Saleem Abdulrasool	43e5fe3fac	AMDGPU: fix mismatch tags, NFC llvm-svn: 280006	2016-08-29 20:42:07 +00:00
Tom Stellard	0d23ebe888	AMDGPU/SI: Implement a custom MachineSchedStrategy Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995	2016-08-29 19:42:52 +00:00
Tom Stellard	c2ff0eb697	AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler Summary: The SILoadStoreOptimizer can now look ahead more then one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge. Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23814 llvm-svn: 279991	2016-08-29 19:15:22 +00:00
Matt Arsenault	b90fc9b3b4	AMDGPU/R600: Fix fixups used for constant arrays Fixes bug 29289 llvm-svn: 279986	2016-08-29 19:01:48 +00:00
Tom Stellard	5d3f71f721	AMDGPU/SI: Improve register allocation hints for sopk instructions Summary: For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this seems to not work sometimes depending on the order virtual registers are assigned physical registers. To fix this, I've added a second allocation hint which does the reverse, asks that the register allocated for dst is used for src0. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23862 llvm-svn: 279968	2016-08-29 13:06:10 +00:00
Tom Stellard	662f330852	AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint() Summary: The SILoadStoreOptimizer will need to use AliasAnalysis here in order to move it before scheduling. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23813 llvm-svn: 279963	2016-08-29 12:05:32 +00:00
Jan Vesely	38814fa2fd	AMDGPU/R600: Enable Load combine Fix and improve tests Differential Revision: https://reviews.llvm.org/D23899 llvm-svn: 279925	2016-08-27 19:09:43 +00:00
Matt Arsenault	a15ea4e217	AMDGPU: Mark sched model complete Fixes bug 26800 llvm-svn: 279910	2016-08-27 03:39:27 +00:00
Matt Arsenault	71ed8a67e8	AMDGPU: Remove unneeded implicit exec uses/defs SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec. SI_IF_BREAK and SI_ELSE_BREAK do not read it either. llvm-svn: 279909	2016-08-27 03:00:51 +00:00
Matt Arsenault	2712d4a3d8	AMDGPU: Select mulhi 24-bit instructions llvm-svn: 279902	2016-08-27 01:32:27 +00:00
Matt Arsenault	22e417956d	AMDGPU: Move cndmask pseudo to be isel pseudo There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901	2016-08-27 01:00:37 +00:00
Matt Arsenault	e949744474	AMDGPU: Fix sched type for branches llvm-svn: 279900	2016-08-27 00:51:02 +00:00
Matt Arsenault	f98a596954	AMDGPU: Remove register operand from si_mask_branch It isn't used for anything, and is also misleading since it could be spilled at the end of the block, so it can't be relied on. There ends up being a verifier error about using an undefined register since the spill kills the register. llvm-svn: 279899	2016-08-27 00:42:21 +00:00
Matt Arsenault	00e102baf4	AMDGPU: Improve error reporting for maximum branch distance Unfortunately this seems to only help the assembler diagnostic. llvm-svn: 279895	2016-08-27 00:21:22 +00:00
Tom Stellard	e175d8aba5	AMDGPU/SI: Canonicalize offset order for merged DS instructions Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to scheduler loads together without out explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870	2016-08-26 21:36:47 +00:00
Tom Stellard	4b5cd87ed3	XXX llvm-svn: 279868	2016-08-26 21:16:40 +00:00
Tom Stellard	7c463c9168	AMDGPU/SI: Use a better method for determining the largest pressure sets Summary: There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change. The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set. Reviewers: arsenm Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23687 llvm-svn: 279867	2016-08-26 21:16:37 +00:00
Reid Kleckner	a5b1eef846	[MC] Move .cv_loc management logic out of MCContext MCContext already has many tasks, and separating CodeView out from it is probably a good idea. The .cv_loc tracking was modelled on the DWARF tracking which lived directly in MCContext. Removes the inclusion of MCCodeView.h from MCContext.h, so now there are only 10 build actions while I hack on CodeView support instead of 265. llvm-svn: 279847	2016-08-26 17:58:37 +00:00
Changpeng Fang	75f0968b39	AMDGCN/SI: Implement readlane/readfirstlane intrinsics Summary: This patch implements readlane/readfirstlane intrinsics. TODO: need to define a new register class to consider the case that the source could be a vector register or M0. Reviewed by: arsenm and tstellarAMD Differential Revision: http://reviews.llvm.org/D22489 llvm-svn: 279660	2016-08-24 20:35:23 +00:00
Wei Ding	1041a646a9	AMDGPU : Add V_SAD_U32 instruction pattern. Differential Revision: http://reviews.llvm.org/D23069 llvm-svn: 279629	2016-08-24 14:59:47 +00:00
Matthias Braun	733fe3676c	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses Re-apply this patch, hopefully I will get away without any warnings in the constructor now. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279602	2016-08-24 01:52:46 +00:00
Richard Smith	8c3fbdc6c4	Revert r279564. It introduces undefined behavior (binding a reference to a dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes -Werror builds (including several buildbots) to fail. llvm-svn: 279580	2016-08-23 22:08:27 +00:00
Matthias Braun	90799ce8b2	MachineFunction: Introduce NoPHIs property I want to compute the SSA property of .mir files automatically in upcoming patches. The problem with this is that some inputs will be reported as static single assignment with some passes claiming not to support SSA form. In reality though those passes do not support PHI instructions => Track the presence of PHI instructions separate from the SSA property. Differential Revision: https://reviews.llvm.org/D22719 llvm-svn: 279573	2016-08-23 21:19:49 +00:00
Matthias Braun	4c1f1f120c	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses Re-apply this commit with the deletion of a MachineFunction delegated to a separate pass to avoid use after free when doing this directly in AsmPrinter. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279564	2016-08-23 20:58:29 +00:00
Matthias Braun	7f66202d38	Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses" Reverting while tracking down a use after free. This reverts commit r279502. llvm-svn: 279503	2016-08-23 05:17:11 +00:00
Matthias Braun	fd936841eb	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279502	2016-08-23 03:20:09 +00:00
Matt Arsenault	78fc9daf8d	AMDGPU: Split SILowerControlFlow into two pieces Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out. One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32. This has a positive effect on SGPR usage in shader-db. llvm-svn: 279464	2016-08-22 19:33:16 +00:00
NAKAMURA Takumi	59a20649c6	Untabify. llvm-svn: 279408	2016-08-22 00:58:04 +00:00
Tim Shen	b5e0f5ac95	[GraphTraits] Make nodes_iterator dereference to NodeType/NodeRef Currently nodes_iterator may dereference to a NodeType or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later. Differential Revision: https://reviews.llvm.org/D23704 Differential Revision: https://reviews.llvm.org/D23705 llvm-svn: 279326	2016-08-19 21:20:13 +00:00
Michael Kuperstein	2bc3d4d46c	[SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround The names of the tablegen defs now match the names of the ISD nodes. This makes the world a slightly saner place, as previously "fround" matched ISD::FP_ROUND and not ISD::FROUND. Differential Revision: https://reviews.llvm.org/D23597 llvm-svn: 279129	2016-08-18 20:08:15 +00:00
Wei Ding	52bb661dec	AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type. Differential Revision: http://reviews.llvm.org/D23689 llvm-svn: 279126	2016-08-18 19:51:14 +00:00
Valery Pykhtin	609c2f8137	[AMDGPU] add s_incperflevel/s_decperflevel intrinsics. Differential revision: https://reviews.llvm.org/D23666 llvm-svn: 279106	2016-08-18 18:06:20 +00:00
Justin Bogner	cd1d5aaf2e	Replace a few more "fall through" comments with LLVM_FALLTHROUGH Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970	2016-08-17 20:30:52 +00:00
Matt Arsenault	d42d58cf21	AMDGPU: Remove dead option llvm-svn: 278965	2016-08-17 20:07:16 +00:00
Justin Bogner	b03fd12cef	Replace "fallthrough" comments with LLVM_FALLTHROUGH This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902	2016-08-17 05:10:15 +00:00
Chandler Carruth	67fc52f067	[PM] Port the always inliner to the new pass manager in a much more minimal and boring form than the old pass manager's version. This pass does the very minimal amount of work necessary to inline functions declared as always-inline. It doesn't support a wide array of things that the legacy pass manager did support, but is alse ... about 20 lines of code. So it has that going for it. Notably things this doesn't support: - Array alloca merging - To support the above, bottom-up inlining with careful history tracking and call graph updates - DCE of the functions that become dead after this inlining. - Inlining through call instructions with the always_inline attribute. Instead, it focuses on inlining functions with that attribute. The first I've omitted because I'm hoping to just turn it off for the primary pass manager. If that doesn't pan out, I can add it here but it will be reasonably expensive to do so. The second should really be handled by running global-dce after the inliner. I don't want to re-implement the non-trivial logic necessary to do comdat-correct DCE of functions. This means the -O0 pipeline will have to be at least 'always-inline,global-dce', but that seems reasonable to me. If others are seriously worried about this I'd like to hear about it and understand why. Again, this is all solveable by factoring that logic into a utility and calling it here, but I'd like to wait to do that until there is a clear reason why the existing pass-based factoring won't work. The final point is a serious one. I can fairly easily add support for this, but it seems both costly and a confusing construct for the use case of the always inliner running at -O0. This attribute can of course still impact the normal inliner easily (although I find that a questionable re-use of the same attribute). I've started a discussion to sort out what semantics we want here and based on that can figure out if it makes sense ta have this complexity at O0 or not. One other advantage of this design is that it should be quite a bit faster due to checking for whether the function is a viable candidate for inlining exactly once per function instead of doing it for each call site. Anyways, hopefully a reasonable starting point for this pass. Differential Revision: https://reviews.llvm.org/D23299 llvm-svn: 278896	2016-08-17 02:56:20 +00:00
Duncan P. N. Exon Smith	db53d99d02	AMDGPU: Avoid looking for the DebugLoc in end() The end() iterator isn't a safe thing to dereference. Pass the DebugLoc into EmitFetchClause and EmitALUClause to avoid it. llvm-svn: 278873	2016-08-17 00:06:43 +00:00
Konstantin Zhuravlyov	e0b87181cf	[AMDGPU] Remove duplicate initialization of SIDebuggerInsertNops pass Differential Revision: https://reviews.llvm.org/D23556 llvm-svn: 278863	2016-08-16 22:30:11 +00:00
Matt Arsenault	7f19298bfa	AMDGPU: Remove excessive padding from ImmOp and RegOp. The structs ImmOp and RegOp are in AArch64AsmParser.cpp (inside anonymous namespace). This diff changes the order of fields and removes the excessive padding (8 bytes). Patch by Alexander Shaposhnikov llvm-svn: 278844	2016-08-16 20:28:06 +00:00
Reid Kleckner	229d32abfc	[AMDGPU] Give enum an explicit 64-bit type to fix MSVC 2013 failures Recall that MSVC always gives enums the type 'int', nothing else. MSVC 2015 does not appear to have this problem anymore. Clang-cl -Wmicrosoft-enum-value flags this, FWIW, so now I have a true positive for my warning. :) llvm-svn: 278762	2016-08-15 23:54:44 +00:00
Jan Vesely	0486f739a4	AMDGPU/R600: Convert buffer id to VTX_READ input Use patterns instead of multiple instructions Add buffer id to asm string https://reviews.llvm.org/D22650 llvm-svn: 278749	2016-08-15 21:38:30 +00:00
Yaxun Liu	c7cbd72921	AMDGPU: Update AMDGPURuntimeMetadata.h for enums of address space qualifiers llvm-svn: 278682	2016-08-15 16:54:25 +00:00
Matt Arsenault	3661e90e71	AMDGPU: Don't fold subregister extracts into tied operands llvm-svn: 278676	2016-08-15 16:18:36 +00:00
Valery Pykhtin	c761675ef4	[AMDGPU] fix failure on printing of non-existing instruction operands. Differential revision: https://reviews.llvm.org/D23323 llvm-svn: 278665	2016-08-15 10:56:48 +00:00
Matt Arsenault	c1ebd82ebe	AMDGPU: Fix not estimating MBB operand sizes correctly llvm-svn: 278590	2016-08-13 01:43:54 +00:00
Matt Arsenault	3cc1e0066d	AMDGPU: Fix missing test for addressing mode with odd offsets Add test if the constant offset looks unaligned. llvm-svn: 278589	2016-08-13 01:43:51 +00:00
Matt Arsenault	44f6d694b3	AMDGPU/R600: Remove macros llvm-svn: 278588	2016-08-13 01:43:46 +00:00
Hans Wennborg	0dd9ed1d45	Fix more dereferenced end() iterators after r278532 llvm-svn: 278587	2016-08-13 01:12:49 +00:00
Duncan P. N. Exon Smith	f197b1f78f	ADT: Remove all ilist_iterator => pointer casts, NFC Remove all ilist_iterator to pointer casts. There were two reasons for casts: - Checking for an uninitialized (i.e., null) iterator. I added MachineInstrBundleIterator::isValid() to check for that case. - Comparing an iterator against the underlying pointer value while avoiding converting the pointer value to an iterator. This is occasionally necessary in MachineInstrBundleIterator, since there is an assertion in the constructors that the underlying MachineInstr is not bundled (but we don't care about that if we're just checking for pointer equality). To support the latter case, I rewrote the == and != operators for ilist_iterator and MachineInstrBundleIterator. - The implicit constructors now use enable_if to exclude const-iterator => non-const-iterator conversions from overload resolution (previously it was a compiler error on instantiation, now it's SFINAE). - The == and != operators are now global (friends), and are not templated. - MachineInstrBundleIterator has overloads to compare against both const_pointer and const_reference. This avoids the implicit conversions to MachineInstrBundleIterator that assert, instead just checking the address (and I added unit tests to confirm this). Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked are in ilist.h, and no code outside of ilist.h directly relies on this UB end-iterator-to-pointer conversion anymore. It's still needed for ilist_sentinel_traits, but I'll clean that up soon. llvm-svn: 278478	2016-08-12 05:05:36 +00:00
David Majnemer	562e82945e	Use the range variant of find_if instead of unpacking begin/end No functionality change is intended. llvm-svn: 278443	2016-08-12 00:18:03 +00:00
David Majnemer	0d955d0bf5	Use the range variant of find instead of unpacking begin/end If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433	2016-08-11 22:21:41 +00:00
Matt Arsenault	18da70dd2d	AMDGPU: Remove unused tablegen utilities llvm-svn: 278414	2016-08-11 21:08:43 +00:00
Wei Ding	70cda07526	AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32 Differential Revision: http://reviews.llvm.org/D23336 llvm-svn: 278403	2016-08-11 20:34:48 +00:00
Matt Arsenault	2ffe8fd2ce	AMDGPU: Prune includes llvm-svn: 278391	2016-08-11 19:18:50 +00:00
Matt Arsenault	56684d4538	AMDGPU: Fix crashes on memory functions llvm-svn: 278369	2016-08-11 17:31:42 +00:00
Matt Arsenault	4b5fc093d0	AMDGPU: Remove custom getSubReg This was kind of confusing, the subregister class shouldn't really be necessary. llvm-svn: 278362	2016-08-11 17:15:32 +00:00
Matt Arsenault	69fd2c1179	AMDGPU: Remove unused tracking of flat instructions llvm-svn: 278361	2016-08-11 17:15:28 +00:00
Wei Ding	34e1753585	AMDGPU : Add LLVM intrinsics for SAD related instructions. Differential Revision: http://reviews.llvm.org/D23133 llvm-svn: 278354	2016-08-11 16:33:53 +00:00
Valery Pykhtin	82c73bee2b	Revert "[AMDGPU] fix failure on printing of non-existing instruction operands." This reverts revision 278333, newly added test failed. llvm-svn: 278336	2016-08-11 14:22:05 +00:00
Valery Pykhtin	3048ff6ec3	[AMDGPU] fix failure on printing of non-existing instruction operands. Differential revision: https://reviews.llvm.org/D23323 llvm-svn: 278333	2016-08-11 13:49:46 +00:00
Tim Northover	406024a108	GlobalISel: implement simple function calls on AArch64. We're still limited in the arguments we support, but this at least handles the basic cases. llvm-svn: 278293	2016-08-10 21:44:01 +00:00
Changpeng Fang	fb9c3818dd	AMDGPU/SI: Implement amdgcn image intrinsics with sampler Summary: This patch define and implement amdgcn image intrinsics with sampler. 1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name to have three suffixes to overload each of these three types; 2. D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as resource type, then r128 is 0! 3. don't expose TFE flag, and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da. Differential Revision: http://reviews.llvm.org/D22838 Reviewed by: arsenm and tstellarAMD llvm-svn: 278291	2016-08-10 21:15:30 +00:00
Matt Arsenault	61f8ba8b79	AMDGPU: s_setpc_b64 should be an indirect branch llvm-svn: 278278	2016-08-10 19:20:02 +00:00
Matt Arsenault	c6b1350039	AMDGPU: Set sizes on control flow pseudos llvm-svn: 278276	2016-08-10 19:11:51 +00:00
Matt Arsenault	f4af802381	AMDGPU: Remove empty file comment llvm-svn: 278275	2016-08-10 19:11:48 +00:00
Matt Arsenault	11587d97be	AMDGPU: Remove unnecessary cast llvm-svn: 278274	2016-08-10 19:11:45 +00:00
Matt Arsenault	57431c9680	AMDGPU: Change insertion point of si_mask_branch Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273	2016-08-10 19:11:42 +00:00
Matt Arsenault	b920e9987d	AMDGPU: Use CreateStackObject instead of CreateSpillStackObject I'm not sure what the difference is, but no other target uses this for emergency spill slots. llvm-svn: 278272	2016-08-10 19:11:36 +00:00
Marek Olsak	355a8642b4	AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland Summary: This is the setting of the Vulkan closed source driver. It decreases the max wave count from 10 to 8. 26010 shaders in 14650 tests Totals: VGPRS: 829593 -> 808440 (-2.55 %) Spilled SGPRs: 81878 -> 42226 (-48.43 %) Spilled VGPRs: 367 -> 358 (-2.45 %) Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread Code Size: 36677864 -> 35923932 (-2.06 %) bytes There is a massive decrease in SGPR spilling in general and -7.4% spilled VGPRs for DiRT Showdown (= SGPRs spilled to scratch?) Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23034 llvm-svn: 277867	2016-08-05 21:23:29 +00:00
Yaxun Liu	86c052238a	[OpenCL] Add missing tests for getOCLTypeName Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown. Patch by Aaron En Ye Shi. Differential Revision: https://reviews.llvm.org/D22964 llvm-svn: 277759	2016-08-04 19:45:00 +00:00
Alina Sbirlea	6f937b1144	LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments. Summary: TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses. A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted. Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures: - x86 failing tests either did not have the right target, or the right alignment. - NVPTX failing tests did not have the right alignment. - AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks for a maximum size allowed and relies on the next condition checking for %4 for correctness. This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list). Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure), so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context. Reviewers: arsenm, jlebar, tstellarAMD Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23068 llvm-svn: 277735	2016-08-04 16:38:44 +00:00
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
Matt Arsenault	979902b3ff	AMDGPU: fdiv -1, x -> rcp -x llvm-svn: 277535	2016-08-02 22:25:04 +00:00
Nicolai Haehnle	8a482b33fe	AMDGPU: Stay in WQM for non-intrinsic stores Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays. For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway. This is a candidate for the 3.9 release branch. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D22675 llvm-svn: 277504	2016-08-02 19:31:14 +00:00
Nicolai Haehnle	bef0e90cf1	AMDGPU: Track physical registers in SIWholeQuadMode Summary: There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM. The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0. This is a candidate for the 3.9 branch, as it fixes a possible hang. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22673 llvm-svn: 277500	2016-08-02 19:17:37 +00:00
Valery Pykhtin	902db3101b	[AMDGPU] refactor DS instruction definitions. NFC. Differential revision: https://reviews.llvm.org/D22522 llvm-svn: 277344	2016-08-01 14:21:30 +00:00
Benjamin Kramer	22ff865a83	[AMDGPU] Fix lifetime of SmallVector temporaries. Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277265	2016-07-30 11:31:16 +00:00
Matt Arsenault	749035b7b1	AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior This should really be true for any immediate, not just inline ones. llvm-svn: 277260	2016-07-30 01:40:36 +00:00
Matt Arsenault	d2141b6030	AMDGPU: Set s_setpc_b64 as a terminator llvm-svn: 277259	2016-07-30 01:40:34 +00:00
Matt Arsenault	dc744412ad	AMDGPU: Remove unused pattern llvm-svn: 277258	2016-07-30 01:40:30 +00:00
Sjoerd Meijer	0eb96ed0de	TargetInstrInfo: add virtual function getInstSizeInBytes This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement. Differential Revision: https://reviews.llvm.org/D22885 llvm-svn: 277126	2016-07-29 08:16:16 +00:00
Changpeng Fang	26fb9d268b	AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB. Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073	2016-07-28 23:01:45 +00:00
Matthias Braun	941a705b7b	MachineFunction: Return reference for getFrameInfo(); NFC getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017	2016-07-28 18:40:00 +00:00
Wei Ding	07e03712d3	AMDGPU : Add intrinsics for compare with the full wavefront result Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998	2016-07-28 16:42:13 +00:00
Tom Stellard	19f4301099	AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980	2016-07-28 14:30:43 +00:00
Nicolai Haehnle	3b572002a2	AMDGPU: add execfix flag to SI_ELSE Summary: SI_ELSE is lowered into two parts: s_or_saveexec_b64 dst, src (at the start of the basic block) s_xor_b64 exec, exec, dst (at the end of the basic block) The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction s_and_b64 exec, exec, s[...] which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be: s_or_savexec_b64 dst, src s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode s_and_b64 dst, dst, exec <-- added by SILowerControlFlow s_xor_b64 exec, exec, dst Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode. Finally: It also occured to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit s_or_saveexec_b64 dst, src ... s_and_b64 dst, dst, exec s_xor_b64 exec, exec, dst and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22846 llvm-svn: 276972	2016-07-28 11:39:24 +00:00
Matt Arsenault	edc7dcb2aa	AMDGPU: Turn dead checks into asserts llvm-svn: 276946	2016-07-28 00:32:05 +00:00
Matt Arsenault	fe26775992	AMDGPU: Remove analyzeImmediate This no longer uses the more complicated classification of constants. llvm-svn: 276945	2016-07-28 00:32:02 +00:00
Reid Kleckner	46cb48c74a	Remove MCAsmInfo.h include from TargetOptions.h TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis. llvm-svn: 276883	2016-07-27 16:03:57 +00:00
Ahmed Bougacha	6756a2c953	[GlobalISel] Introduce an instruction selector. And implement it for AArch64, supporting x/w ADD/OR. Differential Revision: https://reviews.llvm.org/D22373 llvm-svn: 276875	2016-07-27 14:31:55 +00:00
Matt Arsenault	e3862cdc93	AMDGPU: Use rcp for fdiv 1, x with fpmath metadata Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823	2016-07-26 23:25:44 +00:00
Matt Arsenault	c6b69a9114	AMDGPU: Use implicit_def for selecting anyext llvm-svn: 276819	2016-07-26 23:06:33 +00:00
Matt Arsenault	60a750fa69	AMDGPU/R600: Remove dead custom inserters The intrinsics for these were removed, so this is dead. llvm-svn: 276805	2016-07-26 21:03:38 +00:00
Matt Arsenault	b06db8f666	AMDGPU: Minor AsmPrinter cleanups llvm-svn: 276804	2016-07-26 21:03:36 +00:00
Matt Arsenault	52ef4019fd	AMDGPU: Make AMDGPUMachineFunction fields private ABIArgOffset is a problem because properly fsetting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766	2016-07-26 16:45:58 +00:00
Matt Arsenault	32fc527c65	AMDGPU: Add fp legacy instruction intrinsics This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764	2016-07-26 16:45:45 +00:00
Jan Vesely	b64c8925e9	AMDGPU: Remove read_workdim intrinsic Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682	2016-07-25 20:17:02 +00:00
Matt Arsenault	2fa171c43a	AMDGPU: Make skip threshold an option llvm-svn: 276680	2016-07-25 19:48:29 +00:00
Matt Arsenault	cdae95bef2	AMDGPU: Delete dead code llvm-svn: 276675	2016-07-25 19:06:25 +00:00
Joel Jones	373d7d30dd	MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC Some targets, notably AArch64 for ILP32, have different relocation encodings based upon the ABI. This is an enabling change, so a future patch can use the ABIName from MCTargetOptions to chose which relocations to use. Tested using check-llvm. The corresponding change to clang is in: http://reviews.llvm.org/D16538 Patch by: Joel Jones Differential Revision: https://reviews.llvm.org/D16213 llvm-svn: 276654	2016-07-25 17:18:28 +00:00
Matt Arsenault	b40d8600ca	AMDGPU: Delete dead code This has been dead since r269479 llvm-svn: 276518	2016-07-23 07:07:14 +00:00
Tom Stellard	b8253c88b6	Revert "[AMDGPU] Emit read-only data to .rodata for hsa" This reverts commit r276298. Data stored in .rodata can have a negative offset from .text, but we don't support negative values in relocations yet. This caused a regression in one of the amp conformance tests: 5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01 llvm-svn: 276498	2016-07-22 23:46:40 +00:00
Tim Northover	33b07d6725	GlobalISel: implement legalization pass, with just one transformation. This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors. llvm-svn: 276461	2016-07-22 20:03:43 +00:00
Matt Arsenault	3c07c813c0	AMDGPU: Fix groupstaticsize for large LDS The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438	2016-07-22 17:01:33 +00:00
Matt Arsenault	8d718dcfda	AMDGPU: Add HSA dispatch id intrinsic llvm-svn: 276437	2016-07-22 17:01:30 +00:00
Matt Arsenault	f9245b75c0	AMDGPU: Delete more dead code Remove dead code from r600 intrinsic removal. Remove unset members, rename StackSize to be less ambiguous. llvm-svn: 276436	2016-07-22 17:01:25 +00:00
Matt Arsenault	7fb961f3e6	AMDGPU: Fix i1 fp_to_int R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to. llvm-svn: 276435	2016-07-22 17:01:21 +00:00
Matt Arsenault	d40ded6681	AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs llvm-svn: 276434	2016-07-22 17:01:15 +00:00
Konstantin Zhuravlyov	3c0d8d22fe	[AMDGPU] Emit read-only data to .rodata for hsa Differential Revision: https://reviews.llvm.org/D22538 llvm-svn: 276298	2016-07-21 15:59:23 +00:00
Konstantin Zhuravlyov	155626238b	AMDGPU/SI: Add support for R_AMDGPU_ABS32 Differential Revision: https://reviews.llvm.org/D21646 llvm-svn: 276294	2016-07-21 15:29:19 +00:00
Sam Kolton	6c4efcadfe	[AMDGPU] Some code cleaning in SIRegisterInfo.td Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: https://reviews.llvm.org/D22620 llvm-svn: 276274	2016-07-21 13:29:57 +00:00
Matt Arsenault	f0ba86a4d5	AMDGPU: Fix phis from blocks split due to register indexing llvm-svn: 276257	2016-07-21 09:40:57 +00:00
Yaxun Liu	4b1d9f7f18	AMDGPU: Fix bug causing crash due to invalid opencl version metadata. Differential Revision: https://reviews.llvm.org/D22526 llvm-svn: 276119	2016-07-20 14:38:06 +00:00
Matt Arsenault	a1fe17c9ad	AMDGPU: Change fdiv lowering based on !fpmath metadata If 2.5 ulp is acceptable, denormals are not required, and isn't a reciprocal which will already be handled, replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. llvm-svn: 276051	2016-07-19 23:16:53 +00:00
Davide Italiano	63e5968033	[AMDGPU] Remove spurious line (should've been removed in r276029). llvm-svn: 276030	2016-07-19 21:16:30 +00:00
Davide Italiano	1576e38598	[AMDGPU] Remove dead code. LGTM'd by Matt Arsenault. llvm-svn: 276029	2016-07-19 21:10:49 +00:00
Matt Arsenault	03006fd3c4	AMDGPU: Only use legal inline immediates with kill pseudo Only if the value is negative or positive is what matters, so use a constant that doesn't require an instruction to materialize. These should really just emit the write exec directly, but for stick with the kill pseudo-terminator. llvm-svn: 275988	2016-07-19 16:27:56 +00:00
Matt Arsenault	fe358066ea	AMDGPU/SI: Fix SI scheduler refcount issue Without this fix, releaseSuccessors when InOrOutBlock is false could release SUs outside the schedule BasicBlock. Patch by Axel Davy llvm-svn: 275935	2016-07-19 00:35:22 +00:00
Matt Arsenault	cb540bc03c	AMDGPU: Expand register indexing pseudos in custom inserter This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934	2016-07-19 00:35:03 +00:00
Matt Arsenault	210b7cf3e2	AMDGPU: Remove pointless dyn_cast_or_null This is already casted above so non-null llvm-svn: 275881	2016-07-18 19:00:07 +00:00
Matt Arsenault	b51dcb97bb	AMDGPU: Fix missing switch case warning llvm-svn: 275873	2016-07-18 18:40:51 +00:00
Matt Arsenault	c96e1deffa	AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32 llvm-svn: 275871	2016-07-18 18:35:05 +00:00
Matt Arsenault	4c519d3518	AMDGPU/R600: Replace barrier intrinsics llvm-svn: 275870	2016-07-18 18:34:59 +00:00
Matt Arsenault	efb24540b1	AMDGPU: Remove dead check in AMDGPUPromoteAlloca This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case. Also fix crashes on 0 x and 1 x arrays. llvm-svn: 275869	2016-07-18 18:34:53 +00:00
Matt Arsenault	2e08e181a7	AMDGPU: Remove dead code and redundant check Non intrinsic calls aren't really handled, and this IntrinsicInst dyn_cast checks for the function for us. llvm-svn: 275868	2016-07-18 18:34:48 +00:00
Nicolai Haehnle	bef1ceb815	AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions. Summary: The work item intrinsics are not available for the shader calling conventions. And even if we did hook them up most shader stages haves some extra restrictions on the amount of available LDS. Reviewers: tstellarAMD, arsenm Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D20728 llvm-svn: 275779	2016-07-18 09:02:47 +00:00
Yaxun Liu	a711cc7951	Re-commit [AMDGPU] Add metadata for runtime Attempting to fix lit test failure on ppc. llvm-svn: 275676	2016-07-16 05:09:21 +00:00
Matt Arsenault	73d2f8954a	AMDGPU: Fix verifier error from partially undef copy In this situation: %VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11, %VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use> %VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3 %VGPR4<def> = COPY %VGPR2 The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1, but VGPR4 is defined immediately after this copy. llvm-svn: 275635	2016-07-15 22:32:02 +00:00
Matt Arsenault	a65e6b8335	AMDGPU: Remove brev intrinsic llvm-svn: 275620	2016-07-15 21:27:13 +00:00
Matt Arsenault	82e5e1e564	AMDGPU: Fix TargetPrefix for remaining r600 intrinsics llvm-svn: 275619	2016-07-15 21:27:08 +00:00
Matt Arsenault	11d3e21f2b	AMDGPU: Remove AMDGPU.ldexp llvm-svn: 275618	2016-07-15 21:26:56 +00:00
Matt Arsenault	09b2c4aee8	AMDGPU: Remove legacy rsq.clamped intrinsic Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining. Also fix mismatch with non-IEEE rsq selecting to IEEE rsq. llvm-svn: 275617	2016-07-15 21:26:52 +00:00
Matt Arsenault	d7f4414543	AMDGPU/R600: Delete dead code. Dead or the same as the base implementation. llvm-svn: 275616	2016-07-15 21:26:46 +00:00
Vitaly Buka	7f64844481	Revert "[AMDGPU] Add metadata for runtime" This reverts commit r275566. llvm-svn: 275599	2016-07-15 19:14:57 +00:00
Justin Lebar	9c375817ac	[SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592	2016-07-15 18:27:10 +00:00
Yaxun Liu	b3d17690eb	[AMDGPU] Add metadata for runtime Added emitting metadata to elf for runtime. Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream. Differential Revision: https://reviews.llvm.org/D21849 llvm-svn: 275566	2016-07-15 14:58:21 +00:00
Jacques Pienaar	71c30a14b7	Rename AnalyzeBranch* to analyzeBranch*. Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564	2016-07-15 14:41:04 +00:00
Matt Arsenault	b91805ea2b	AMDGPU: Fix not expanding control flow after some kill blocks Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later. Fixes bug 28550 llvm-svn: 275510	2016-07-15 00:58:15 +00:00
Matt Arsenault	fa5a86a403	AMDGPU: Fix trying to skip from a block with no successors Found while reducing bug 28550 llvm-svn: 275509	2016-07-15 00:58:13 +00:00
Matt Arsenault	83ab049af2	AMDGPU: Fix splitting kill blocks with defs before kill llvm-svn: 275508	2016-07-15 00:58:09 +00:00
Sam Kolton	7a2a323feb	[AMDGPU] Assembler: fix row_bcast parsing Summary: This change fix bug 28538 Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: https://reviews.llvm.org/D22355 llvm-svn: 275422	2016-07-14 14:50:35 +00:00
Matt Arsenault	ca7f5701f8	AMDGPU/R600: Delete/rename intrinsics no longer used by mesa Use the replacement pass to update the tests, and delete old names. llvm-svn: 275375	2016-07-14 05:47:17 +00:00
Matt Arsenault	648e422bd9	AMDGPU/R600: Remove intrinsics with no tests and no users Mesa removed this path, so nothing is using these anymore. llvm-svn: 275372	2016-07-14 05:23:23 +00:00
Matt Arsenault	897eee4187	AMDGPU: Remove unused intrinsics llvm-svn: 275371	2016-07-14 05:23:19 +00:00
Matt Arsenault	0bf9984bc8	AMDGPU: Remove dead code llvm-svn: 275369	2016-07-14 05:23:08 +00:00
Matt Arsenault	f071102647	AMDGPU: Remove last AMDIL intrinsics llvm-svn: 275309	2016-07-13 19:42:06 +00:00
Marek Olsak	0532c190f7	AMDGPU/SI: Emit the number of SGPR and VGPR spills Summary: v2: don't count SGPRs spilled to scratch twice I think this is sufficient. It doesn't count private memory usage, which happens often and uses scratch but isn't technically a spill. The private memory usage can be computed by: [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills]. The fact SGPR spills add very high numbers to the scratch size make that computation a guessing game, but I don't have a solution to that. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D22197 llvm-svn: 275288	2016-07-13 17:35:15 +00:00
Tom Stellard	418beb7671	AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21484 llvm-svn: 275268	2016-07-13 14:23:33 +00:00
Matt Arsenault	0056868c4a	AMDGPU: Fold out no-op kill intrinsics llvm-svn: 275253	2016-07-13 06:04:22 +00:00
Matt Arsenault	8dff86d878	AMDGPU: WQM cleanups - Add new TTI instruction checks - Don't use const for blocks that are mutated. - Checking isBranch and isTerminator should be redundant llvm-svn: 275252	2016-07-13 05:55:15 +00:00
Matt Arsenault	786724a22e	AMDGPU: Follow up to r275203 I meant to squash this into it. llvm-svn: 275220	2016-07-12 21:41:32 +00:00
Matt Arsenault	657f871a4e	AMDGPU: Fix verifier error with kill intrinsic Don't create a terminator in the middle of the block. We should probably get rid of this intrinsic. llvm-svn: 275203	2016-07-12 19:01:23 +00:00
Matt Arsenault	10531d1020	AMDGPU: Set isConvergent on v_cmpx* instructions No test since these aren't used now, except for one place in a pre-emit pass. llvm-svn: 275200	2016-07-12 18:41:03 +00:00
Wei Ding	5b2636a152	AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8 Differential Revision: http://reviews.llvm.org/D22239 llvm-svn: 275197	2016-07-12 18:02:14 +00:00
Nicolai Haehnle	7968c34586	AMDGPU: Unify MOVRELSOffset and MOVRELDOffset Summary: Previously, constant index insertelements would be turned into SI_INDIRECT_DST, which is bound to prevent some optimization opportunities. Worse, it mislead the heuristic that decides whether immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D22217 llvm-svn: 275160	2016-07-12 08:12:16 +00:00
Matt Arsenault	fc7e6a0a0e	AMDGPU: Cleanup pseudoinstructions llvm-svn: 275133	2016-07-12 00:23:17 +00:00
Matt Arsenault	840593e19d	AMDGPU: Fix missing scc def on control flow pseudos These are all expanded to instructions that include an scc def. llvm-svn: 275132	2016-07-12 00:08:14 +00:00
Matt Arsenault	e3742466b9	AMDGPU: Enable trackLivenessAfterRegAlloc This has caught a number of bugs. llvm-svn: 275131	2016-07-11 23:56:30 +00:00
Nicolai Haehnle	c06bfa1daa	AMDGPU: Treat texture gather instructions more like other MIMG instructions Summary: Setting MIMG to 0 has a bunch of unexpected side effects, including that isVMEM returns false which leads to incorrect treatment in the hazard recognizer. The reason I noticed it is that it also leads to incorrect treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug. The only reason why MIMG was set to 0 is to signal the special handling of dmasks, but that can be checked differently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D22210 llvm-svn: 275113	2016-07-11 21:59:43 +00:00
Nicolai Haehnle	f52c3cf272	AMDGPU: fix local stack slot allocation bugs Summary: The main bug fix here is using the 32-bit encoding of V_ADD_I32 in materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary immediates work. The second part is that we may now require the SegmentWaveByteOffset even when there are initially no stack objects and VGPR spilling isn't enabled, for stack slots that are allocated later. This means that some bits become effectively dead and can be cleaned up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602 Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21551 llvm-svn: 275108	2016-07-11 21:44:40 +00:00
Nirav Dave	8603062ee4	Fix branch relaxation in 16-bit mode. Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation to generate jumps with 16-bit sized immediates in 16-bit mode. This fixes PR22097. Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D20830 llvm-svn: 275068	2016-07-11 14:23:53 +00:00
Artem Tamazov	53c9de08d2	[AMDGPU][llvm-mc] Quickfix for r272748 to enable labels in branch instructions. Fixes issue mentioned at: https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13. Lit tests added. Differential Revision: http://reviews.llvm.org/D22133 llvm-svn: 275054	2016-07-11 12:07:18 +00:00
Jan Vesely	2fa28c330c	AMDGPU/R600: Add implicitarg.ptr intrinsic Differential Revision: http://reviews.llvm.org/D21622 llvm-svn: 275024	2016-07-10 21:20:29 +00:00
Matt Arsenault	52a4d9b429	AMDGPU: Move R600 only pieces into R600 classes llvm-svn: 274979	2016-07-09 18:11:15 +00:00
Matt Arsenault	48d70cb486	Revert "AMDGPU: Remove unused control flow intrinsic" llvm-svn: 274978	2016-07-09 17:18:39 +00:00
NAKAMURA Takumi	f35424b73f	AMDGPU: Prune AMDGPUAsmParser in libdeps. llvm-svn: 274970	2016-07-09 07:54:27 +00:00
Matt Arsenault	dfec5ce032	AMDGPU: Fix fdiv lowering when f32 denormals supported Also fix test not actually using function labels. llvm-svn: 274969	2016-07-09 07:48:11 +00:00
Matt Arsenault	1322b6f8bb	AMDGPU: Improve offset folding for register indexing llvm-svn: 274954	2016-07-09 01:13:56 +00:00
Matt Arsenault	95c7897555	AMDGPU: Simplify isSchedulingBoundary llvm-svn: 274953	2016-07-09 01:13:51 +00:00
Matt Arsenault	8f0a92f0ba	AMDGPU: Remove unused control flow intrinsic llvm-svn: 274939	2016-07-08 21:39:44 +00:00
Duncan P. N. Exon Smith	4d29511894	AMDGPU: Remove implicit iterator conversions, NFC Remove remaining implicit conversions from MachineInstrBundleIterator to MachineInstr* from the AMDGPU backend. In most cases, I made them less attractive by preferring MachineInstr& or using a ranged-based for loop. Once all the backends are fixed I'll make the operator explicit so that this doesn't bitrot back. llvm-svn: 274906	2016-07-08 19:16:05 +00:00
Duncan P. N. Exon Smith	221847ef63	AMDGPU: Make infinite loop clear, NFC Change a while loop that was checking for nullptr on an iterator-to-pointer conversion to an infinite for loop. Now it's clear that the condition doesn't terminate. The only change in behaviour is if an invalid iterator (holding nullptr) was passed into AMDGPUCFGStructurizer::reversePredicateSetter. There are only two callers, and they both dereference the iterator before sending it in, so rather than adding an early return to avoid the loop I've just asserted (using a static_cast, to avoid an implicit conversion to pointer). llvm-svn: 274902	2016-07-08 19:00:17 +00:00
Matt Arsenault	b63f18c9c3	AMDGPU: Minor adjustment to r274817 The commit message is inaccurate, modifiesRegister will check for partial defs of exec. We currently don't ever emit partial defs of exec, so it doesn't really matter. llvm-svn: 274886	2016-07-08 17:06:48 +00:00
Valery Pykhtin	68853ab2c5	[AMDGPU] fix ds_swizzle_b32 opcode for VI (bz 28371) Differential Revision: http://reviews.llvm.org/D22049 llvm-svn: 274852	2016-07-08 15:12:46 +00:00
Matt Arsenault	a74374a86b	AMDGPU: Move si_mask_branch register operand to be a use llvm-svn: 274818	2016-07-08 00:55:44 +00:00
Matt Arsenault	d4a84b1ed2	AMDGPU: Cleanup. Use definesRegister instead of manual loop Also this will be more precise since it will check exec_lo/exec_hi writes. llvm-svn: 274817	2016-07-08 00:55:39 +00:00
Valery Pykhtin	af8b1bddbd	[AMDGPU] fix ds_write_src2 encoding (bz26027) Differential revision: http://reviews.llvm.org/D22041 llvm-svn: 274756	2016-07-07 14:23:38 +00:00
Michael Kuperstein	aa71bdd3af	[TTI] The cost model should not assume vector casts get completely scalarized The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642	2016-07-06 17:30:56 +00:00
Nicolai Haehnle	e40530ea7b	AMDGPU: Fix return of non-void-returning shaders Summary: Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that ensures that a non-void-returning shader falls off the end of the last basic block was effectively disabled, since SI_RETURN is now used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21975 llvm-svn: 274612	2016-07-06 08:35:17 +00:00
Matt Arsenault	e8dbf791b1	AMDGPU: Remove unnecessary string usage in AsmPrinter Registers are printed a lot, so don't create temporary std::strings. Using char instead of a string to an ostream saves a function call. llvm-svn: 274581	2016-07-05 22:06:56 +00:00
Matt Arsenault	ffc8275f2b	AMDGPU: Fix folding SGPRs into madak/madmk src0 Because of the special immediate operand, the constant bus is already used so SGPRs are never useful. r263212 changed the name of the immediate operand, which broke the verifier check for the restriction. llvm-svn: 274564	2016-07-05 17:09:01 +00:00
Tom Stellard	a4b746d808	AMDGPU/SI: Remove address space query functions from AMDGPUDAGToDAGISel Summary: These have been replaced with TableGen code (except for isConstantLoad, which is still used for R600). The queries were broken for cases where MemOperand was a PseudoSourceValue. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21684 llvm-svn: 274561	2016-07-05 16:10:44 +00:00
Valery Pykhtin	e65b39ec09	[AMDGPU] rename DS_1A1D_Off8_NORET to DS_1A2D_Off8_NORET as ds_write2xx use 2 source registers. NFC. llvm-svn: 274556	2016-07-05 15:15:28 +00:00
Sam Kolton	a9cd6aa895	[AMDGPU] Assembler: Fix parsing error with floating-point literals passed to integer instructions Differential Revision: http://reviews.llvm.org/D21972 llvm-svn: 274551	2016-07-05 14:01:11 +00:00
Tom Stellard	4a105d73a9	AMDGPU/R600: Add PatFrags for selecting the correct vtx id for loads This moves of the r600 logic out of isGlobalLoad() and into the TableGen files. Differential Revision: http://reviews.llvm.org/D21710 llvm-svn: 274527	2016-07-05 00:12:51 +00:00
Tom Stellard	17a0ec5400	AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructions Summary: The isGlobalLoad() query was returning true for constant address space loads with memory types less than 32-bits, which is wrong. This logic has been replaced with PatFrag in the TableGen files, to provide the same functionality. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21696 llvm-svn: 274521	2016-07-04 20:41:48 +00:00
Jan Vesely	991dfd7b07	AMDGPU/R600: Add indentation to VTX and TEX fetch asm strings These are printed as part of Fetch clauses. Differential Revision: http://reviews.llvm.org/D21730 llvm-svn: 274517	2016-07-04 19:45:00 +00:00
Matt Arsenault	7f681ac7a9	AMDGPU: Add feature for unaligned access llvm-svn: 274398	2016-07-01 23:03:44 +00:00
Matt Arsenault	8af47a09e5	AMDGPU: Expand unaligned accesses early Due to visit order problems, in the case of an unaligned copy the legalized DAG fails to eliminate extra instructions introduced by the expansion of both unaligned parts. llvm-svn: 274397	2016-07-01 22:55:55 +00:00
Matt Arsenault	327bb5ad82	AMDGPU: Improve load/store of illegal types. There was a combine before to handle the simple copy case. Split this into handling loads and stores separately. We might want to change how this handles some of the vector extloads, since this can result in large code size increases. llvm-svn: 274394	2016-07-01 22:47:50 +00:00
Matt Arsenault	105c2a204c	AMDGPU/SI: Enable testing several variants for si scheduler Enable testing different scheduling variants if sgpr usage is very high. It was previously disabled because of a bug in handleMove, but it has been fixed since. Patch by Axel Davy llvm-svn: 274372	2016-07-01 18:03:46 +00:00
Nikolay Haustov	beb24f5b20	Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2. This was reverted in r268740 because of problems with corresponding Clang change. Clang change was updated and resubmitted in r274220. Check calling convention in AMDGPUMachineFunction::isKernel This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF. Also, in the future unused non-kernels may be optimized. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19917 llvm-svn: 274341	2016-07-01 10:00:58 +00:00
Sam Kolton	5196b88f07	[AMDGPU] Assembler: support SDWA for VOPC instructions Summary: dst_sel and dst_unused disabled for VOPC as they have no effect on result Reviewers: artem.tamazov, tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21376 llvm-svn: 274340	2016-07-01 09:59:21 +00:00
NAKAMURA Takumi	566597330a	Update libdeps; AMDGPUCodeGen requires LLVMVectorize. llvm-svn: 274339	2016-07-01 09:55:23 +00:00
Matt Arsenault	908b9e26a6	AMDGPU: Add option to run the load/store vectorizer llvm-svn: 274329	2016-07-01 03:33:52 +00:00
Matt Arsenault	0994bd57fb	AMDGPU: Implement getLoadStoreVecRegBitWidth llvm-svn: 274312	2016-07-01 00:56:27 +00:00
Duncan P. N. Exon Smith	632987296f	Target: Remove unused arguments from overrideSchedPolicy, NFC TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr* arguments (begin and end) that invite implicit conversions from MachineInstrBundleIterator. One option would be to change their type to an iterator, but since they don't seem to have been used since the API was added in 2010, I'm deleting the dead code. llvm-svn: 274304	2016-07-01 00:23:27 +00:00
Duncan P. N. Exon Smith	e4f5e4f4d1	CodeGen: Use MachineInstr& in TargetLowering, NFC This is a mechanical change to make TargetLowering API take MachineInstr& (instead of MachineInstr), since the argument is expected to be a valid MachineInstr. In one case, changed a parameter from MachineInstr to MachineBasicBlock::iterator, since it was used as an insertion point. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. llvm-svn: 274287	2016-06-30 22:52:52 +00:00
Matt Arsenault	c1142725bd	AMDGPU: Add m0 vgpr load loop block as successor This shows up as a verifier error when I move this earlier, not sure why it didn't before. llvm-svn: 274275	2016-06-30 20:49:28 +00:00
Rafael Espindola	d86e8bb0ed	Delete MCCodeGenInfo. MC doesn't really care about CodeGen stuff, so this was just complicating target initialization. llvm-svn: 274258	2016-06-30 18:25:11 +00:00
Duncan P. N. Exon Smith	9cfc75c214	CodeGen: Use MachineInstr& in TargetInstrInfo, NFC This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189	2016-06-30 00:01:54 +00:00
Matt Arsenault	eb9025d698	AMDGPU: Fix global isel crashes llvm-svn: 274039	2016-06-28 17:42:09 +00:00
Matt Arsenault	254a6450dd	AMDGPU: Fix typo llvm-svn: 274034	2016-06-28 16:59:53 +00:00
Matt Arsenault	3d846501fb	AMDGPU: Remove unused function llvm-svn: 274033	2016-06-28 16:59:49 +00:00
Matt Arsenault	b4d9503171	AMDGPU: Fix out of bounds indirect indexing errors This was producing acceses to registers beyond the super register's limits, resulting in verifier failures. llvm-svn: 273977	2016-06-28 01:09:00 +00:00
Matt Arsenault	55dff27122	AMDGPU: Fix global isel build llvm-svn: 273964	2016-06-28 00:11:26 +00:00
Matt Arsenault	ed1fd7e056	AMDGPU: Set MinInstAlignment Not sure this actually changes anything llvm-svn: 273947	2016-06-27 21:42:49 +00:00
Matt Arsenault	59c0ffa22a	AMDGPU: Implement per-function subtargets llvm-svn: 273940	2016-06-27 20:48:03 +00:00
Matt Arsenault	03d8584590	AMDGPU: Move subtarget feature checks into passes llvm-svn: 273937	2016-06-27 20:32:13 +00:00
Matt Arsenault	21a4625a16	AMDGPU: Fix verifier errors with undef vector indices Also fix pointlessly adding exec to liveins. llvm-svn: 273916	2016-06-27 19:57:44 +00:00
Simon Pilgrim	634dde36b4	Fix "not all control paths return a value" warning on MSVC llvm-svn: 273872	2016-06-27 12:58:10 +00:00
NAKAMURA Takumi	5cbd41e0a8	SIMachineFunctionInfo.cpp: Appease msc18 to use std::array. llvm-svn: 273860	2016-06-27 10:26:43 +00:00
NAKAMURA Takumi	f619b50fab	Reformat. llvm-svn: 273859	2016-06-27 10:26:36 +00:00
NAKAMURA Takumi	d377ad806e	Reformat blank lines. llvm-svn: 273858	2016-06-27 10:26:25 +00:00
Jan Vesely	3bc1af2be4	AMDGPU/R600: Fix GlobalValue regressions. Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary expressions by including offset even if the offset it 0 (we haven't hit this sooner since tested workloads don't include static offsets) We don't really care about the type of expression, so set it directly. Fixes: r272705 Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32. Fixes: r273166 Differential Revision: http://reviews.llvm.org/D21633 llvm-svn: 273785	2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov	f2f3d14774	[AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue. Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format: - offset 0: work group ID x - offset 4: work group ID y - offset 8: work group ID z - offset 16: work item ID x - offset 20: work item ID y - offset 24: work item ID z Set - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled Differential Revision: http://reviews.llvm.org/D20335 llvm-svn: 273769	2016-06-25 03:11:28 +00:00
Tom Stellard	b164a9843b	AMDGPU/SI: Make sure not to fold offsets into local address space globals Summary: Offset folding only works if you are emitting relocations, and we don't emit relocations for local address space globals. Reviewers: arsenm, nhaustov Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21647 llvm-svn: 273765	2016-06-25 01:59:16 +00:00
Matthias Braun	1e374a7aa6	AMDGPU: Define a schedule class for COPY. COPY was lacking a scheduling class, define it to avoid regressions in the upcoming change to the bidirectional MachineScheduler. Approved by tstellar on IRC. Differential Revision: http://reviews.llvm.org/D21540 llvm-svn: 273751	2016-06-24 23:52:11 +00:00
Matt Arsenault	86de486d31	AMDGPU: Add stub custom CodeGenPrepare pass This will do various things including ones CodeGenPrepare does, but with knowledge of uniform values. llvm-svn: 273657	2016-06-24 07:07:55 +00:00
Matt Arsenault	c581611e11	AMDGPU: Remove disable-irstructurizer subtarget feature The only real reason to use it is for testing, so replace it with a command line option instead of a potentially function dependent feature. llvm-svn: 273653	2016-06-24 06:30:22 +00:00
Matt Arsenault	43e92fe306	AMDGPU: Cleanup subtarget handling. Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652	2016-06-24 06:30:11 +00:00
Tom Stellard	14416ae6cd	Support/ELF: Add R_AMDGPU_GOTPCREL relocation Summary: We will start generating this in a future patch. Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21482 llvm-svn: 273628	2016-06-23 23:11:29 +00:00
Matt Arsenault	8d4b0eddd6	AMDGPU: Add option to disable spilling SGPRs to VGPRs. This can help debug spilling problems. llvm-svn: 273605	2016-06-23 20:00:34 +00:00
Valery Pykhtin	a852d695b8	[AMDGPU] Enable absolute expression initializer for amd_kernel_code_t fields. Differential Revision: http://reviews.llvm.org/D21380 llvm-svn: 273561	2016-06-23 14:13:06 +00:00
Diana Picus	e440f99913	[AMDGPU] Remove exit-on-error in test (PR27761) The exit-on-error flag was necessary in order to avoid an assertion when handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize. We can avoid the assertion by creating some dummy nodes. This enables us to remove the exit-on-error flag on the first 2 run lines (SI), but on the third run line (R600) we would run into another assertion when trying to reserve indirect registers. This patch also replaces that assertion with an early exit from the function. Fixes PR27761. Differential Revision: http://reviews.llvm.org/D20852 llvm-svn: 273550	2016-06-23 09:19:16 +00:00
Matt Arsenault	529cf25e60	AMDGPU: readlane/writelane do not read exec llvm-svn: 273525	2016-06-23 01:26:16 +00:00
Matt Arsenault	3cb4ddeb4e	AMDGPU: Fix liveness when expanding m0 loop llvm-svn: 273514	2016-06-22 23:40:57 +00:00
Changpeng Fang	47efe1f6db	AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32 Reviewers: tstellarAMD, arsenm Differential Revision: http://reviews.llvm.org/D21533 llvm-svn: 273496	2016-06-22 21:33:49 +00:00
Matt Arsenault	4a07bf6372	AMDGPU: Run verifier after 2nd run of SIShrinkInstructions llvm-svn: 273469	2016-06-22 20:26:24 +00:00
Matt Arsenault	9babdf4265	AMDGPU: Fix verifier errors in SILowerControlFlow The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking. Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return. llvm-svn: 273467	2016-06-22 20:15:28 +00:00
Matt Arsenault	0e5befe315	AMDGPU: Make FrameLowering stack alignment 16 We don't need it to be that high. The natural alignment for a single workitem's stack is 16. llvm-svn: 273448	2016-06-22 17:47:39 +00:00
Matt Arsenault	180e0d5cef	AMDGPU: Fix gcc warnings Mostly removing dead code. Apparently gcc's warning for unused functions is better llvm-svn: 273363	2016-06-22 01:53:49 +00:00
Rafael Espindola	7b4ef068c6	Delete more dead code. Found by gcc 6. llvm-svn: 273322	2016-06-21 21:51:41 +00:00
Jan Vesely	fea814d531	AMDGPU: Add implicitarg.ptr intrinsic. Points to the start of implicit arguments (appended after explicit arguments) Differential Revision: http://reviews.llvm.org/D20297 llvm-svn: 273317	2016-06-21 20:46:20 +00:00
Rafael Espindola	48975881ab	Delete some dead code. Found by gcc 6. llvm-svn: 273303	2016-06-21 19:48:12 +00:00
Matt Arsenault	2209625387	AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32 The implicit operand is added by the initial instruction construction, so this was adding an additional vcc use. The original one was missing the undef flag the original condition had, so the verifier would complain. llvm-svn: 273182	2016-06-20 18:34:00 +00:00
Matt Arsenault	b6d8c37e1a	AMDGPU: Fold more custom nodes to undef This will help sneak undefs past GVN into the DAG for some tests. Also add missing intrinsic for rsq_legacy, even though the node was already selected to the instruction. Also start passing the debug location to intrinsic errors. llvm-svn: 273181	2016-06-20 18:33:56 +00:00
Matt Arsenault	ff98241f37	Generalize DiagnosticInfoStackSize to support other limits Backends may want to report errors on resources other than stack size. llvm-svn: 273177	2016-06-20 18:13:04 +00:00
Matt Arsenault	a9720c67f1	AMDGPU: Use correct method for determining instruction size llvm-svn: 273172	2016-06-20 17:51:32 +00:00
Tom Stellard	5350894265	AMDGPU: Add support for R_AMDGPU_REL32 relocations Reviewers: arsenm, kzhuravl, rafael Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21401 llvm-svn: 273168	2016-06-20 17:33:43 +00:00
Tom Stellard	1c89eb7db0	AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations Reviewers: arsenm, rafael, kzhuravl Subscribers: rafael, arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21400 llvm-svn: 273166	2016-06-20 16:59:44 +00:00
NAKAMURA Takumi	fd92154b20	Reformat blank lines. llvm-svn: 273131	2016-06-20 01:05:15 +00:00
NAKAMURA Takumi	fe1202c4cb	Untabify. llvm-svn: 273129	2016-06-20 00:37:41 +00:00
Matt Arsenault	e935f05a94	AMDGPU: Fix kernel argument alignment impacting stack size Don't use AllocateStack because kernel arguments have nothing to do with the stack. The ensureMaxAlignment call was still changing the stack alignment. llvm-svn: 273080	2016-06-18 05:15:53 +00:00
Matt Arsenault	0bb294b224	AMDGPU: Temporarily select trap to s_endpgm This should select to s_trap, but that requires additonal work to setup and enable the trap handler. For now emit s_endpgm so bugpoint stops getting stuck on the unsupported call to abort. Emit a warning that this will only terminate the wave and not really trap. llvm-svn: 273062	2016-06-17 22:27:03 +00:00
Tom Stellard	0114bb5aa0	AMDGPU/SI: Simplify code in SITargetLowering::LowerGlobalAddress() This change were suggested in http://reviews.llvm.org/D21154. llvm-svn: 273059	2016-06-17 22:22:09 +00:00
Matt Arsenault	8885910f8e	AMDGPU: Remove llvm.SI.tid intrinsic Mesa doesn't emit this for llvm >= 3.8 anymore. llvm-svn: 273050	2016-06-17 21:18:41 +00:00
Changpeng Fang	3e06e1edac	AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/21438 llvm-svn: 272958	2016-06-16 21:20:47 +00:00
Matt Arsenault	01e062f5c6	AMDGPU: Fix maximum instruction size for amdgcn This was causing the conservative estimate of inline asm size to be twice as big as expected. llvm-svn: 272956	2016-06-16 21:14:05 +00:00
Wei Ding	ab3d91b8f1	AMDGPU: Add v_mad 16-bit instructions definition. Differential Revision: http://reviews.llvm.org/D21362 llvm-svn: 272919	2016-06-16 16:50:04 +00:00
Valery Pykhtin	02e2086e41	[AMDGPU] Fix few coding style issues. NFC. llvm-svn: 272785	2016-06-15 13:55:09 +00:00
Nicolai Haehnle	a609259832	AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value. Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset. An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21326 llvm-svn: 272761	2016-06-15 07:13:05 +00:00
Tom Stellard	82785e9fe7	AMDGPU/SI: Correctly encode constant expressions Summary: We we have an MCConstantExpr, we can encode it directly into the instruction instead of emitting fixups. Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21236 Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948 llvm-svn: 272750	2016-06-15 03:09:39 +00:00
Tom Stellard	89049702ce	AMDGPU/AsmParser: Add support for parsing symbol operands Summary: We can now reference symbols directly in operands, like this: s_mov_b32 s0, global Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21038 llvm-svn: 272748	2016-06-15 02:54:14 +00:00
Matt Arsenault	f42c69206d	AMDGPU: Run pointer optimization passes llvm-svn: 272736	2016-06-15 00:11:01 +00:00
Peter Collingbourne	96efdd6107	IR: Introduce local_unnamed_addr attribute. If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709	2016-06-14 21:01:22 +00:00
Tom Stellard	bf3e6e5bb4	AMDGPU/SI: Refactor fixup handling for constant addrspace variables Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Re-commit this after fixing a bug where we were trying to use a reference to a Triple object that had already been destroyed. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272705	2016-06-14 20:29:59 +00:00
Tom Stellard	b1a523fa68	Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables" This reverts commit r272675. llvm-svn: 272677	2016-06-14 15:16:35 +00:00
Tom Stellard	5e6298b0f2	AMDGPU/SI: Refactor fixup handling for constant addrspace variables Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272675	2016-06-14 15:11:01 +00:00
Artem Tamazov	17091364d1	[AMDGPU][llvm-mc] Predefined symbols to access -mcpu from the assembly source (.option.machine_version...) The feature allows for conditional assembly etc. TODO: make those symbols read-only. Test added. Differential Revision: http://reviews.llvm.org/D21238 llvm-svn: 272673	2016-06-14 15:03:59 +00:00
Marek Olsak	e93f6d6923	AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing Summary: Mesa and other users must set this to enable coalescing: - STRIDE = 0 - SWIZZLE_ENABLE = 1 This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556	2016-06-13 16:05:57 +00:00
Matt Arsenault	80bc355048	AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc The condition reg of the cndmask_b64 expansion can't be killed by the first one, and the implicit super register implicit def is needed. llvm-svn: 272554	2016-06-13 15:53:52 +00:00
Benjamin Kramer	d3f4c05aea	Move instances of std::function. Or replace with llvm::function_ref if it's never stored. NFC intended. llvm-svn: 272513	2016-06-12 16:13:55 +00:00
Benjamin Kramer	bdc4956bac	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Tom Stellard	f3af841462	AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations Summary: We need to set the fixup type to FK_Data_4 for the SCRATCH_RSRC_DWORD[01] symbols, since these require absolute relocations, and fixup_si_rodata is for relative relocations. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21153 llvm-svn: 272417	2016-06-10 19:26:38 +00:00
Sam Kolton	945231ada5	[AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning in AMDGPUOperand. Summary: sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported. Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier. Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers. Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...). Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20968 llvm-svn: 272384	2016-06-10 09:57:59 +00:00
Matt Arsenault	37fefd68cc	AMDGPU: Fix trailing whitespace llvm-svn: 272364	2016-06-10 02:18:02 +00:00
Matt Arsenault	58ddad5bd6	AMDGPU: v_cndmask_b32 does not def vcc Fixes verifier errors after SIShrinkInstructions. llvm-svn: 272351	2016-06-10 00:18:41 +00:00
Tom Stellard	26a2ab7477	AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_permute Summary: This fixes a bug with ds_permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349	2016-06-10 00:01:04 +00:00
Tom Stellard	1d3940e8cc	AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMI Reviewers: arsenm, axeldavy Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19823 llvm-svn: 272346	2016-06-09 23:48:02 +00:00
Matt Arsenault	7757c59e48	AMDGPU: Fix flat atomics The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345	2016-06-09 23:42:54 +00:00
Matt Arsenault	887018179a	AMDGPU: Fix i64 global cmpxchg This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344	2016-06-09 23:42:48 +00:00
Matt Arsenault	e2bd9a32bc	AMDGPU: Run verifer after insert waits pass llvm-svn: 272338	2016-06-09 23:19:14 +00:00
Matt Arsenault	cfb61e780a	AMDGPU: Remove incorrect assertion I'm still not sure under what circumstances the offset here is non-0, but private memory is not limited to 27-bits. llvm-svn: 272337	2016-06-09 23:19:08 +00:00
Matt Arsenault	c3a01ec9db	AMDGPU: Properly initialize SIShrinkInstructions llvm-svn: 272336	2016-06-09 23:18:47 +00:00
Wei Ding	ed0f97fad2	AMDGPU/SI: Fix 32-bit fdiv lowering We were using the fast fdiv lowering for all division, implementation of IEEE754 fdiv is added. http://reviews.llvm.org/D20557 llvm-svn: 272292	2016-06-09 19:17:15 +00:00
Sam Kolton	c9bdcb75c4	[AMDGPU] Disassembler: Support for sdwa instructions Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21129 llvm-svn: 272255	2016-06-09 11:04:45 +00:00
Nicolai Haehnle	c00e03b8f5	AMDGPU: Add amdgpu-ps-wqm-outputs function attributes Summary: The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug. The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts. Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20839 llvm-svn: 272063	2016-06-07 21:37:17 +00:00
Eric Christopher	538d09d0dd	Revert "Differential Revision: http://reviews.llvm.org/D20557 " Author: Wei Ding <wei.ding2@amd.com> Date: Tue Jun 7 19:04:44 2016 +0000 Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8 as it was breaking the bots. This reverts commit r272044. llvm-svn: 272056	2016-06-07 20:27:12 +00:00
Wei Ding	a70216f1b3	Differential Revision: http://reviews.llvm.org/D20557 llvm-svn: 272044	2016-06-07 19:04:44 +00:00
Matt Arsenault	02458c2d27	AMDGPU: Add function for getting instruction size llvm-svn: 271936	2016-06-06 20:10:33 +00:00
Matt Arsenault	3b2e2a59e8	AMDGPU: Fix constantexpr addrspacecasts If we had a constant group address space cast the queue pointer wasn't enabled for the function, resulting in a crash on noreg later. llvm-svn: 271935	2016-06-06 20:03:31 +00:00
Artem Tamazov	135487767b	[AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC. Another step for unification llvm assembler/disassembler with sp3. Besides, CodeGen output is a bit improved, thus changes in CodeGen tests. Assembler/Disassembler tests updated/added. Differential Revision: http://reviews.llvm.org/D20796 llvm-svn: 271900	2016-06-06 15:23:43 +00:00
Artem Tamazov	f88397c84c	[test/AMDGPU] Square-braced-syntax for registers: add macro test/example. Test added as per discussion in http://reviews.llvm.org/D20588. The macro is just a demonstration, useless in practice. Coding style fixes. Differential Revision: http://reviews.llvm.org/D20797 llvm-svn: 271675	2016-06-03 14:41:17 +00:00
Sam Kolton	a4a99ad1bc	[AMDGPU] Assembler: More tests for SDWA instructions. Fix for SDWA float modifiers. Summary: Depends on D20625 Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20674 llvm-svn: 271662	2016-06-03 11:43:09 +00:00
Sam Kolton	05ef1c940f	[AMDGPU] Assembler: Custom converters for SDWA instructions. Support for _dpp and _sdwa suffixes in mnemonics. Summary: Added custom converters for SDWA instruction to support optional operands and modifiers. Support for _dpp and _sdwa suffixes that allows to force DPP or SDWA encoding for instructions. Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20625 llvm-svn: 271655	2016-06-03 10:27:37 +00:00
Matt Arsenault	43578ec2a8	AMDGPU: Handle flat in getMemOpBaseRegImmOfs It can still report the base register, and the uses give up when it fails. llvm-svn: 271575	2016-06-02 20:05:20 +00:00
Matt Arsenault	d1097a38e2	AMDGPU: Cleanup load tests There are a lot of different kinds of loads to test for, and these were scattered around inconsistently with some redundancy. Try to comprehensively test all loads in a consistent way. llvm-svn: 271571	2016-06-02 19:54:26 +00:00
Matt Arsenault	52dec8d36a	AMDGPU: Temporary fix for broken store combine llvm-svn: 271567	2016-06-02 19:00:55 +00:00
Matt Arsenault	8e00194be8	AMDGPU: Fix crashes on unknown processor name If the processor name failed to parse for amdgcn, the resulting output would have R600 ISA in it. If the processor name was missing or invalid for R600, the wavefront size would not be set and there would be crashes from missing itinerary data. Fixes crashes in future commit caused by dividing by the unset/0 wavefront size. llvm-svn: 271561	2016-06-02 18:37:16 +00:00
Matt Arsenault	598f55387a	AMDGPU: Fix incorrectly setting kill flag when copying register tuples This fixes some verifier errors when trackLivenessAfterRegAlloc is enabled. llvm-svn: 271446	2016-06-02 00:04:30 +00:00
Matt Arsenault	d3e4c646ea	AMDGPU: SIDebuggerInsertNops preserves CFG This saves an additional run of the DominatorTree and MachineLoopInfo llvm-svn: 271444	2016-06-02 00:04:22 +00:00
Matt Arsenault	ec30eb507e	AMDGPU: Remove unused address space Also return a single StringRef instead of building a string. llvm-svn: 271296	2016-05-31 16:57:45 +00:00
Matt Arsenault	d8d304d1d6	AMDGPU: Fix trailing whitespace llvm-svn: 271081	2016-05-28 00:50:51 +00:00
Matt Arsenault	7401516985	AMDGPU: Add fract intrinsic Remove broken patterns matching it. This was matching the unsafe math pattern and expanding the fix for the buggy instruction from the pattern. The problems are also on CI. Remove the workarounds and only use fract with unsafe math or from the intrinsic. llvm-svn: 271078	2016-05-28 00:19:52 +00:00
Artem Tamazov	7da9b82e02	[AMDGPU][llvm-mc] Square-braced-syntax for registers - make ":expr2" optional. Register numbers may be specified as assembly-time expressions. This feature can be useful in macros and alike. However, expressions are supported within sqare braces only. Sqare braces were initially intended to support specifying of multiple (pairs/quads...) registers. Syntax like v[8:8] which specifies single register is also supported. That allows expressions but looks a bit unnatural. This change supports syntax REG[EXPR]. Tests added. Differential Revision: http://reviews.llvm.org/D20588 llvm-svn: 270990	2016-05-27 12:50:13 +00:00
Benjamin Kramer	4fed928f53	Avoid some copies by using const references. clang-tidy's performance-unnecessary-copy-initialization with some manual fixes. No functional changes intended. llvm-svn: 270988	2016-05-27 12:30:51 +00:00
Benjamin Kramer	3e9a5d3468	Apply clang-tidy's misc-static-assert where it makes sense. Also fold conditions into assert(0) where it makes sense. No functional change intended. llvm-svn: 270982	2016-05-27 11:36:04 +00:00
Changpeng Fang	71369b3a39	AMDGPU/SI: Enable load-store-opt by default. Summary: Enable load-store-opt by default, and update LIT tests. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D20694 llvm-svn: 270894	2016-05-26 19:35:29 +00:00
Artem Tamazov	6edc135d0f	[AMDGPU][llvm-mc] s_getreg/setreg* - hwreg - factor out strings/literals etc. Hwreg(...) syntax implementation unified with sendmsg(...). Common strings moved to Utils MathExtras.h functionality utilized. Added missing build dependency in Disassembler. Differential Revision: http://reviews.llvm.org/D20381 llvm-svn: 270871	2016-05-26 17:00:33 +00:00
Artem Tamazov	b49c3361e5	Fix build warning introduced in r270552 "[AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers." llvm-svn: 270859	2016-05-26 15:52:16 +00:00
Diana Picus	81bc3170e8	[AMDGPU] Remove exit-on-error flag from test (PR27762) Similar to r269948, but for argument lowering. Fixes PR27762 Differential Revision: http://reviews.llvm.org/D20430 llvm-svn: 270856	2016-05-26 15:24:55 +00:00
Matt Arsenault	e57206d81b	AMDGPU: Fix v2i64/v2f64 bitcasts These operations tend to get promoted away to v4i32 so this doesn't happen often. llvm-svn: 270740	2016-05-25 18:07:36 +00:00
Matt Arsenault	1cc4991412	AMDGPU: Fix inconsistent lowering of select of vectors f32 vectors would use a sequence of BFI instructions instead of unrolled cmp + select. This was better in the case of a VALU select with SGPR inputs, but we don't have a way of dealing with that in the DAG. llvm-svn: 270731	2016-05-25 17:34:58 +00:00
Nirav Dave	e003bb7ead	Soften assertion in AMDGPU emitPrologue. [AMDGPU] emitPrologue looks for an unused unallocated SGPR that is not the scratch descriptor. Continue search if unused register found fails other requirements. Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20526 llvm-svn: 270646	2016-05-25 01:45:42 +00:00
Konstantin Zhuravlyov	29ddd2b2f2	[AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs Differential Revision: http://reviews.llvm.org/D20081 llvm-svn: 270594	2016-05-24 18:37:18 +00:00
Sam Kolton	11de370cca	[AMDGPU] Assembler: rework parsing of optional operands. Summary: Change process of parsing of optional operands. All optional operands use same parsing method - parseOptionalOperand(). No default values are added to OperandsVector. Get rid of WORKAROUND_USE_DUMMY_OPERANDS_INSTEAD_MUTIPLE_DEFAULT_OPERANDS. Reviewers: tstellarAMD, vpykhtin, artem.tamazov, nhaustov Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20527 llvm-svn: 270556	2016-05-24 12:38:33 +00:00
Artem Tamazov	212a251c8d	[AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers. Differential Revision: http://reviews.llvm.org/D20476 llvm-svn: 270552	2016-05-24 12:05:16 +00:00
Sam Kolton	1bdcef7697	[AMDGPU] Assembler: refactor parsing of modifiers and immediates. Allow modifiers for imms. Reviewers: nhaustov, tstellarAMD Subscribers: kzhuravl, arsenm Differential Revision: http://reviews.llvm.org/D20166 llvm-svn: 270415	2016-05-23 09:59:02 +00:00
Matt Arsenault	7f9eabd2c2	AMDGPU: Define priorities for register classes Allocating larger register classes first should give better allocation results (and more importantly for myself, make the lit tests more stable with respect to scheduler changes). Patch by Matthias Braun llvm-svn: 270312	2016-05-21 03:55:07 +00:00
Matt Arsenault	71e6676169	AMDGPU: Cleanup lowering actions These are kind of a mess and hard to follow, particularly for loads and stores. Fix various redundant, unnecessary and dead settings. llvm-svn: 270307	2016-05-21 02:27:49 +00:00
Matt Arsenault	81a709503d	AMDGPU: Fix high bits after division optimization This is essentially doing a 24-bit signed division with FP. We need to truncate to the N bit result. llvm-svn: 270305	2016-05-21 01:53:33 +00:00
Matt Arsenault	b6e1cc2a92	AMDGPU: Fix verifier error when spilling SGPRs The current SGPR spilling test does not stress this because it is using s_buffer_load instructions to increase SGPR pressure and spill, but their output operands have the same SReg_32_XM0 constraint. This fixes an error when the SReg_32 output from most instructions is spilled. llvm-svn: 270301	2016-05-21 00:53:42 +00:00
Matt Arsenault	8f5e008534	AMDGPU: Fix relationship between SReg_32 and SReg_32_XM0 llvm-svn: 270300	2016-05-21 00:53:28 +00:00
Matt Arsenault	4945905f5f	AMDGPU: Handle cbranch vccz/vccnz llvm-svn: 270297	2016-05-21 00:29:40 +00:00
Matt Arsenault	72fcd5f597	AMDGPU: Implement ReverseBranchCondition llvm-svn: 270296	2016-05-21 00:29:34 +00:00
Matt Arsenault	6d09380532	AMDGPU: Implement AnalyzeBranch Original patch by Tom Stellard llvm-svn: 270295	2016-05-21 00:29:27 +00:00
Matt Arsenault	4e3d383c46	AMDGPU: Remove pointless conversions llvm-svn: 270139	2016-05-19 21:09:58 +00:00
Matt Arsenault	4318ea354a	AMDGPU: Also look for s_cbranch_vccz llvm-svn: 270091	2016-05-19 18:20:25 +00:00
Artem Tamazov	8ce1f7177b	[AMDGPU][llvm-mc] Fixes to support buffer atomics. Fixes for MUBUF_Atomic instructions to make operand list valid: - For RTN insns, make a copy of $vdata_in operand as $vdata. - Do not add operand for GLC, it is hardcoded and comes as a token. Workaround to avoid adding multiple default optional operands. Tests added. Differential Revision: http://reviews.llvm.org/D20257 llvm-svn: 270049	2016-05-19 12:22:39 +00:00
Matt Arsenault	c5bebac934	AMDGPU: Fix verifier error when spilling undef subreg llvm-svn: 270002	2016-05-18 23:35:53 +00:00
Matt Arsenault	c438ef574d	AMDGPU: Fix promote alloca for pointer loads If the load has a pointer type, we don't want to change its type. llvm-svn: 270000	2016-05-18 23:20:24 +00:00
Rafael Espindola	8c34dd8257	Delete Reloc::Default. Having an enum member named Default is quite confusing: Is it distinct from the others? This patch removes that member and instead uses Optional<Reloc> in places where we have a user input that still hasn't been maped to the default value, which is now clear has no be one of the remaining 3 options. llvm-svn: 269988	2016-05-18 22:04:49 +00:00
Jan Vesely	ae265c03f7	AMDGPU: Fix incorrect simm check Use signed division otherwise all back jumps fail the check Fixes regression introduced in r269951 Differential Revision: http://reviews.llvm.org/D20380 llvm-svn: 269972	2016-05-18 19:07:58 +00:00
Matt Arsenault	a519cf593f	AMDGPU: Error if branch distance exceeds limit llvm-svn: 269951	2016-05-18 16:10:24 +00:00
Matt Arsenault	1735da460b	AMDGPU: Other sizes of popcnt are fast We can chain bcnt instructions together, so any width popcnt is pretty fast. llvm-svn: 269950	2016-05-18 16:10:19 +00:00
Matt Arsenault	9430b9113a	AMDGPU: Fix assert when erroring on a call For some reason an assert is now hit when a valid chain is not returned, so return the entry chain. llvm-svn: 269948	2016-05-18 16:10:11 +00:00
Matt Arsenault	891fccc0c1	AMDGPU: Handle alloca promoting with null operands If the second pointer in a multi-pointer instruction is a constant, we can replace the type. llvm-svn: 269945	2016-05-18 15:57:21 +00:00
Matt Arsenault	bde80346c1	AMDGPU: Don't run passes that aren't useful llvm-svn: 269943	2016-05-18 15:41:07 +00:00
Matt Arsenault	ab3429c2b4	AMDGPU: Fix assert on ttmp registers Use register class that does not include them when looking for unallocated registers. This is hit by the udiv v8i64 test in the opencl integer conformance test, and takes a few seconds to compile in a debug build so no test included. llvm-svn: 269938	2016-05-18 15:19:50 +00:00
Jan Vesely	687ca8df18	AMDGPU/R600: Use correct number of vector elements when lowering private loads Reviewer: tstellardAMD, arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D20032 llvm-svn: 269725	2016-05-16 23:56:32 +00:00
Matt Arsenault	8a028bf4d7	AMDGPU: Fix promote alloca pass creating huge arrays This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708	2016-05-16 21:19:59 +00:00
Jan Vesely	91aacad9c3	AMDGPU: Unify LowerGlobalAddress Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19794 llvm-svn: 269481	2016-05-13 20:39:34 +00:00
Jan Vesely	1680039a7a	AMDGPU/R600: Fold global address operand Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19793 llvm-svn: 269480	2016-05-13 20:39:31 +00:00
Jan Vesely	f97de00745	AMDGPU/R600: Implement memory loads from constant AS Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19792 llvm-svn: 269479	2016-05-13 20:39:29 +00:00
Jan Vesely	a1f9fdfcbc	AMDGPU/R600: Add support for emitting MCExpr Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19791 llvm-svn: 269478	2016-05-13 20:39:26 +00:00
Jan Vesely	7971464da8	AMDGPU: Add support for MCExpr to instruction printer Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19790 llvm-svn: 269477	2016-05-13 20:39:24 +00:00
Jan Vesely	4368c1cf7e	AMDGPU/R600: Use machine operands instead of ints to track literals This will be used for global addresses Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19789 llvm-svn: 269476	2016-05-13 20:39:22 +00:00
Jan Vesely	fac8d7ecb1	AMDGPU/R600: There are other uses for ALU_LITERAL besides Imm This will be used for GV Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19788 llvm-svn: 269475	2016-05-13 20:39:20 +00:00
Jan Vesely	fbcb754e66	AMDGPU: Make CONST_DATA_PTR available to R600 Rename to AMDGPUconstdata_ptr Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19786 llvm-svn: 269474	2016-05-13 20:39:18 +00:00
Jan Vesely	81f1b30035	AMDGPU/EG,CM: Add instruction to read from constant AS (VTX2) Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19785 llvm-svn: 269473	2016-05-13 20:39:16 +00:00
Konstantin Zhuravlyov	e3d322af57	[AMDGPU] Update nop insertion for debugger usage - Insert one nop for each high level statement instead of two - Do not insert nop before prologue Differential Revision: http://reviews.llvm.org/D20215 llvm-svn: 269452	2016-05-13 18:21:28 +00:00
Matt Arsenault	999f7dd84c	AMDGPU: Remove verifier check for scc live ins We only really need this to be true for SIFixSGPRCopies. I'm not sure there's any way this could happen before that point. Fixes a case where MachineCSE could introduce a cross block scc use. llvm-svn: 269391	2016-05-13 04:15:48 +00:00
Justin Bogner	95927c0fd0	SDAG: Implement Select instead of SelectImpl in AMDGPUDAGToDAGISel - Where we were returning a node before, call ReplaceNode instead. - Where we would return null to fall back to another selector, rename the method to try* and return a bool for success. - Where we were calling SelectNodeTo, just return afterwards. Part of llvm.org/pr26808. llvm-svn: 269349	2016-05-12 21:03:32 +00:00
Matt Arsenault	8300272823	AMDGPU: Fix getIntegerAttribute type and error message llvm-svn: 269268	2016-05-12 02:45:18 +00:00
Matt Arsenault	a61cb48dd2	AMDGPU: Fix breaking IR on instructions with multiple pointer operands The promote alloca pass would attempt to promote an alloca with a select, icmp, or phi user, even though the other operand was from a non-promotable source, producing a select on two different pointer types. Only do this if we know that both operands derive from the same alloca. In the future we should be able to relax this to an alloca which will also be promoted. llvm-svn: 269265	2016-05-12 01:58:58 +00:00
Matt Arsenault	4234542503	AMDGPU: Make some instructions convergent llvm-svn: 269147	2016-05-11 00:32:31 +00:00
Matt Arsenault	e8ed8e59e5	AMDGPU: Change private_element_size to 4 llvm-svn: 269145	2016-05-11 00:28:54 +00:00
Peter Collingbourne	dba995601b	Cloning: Clean up the interface to the CloneFunction function. Remove the ModuleLevelChanges argument, and the ability to create new subprograms for cloned functions. The latter was added without review in r203662, but it has no in-tree clients (all non-test callers pass false for ModuleLevelChanges [1], so it isn't reachable outside of tests). It also isn't clear that adding a duplicate subprogram to the compile unit is always the right thing to do when cloning a function within a module. If this functionality comes back it should be accompanied with a more concrete use case. Furthermore, all in-tree clients add the returned function to the module. Since that's pretty much the only sensible thing you can do with the function, just do that in CloneFunction. [1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction Differential Revision: http://reviews.llvm.org/D18628 llvm-svn: 269110	2016-05-10 20:23:24 +00:00
Konstantin Zhuravlyov	a791932145	[AMDGPU][NFC] Rename SIInsertNops -> SIDebuggerInsertNops Differential Revision: http://reviews.llvm.org/D20117 llvm-svn: 269098	2016-05-10 18:33:41 +00:00
Matthias Braun	31d19d43c7	CodeGen: Move TargetPassConfig from Passes.h to an own header; NFC Many files include Passes.h but only a fraction needs to know about the TargetPassConfig class. Move it into an own header. Also rename Passes.cpp to TargetPassConfig.cpp while we are at it. llvm-svn: 269011	2016-05-10 03:21:59 +00:00
Simon Pilgrim	0a81921cdb	Fixed unused but set variable warning llvm-svn: 268931	2016-05-09 16:42:23 +00:00
Matt Arsenault	a949dc619c	AMDGPU: Fold shift into cvt_f32_ubyteN llvm-svn: 268930	2016-05-09 16:29:50 +00:00
Artem Tamazov	f0b6b40fa4	[AMDGPU][llvm-mc] Some refactoring of .td files Some custom Operands and AsmOperandClasses moved to proper place. No functional changes. Differential Revision: http://reviews.llvm.org/D20012 llvm-svn: 268780	2016-05-06 19:32:38 +00:00
Artem Tamazov	ebe71ce36a	[AMDGPU][llvm-mc] Add support for sendmsg(...) syntax. Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax in s_sendmsg and s_sendmsghalt instructions. The syntax matches the SP3 assembler/disassembler rules. That is why implicit inputs (like M0 and EXEC) are not printed to disassembly output anymore. sendmsg(...) allows only known message types and attributes, even if literals are used instead of symbolic names. However, raw literal (without "sendmsg") still can be used, and that allows for any 16-bit value. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19596 llvm-svn: 268762	2016-05-06 17:48:48 +00:00
Nikolay Haustov	6eb050ea4e	Revert "AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2." This reverts commit 47486d52454d60cdf6becc0b2efe533c73794380. It broke calling OpenCL kernel from another kernel. llvm-svn: 268739	2016-05-06 14:59:04 +00:00
Sam Kolton	5f10a137d0	[TableGen] AsmMatcher: support for default values for optional operands Summary: This change allows to specify "DefaultMethod" for optional operand (IsOptional = 1) in AsmOperandClass that return default value for operand. This is used in convertToMCInst to set default values in MCInst. Previously if you wanted to set default value for operand you had to create custom converter method. With this change it is possible to use standard converters even when optional operands presented. Reviewers: tstellarAMD, ab, craig.topper Subscribers: jyknight, dsanders, arsenm, nhaustov, llvm-commits Differential Revision: http://reviews.llvm.org/D18242 llvm-svn: 268726	2016-05-06 11:31:17 +00:00
Nikolay Haustov	dc1bb79b92	AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2. Summary: Check calling convention in AMDGPUMachineFunction::isKernel This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF. Also, in the future unused non-kernels may be optimized. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19917 llvm-svn: 268719	2016-05-06 09:23:13 +00:00
Justin Bogner	b012699741	SDAG: Rename Select->SelectImpl and repurpose Select as returning void This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693	2016-05-05 23:19:08 +00:00
Matt Arsenault	539ca882c6	AMDGPU: Simplify control flow / conditions llvm-svn: 268676	2016-05-05 20:27:02 +00:00
Nicolai Haehnle	ffbd56a1c9	AMDGPU: Uniform branch conditions can originate with intrinsics Summary: Discovered by Dave Airlie, fixes an assertion in Khronos OpenGL CTS GL43-CTS.shader_storage_buffer_object.advanced-matrix. In this particular case, the buffer load intrinsic fed into a uniform conditional branch, and led the brcond lowering down the wrong path. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19931 llvm-svn: 268650	2016-05-05 17:36:36 +00:00
Tom Stellard	fcfaea4cff	AMDGPU/SI: Add support for AMD code object version 2. Summary: Version 2 is now the default. If you want to emit version 1, use the amdgcn--amdhsa-amdcov1 triple. Reviewers: arsenm, kzhuravl Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19283 llvm-svn: 268647	2016-05-05 17:03:33 +00:00
Jan Vesely	bbc2231983	AMDGPU/R600: Minor cleanup in InstrInfo Use std::make_pair instead of constructor Use C++11 loop Reuse helper var Reviewers: tstellardAMD Subsribers: arsenm Differential Revision: http://reviews.llvm.org/D19787 llvm-svn: 268503	2016-05-04 14:55:45 +00:00
Tom Stellard	4a304b3886	AMDGPU/SI: Use range loops to simplify some code in the SI Scheduler Reviewers: arsenm, axeldavy Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19822 llvm-svn: 268396	2016-05-03 16:30:56 +00:00
Aaron Ballman	3bd56b3b43	Silence unused variable warning; NFC. llvm-svn: 268392	2016-05-03 15:17:25 +00:00
Matt Arsenault	bcdfee7030	AMDGPU: Custom lower v2i32 loads and stores This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296	2016-05-02 20:13:51 +00:00
Tom Stellard	154c9cdd24	AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295	2016-05-02 20:11:44 +00:00
Matt Arsenault	2b957b5a6f	AMDGPU: Make i64 loads/stores promote to v2i32 Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components can be folded with the base pointer arithmetic. llvm-svn: 268293	2016-05-02 20:07:26 +00:00
Reid Kleckner	0549ab6033	Fix instance of -Winconsistent-missing-override in AMDGPU code llvm-svn: 268289	2016-05-02 19:45:10 +00:00
Tom Stellard	ce5e994887	AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratch Summary: When we restore an SGPR value from scratch, we first load it into a temporary VGPR and then use v_readlane_b32 to copy the value from the VGPR back into an SGPR. We weren't setting the kill flag on the VGPR in the v_readlane_b32 instruction, so the register scavenger wasn't able to re-use this temp value later. I wasn't able to create a lit test for this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19744 llvm-svn: 268287	2016-05-02 19:37:56 +00:00
Tom Stellard	27233b727f	AMDGPU: Move R600 specific code out of AMDGPUISelLowering.cpp Reviewers: arsenm Subscribers: jvesely, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19736 llvm-svn: 268267	2016-05-02 18:05:17 +00:00
Tom Stellard	341e293d67	AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260 We can't use MI->getDebugLoc() when MI is an iterator that could be MBB.end(). llvm-svn: 268265	2016-05-02 18:02:24 +00:00
Tom Stellard	1f520e5c98	AMDGPU/SI: Use the hazard recognizer to break SMEM soft clauses Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260	2016-05-02 17:39:06 +00:00
Nicolai Haehnle	119d3d80cb	AMDGPU: llvm.SI.fs.constant is a source of divergence Summary: This intrinsic is used to get flat-shaded fragment shader inputs. Those are uniform across a primitive, but a fragment shader wave may process pixels from multiple primitives (as indicated by the prim_mask), and so that's where divergence can arise. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19747 llvm-svn: 268259	2016-05-02 17:37:01 +00:00
Tom Stellard	a27007eb4f	AMDGPU/SI: Use hazard recognizer to detect DPP hazards Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247	2016-05-02 16:23:09 +00:00
Aaron Ballman	5c190d056d	Silence unused variable warnings; NFC. llvm-svn: 268234	2016-05-02 14:48:03 +00:00
Rafael Espindola	92dd7b82be	Add missing override. llvm-svn: 268163	2016-04-30 15:18:21 +00:00
Tom Stellard	c51e4468b7	AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaits This was supposed to be part of r268143. llvm-svn: 268154	2016-04-30 04:04:48 +00:00
Tom Stellard	cb6ba62d6f	AMDGPU/SI: Enable the post-ra scheduler Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143	2016-04-30 00:23:06 +00:00
Matt Arsenault	701c21ea10	AMDGPU: Fix crash with unreachable terminators. If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119	2016-04-29 21:52:13 +00:00
Matt Arsenault	dc4ebad6d4	AMDGPU: Add kernarg.segment.ptr intrinsic llvm-svn: 268105	2016-04-29 21:16:52 +00:00
Matt Arsenault	cf2744f1c8	AMDGPU/SI: Move post regalloc run of SIShrinkInstructions Move to addPreEmitPass. This is so it runs after post-RA scheduling so we can merge s_nops emitted by the scheduler and hazard recognizer. llvm-svn: 268095	2016-04-29 20:23:42 +00:00
Artem Tamazov	38e496b175	Fixed/Recommitted r267733 "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD." Previously reverted by r267752. r267733 review: Differential Revision: http://reviews.llvm.org/D19342 llvm-svn: 268066	2016-04-29 17:04:50 +00:00
Tom Stellard	92b24f324b	AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043	2016-04-29 14:34:26 +00:00
Nikolay Haustov	4f672a34ed	AMDGPU/SI: Assembler: Unify parsing/printing of operands. Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015	2016-04-29 09:02:30 +00:00
Matt Arsenault	7d1b6c81af	AMDGPU: Stop reporting an addressing mode for unknown addrspace This was being treated the same as private, which has an immediate offset. For unknown, it probably means it's for a computation not actually being used for accessing memory, so it should not have a nontrivial addressing mode. llvm-svn: 268002	2016-04-29 06:25:10 +00:00
Matt Arsenault	1c4d0efe56	AMDGPU: Emit error if too much LDS is used llvm-svn: 267922	2016-04-28 19:37:35 +00:00
Matt Arsenault	c5fce69031	AMDGPU: Fix mishandling array allocations when promoting alloca The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916	2016-04-28 18:38:48 +00:00
Craig Topper	33772c5375	[CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior. llvm-svn: 267853	2016-04-28 03:34:31 +00:00
Matt Arsenault	0547b016b1	AMDGPU: Account for globals in AMDGPUPromoteAlloca pass Patch by Bas Nieuwenhuizen llvm-svn: 267791	2016-04-27 21:05:08 +00:00
Chad Rosier	03e1647d19	Revert "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD." This reverts commit r267733 due to a -Werror,-Wunused-function error. llvm-svn: 267752	2016-04-27 18:29:11 +00:00
Reid Kleckner	7f0ae15e9d	Silence a -Wdangling-else llvm-svn: 267737	2016-04-27 16:46:33 +00:00
Artem Tamazov	3896f8f83d	[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD. Added support of TTMP quads. Reworked M0 exclusion machinery for SMRD and similar instructions to enable usage of TTMP registers in those instructions as destinations. Tests added. Differential Revision: http://reviews.llvm.org/D19342 llvm-svn: 267733	2016-04-27 16:20:23 +00:00
Nicolai Haehnle	f66bdb5ea8	AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729	2016-04-27 15:46:01 +00:00
Artem Tamazov	5cd55b1784	[AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers. Possibility to specify code of hardware register kept. Disassemble to symbolic name, if name is known. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19335 llvm-svn: 267724	2016-04-27 15:17:03 +00:00
Ahmed Bougacha	128f8732a5	[CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI. Differential Revision: http://reviews.llvm.org/D17176 llvm-svn: 267606	2016-04-26 21:15:30 +00:00
Konstantin Zhuravlyov	71515e57f9	[AMDGPU] Move reserved vgpr count for trap handler usage to SIMachineFunctionInfo + minor commenting changes Differential Revision: http://reviews.llvm.org/D19537 llvm-svn: 267573	2016-04-26 17:24:40 +00:00
Konstantin Zhuravlyov	1d99c4d03c	[AMDGPU] Reserve VGPRs for trap handler usage if instructed Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563	2016-04-26 15:43:14 +00:00
Sam Kolton	3025e7f25f	[AMDGPU] Assembler: basic support for SDWA instructions Support for SDWA instructions for VOP1 and VOP2 encoding. Not done yet: - converters for support optional operands and modifiers - VOPC - sext() modifier - intrinsics - VOP2b (see vop_dpp.s) - V_MAC_F32 (see vop_dpp.s) Differential Revision: http://reviews.llvm.org/D19360 llvm-svn: 267553	2016-04-26 13:33:56 +00:00
Andrew Kaylor	7de74af929	Add optimization bisect opt-in calls for AMDGPU passes Differential Revision: http://reviews.llvm.org/D19450 llvm-svn: 267485	2016-04-25 22:23:44 +00:00
Matt Arsenault	074ea2851c	AMDGPU/SI: Optimize adjacent s_nop instructions Use the operand for how long to wait. This is somewhat distasteful, since it would be better to just emit s_nop with the right argument in the first place. This would require changing TII::insertNoop to emit N operands, which would be easy. Slightly more problematic is the post-RA scheduler and hazard recognizer represent nops as a single null node, and would require inventing another way of representing N nops. llvm-svn: 267456	2016-04-25 19:53:22 +00:00
Matt Arsenault	99c14524ec	AMDGPU: Implement addrspacecast llvm-svn: 267452	2016-04-25 19:27:24 +00:00
Matt Arsenault	48ab526f12	AMDGPU: Add queue ptr intrinsic llvm-svn: 267451	2016-04-25 19:27:18 +00:00
Matt Arsenault	dfaf4261ab	AMDGPU: Add DAG to debug dump Also reorder case to match enum order llvm-svn: 267449	2016-04-25 19:27:09 +00:00
Etienne Bergeron	06c14ec31e	Fix incorrect redundant expression in target AMDGPU. Summary: The expression is detected as a redundant expression. Turn out, this is probably a bug. ``` /home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression] if (isSMRD(FirstLdSt) && isSMRD(FirstLdSt)) { ``` Reviewers: rnk, tstellarAMD Subscribers: arsenm, cfe-commits Differential Revision: http://reviews.llvm.org/D19460 llvm-svn: 267415	2016-04-25 15:06:33 +00:00
Artem Tamazov	d6468666b5	[AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax. Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. Possibility to specify 16-bit immediate kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410	2016-04-25 14:13:51 +00:00
Craig Topper	855d182656	Fix a couple assertions that can never fire because they just contained the text string which always evaluates to true. Add a ! so they'll evaluate to false. llvm-svn: 267312	2016-04-24 02:01:25 +00:00
Matt Arsenault	7e8de01f84	AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size llvm-svn: 267244	2016-04-22 22:59:16 +00:00
Matt Arsenault	efa3fe14d1	AMDGPU: Re-visit nodes in performAndCombine This fixes test regressions when i64 loads/stores are made promote. llvm-svn: 267240	2016-04-22 22:48:38 +00:00
Konstantin Zhuravlyov	a40d8358e7	[AMDGPU] Insert nop pass: take care of outstanding feedback - Switch few loops to range-based for loops - Fix nop insertion at the end of BB - Fix formatting - Check for endpgm Differential Revision: http://reviews.llvm.org/D19380 llvm-svn: 267167	2016-04-22 17:04:51 +00:00
Nicolai Haehnle	b0c9748709	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102	2016-04-22 04:04:08 +00:00
Matt Arsenault	98f8394e7c	AMDGPU: Fix debug name of pass to better match I get this wrong every time I try to debug this. llvm-svn: 267030	2016-04-21 18:21:54 +00:00
Nicolai Haehnle	97788020c5	Split IntrReadArgMem into IntrReadMem and IntrArgMemOnly Summary: IntrReadWriteArgMem simply becomes IntrArgMemOnly. So there are fewer intrinsic properties that express their orthogonality better, and correspond more closely to the corresponding IR attributes. Suggested by: Philip Reames Reviewers: joker.eph, reames, tstellarAMD Subscribers: jholewinski, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19291 llvm-svn: 267021	2016-04-21 17:48:02 +00:00
Sam Kolton	201398e8a3	[AMDGPU] Assembler: prevent parseDPPCtrlOps from eating invalid tokens Reviewers: nhaustov, tstellarAMD Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19317 llvm-svn: 266984	2016-04-21 13:14:24 +00:00
Nikolay Haustov	fb5c307ccd	AMDGPU/SI: Assembler: improvements to support trap handlers. Add ParseAMDGPURegister which can be invoked recursively for parsing lists. Rename getRegForName to getSpecialRegForName. Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi]. Add 64-bit registers TBA, TMA where missing. Add some tests. Differential Revision: http://reviews.llvm.org/D19163 llvm-svn: 266865	2016-04-20 09:34:48 +00:00
Nicolai Haehnle	b48275f134	Add IntrWrite[Arg]Mem intrinsic property Summary: This property is used to mark an intrinsic that only writes to memory, but neither reads from memory nor has other side effects. An example where this is useful is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store instruction that goes through a special buffer descriptor rather than through a plain pointer. With this property, the intrinsic should still be handled as having side effects at the LLVM IR level, but machine scheduling can make smarter decisions. Reviewers: tstellarAMD, arsenm, joker.eph, reames Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18291 llvm-svn: 266826	2016-04-19 21:58:33 +00:00
Nicolai Haehnle	e2dda4f750	AMDGPU: Guard VOPC instructions against incorrect commute Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825	2016-04-19 21:58:22 +00:00
Nicolai Haehnle	7483937bf0	AMDGPU/SI: SGPR accounting in getSIProgramInfo must ignore exec_lo/hi Summary: A shader stored the live mask (initial exec mask) in an SGPR which was then spilled during register allocation. The allocator quite reasonably optimized turned the spill into v_writelane_b32 %vgpr, exec_lo, N v_writelane_b32 %vgpr, exec_hi, N+1 at the beginning of the shader, confusing the SGPR accounting. No test case, because si-sgpr-spill.ll together with an upcoming patch for WQM handling exhibits the problem. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19199 llvm-svn: 266824	2016-04-19 21:58:17 +00:00
Konstantin Zhuravlyov	8c273ad719	[AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt Also, - Skip pass if machine module does not have debug info - Minor comment changes - Added test Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626	2016-04-18 16:28:23 +00:00
Artem Tamazov	e2762423c2	[AMDGPU][llvm-mc] s_setreg* - Fix order of operands Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first. Differential Revision: http://reviews.llvm.org/D19161 llvm-svn: 266617	2016-04-18 14:54:26 +00:00
Aaron Ballman	2eeefe8ed8	Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC. llvm-svn: 266616	2016-04-18 14:47:19 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' \| xargs grep -L 'IndexedMap[<]' \| xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Matt Arsenault	c10783c42d	AMDGPU: Enable LocalStackSlotAllocation pass This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508	2016-04-16 02:13:37 +00:00
Matt Arsenault	b6be202779	AMDGPU: Use s_addk_i32 / s_mulk_i32 llvm-svn: 266506	2016-04-16 01:46:49 +00:00
Jun Bum Lim	4c5bd58ebe	[MachineScheduler]Add support for store clustering Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437	2016-04-15 14:58:38 +00:00
Nicolai Haehnle	750082d1fe	AMDGPU/SI: Fix regression with no-return atomics Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433	2016-04-15 14:42:36 +00:00
Matt Arsenault	9c499c3a74	AMDGPU: Remove custom load/store scalarization llvm-svn: 266385	2016-04-14 23:31:26 +00:00
Matt Arsenault	fd8ab09c0e	AMDGPU: Include LDS size in printed comment llvm-svn: 266382	2016-04-14 22:11:51 +00:00
Matt Arsenault	3d1c1deb04	AMDGPU: Run SIFoldOperands after PeepholeOptimizer PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378	2016-04-14 21:58:24 +00:00
Matt Arsenault	4ac341c8b3	AMDGPU: Directly emit m0 initialization with s_mov_b32 Currently what comes out of instruction selection is a register initialized to -1, and then copied to m0. MachineCSE doesn't consider copies, but we want these to be CSEed. This isn't much of a problem currently, because SIFoldOperands is run immediately after. This avoids regressions when SIFoldOperands is run later from leaving all copies to m0. llvm-svn: 266377	2016-04-14 21:58:15 +00:00
Matt Arsenault	7900334dd5	AMDGPU: Fold bitcasts of scalar constants to vectors This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376	2016-04-14 21:58:07 +00:00
Tom Stellard	000c5af3e6	AMDGPU: Add skeleton GlobalIsel implementation Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356	2016-04-14 19:09:28 +00:00
Nicolai Haehnle	05b127da06	[StructurizeCFG] Annotate branches that were treated as uniform Summary: This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases). No tests included here, because tests in D19013 already cover this. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19018 llvm-svn: 266346	2016-04-14 17:42:35 +00:00
Nicolai Haehnle	723b73b4eb	AMDGPU: Remove SIFixSGPRLiveRanges pass Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345	2016-04-14 17:42:29 +00:00
Nicolai Haehnle	19f0f5177d	AMDGPU: change a redundant if () to an assert(). NFC Summary: I've been carrying this change around with me for a while, because the if () managed to confuse me while following the code. All callers ensure that the assertion holds. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19042 llvm-svn: 266344	2016-04-14 17:42:18 +00:00
Tom Stellard	79a1fd718c	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337	2016-04-14 16:27:07 +00:00
Tom Stellard	f110f8f9f7	AMDGPU/SI: Use the correct scratch wave offset register for shaders. Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders though, The register should be the next SGPR after all user and system SGPR's. We use that Mesa adds arguments for all input and system SGPR's and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336	2016-04-14 16:27:03 +00:00
Matt Arsenault	9cd90712f0	AMDGPU: Implement canonicalize Also add generic DAG node for it. llvm-svn: 266272	2016-04-14 01:42:16 +00:00
Tom Stellard	b951f10701	AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers Summary: When we are spilling SGPRs to scratch memory, we usually don't have free SGPRs to do the address calculation, so we need to re-use the ScratchOffset register for the calculation. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18917 llvm-svn: 266244	2016-04-13 20:44:16 +00:00
Artem Tamazov	eb4d5a9b0b	[AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status Tests added along with implemented feature. Note that there is a small leftover of unecessary MI sheduling issue (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205	2016-04-13 16:18:41 +00:00
Tom Stellard	703b2ec43f	AMDGPU/SI: Fix spilling of 96-bit registers Summary: It seems like this was broken in r252327. I thought we had test cases for this, but it's really hard to tirgger spills of this exact register size since they aren't used very much. Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19021 llvm-svn: 266152	2016-04-12 23:57:30 +00:00
Nicolai Haehnle	df77c9ada4	AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126	2016-04-12 21:18:10 +00:00
Tom Stellard	ab1d3a9d50	AMDGPU/SI: Insert wait states required after v_readfirstlane on SI Summary: We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test: spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra Reviewers: arsenm, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18988 llvm-svn: 266105	2016-04-12 18:40:43 +00:00
Matt Arsenault	3b08238f78	AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32 This helps clean up some of the mess when expanding unaligned 64-bit loads when changed to be promote to v2i32, and fixes situations where or x, 0 was emitted after splitting 64-bit ors during moveToVALU. I think this could be a generic combine but I'm not sure. llvm-svn: 266104	2016-04-12 18:24:38 +00:00
Nicolai Haehnle	279970c0dc	AMDGPU/SI: Fix a mis-compilation of multi-level breaks Summary: Under certain circumstances, multi-level breaks (or what is understood by the control flow passes as such) could be miscompiled in a way that causes infinite loops, by emitting incorrect control flow intrinsics. This fixes a hang in dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18967 llvm-svn: 266088	2016-04-12 16:10:38 +00:00
Matt Arsenault	64fa2f4513	AMDGPU: Implement i64 global atomics llvm-svn: 266075	2016-04-12 14:05:11 +00:00
Matt Arsenault	a9dbdcae04	AMDGPU: Add atomic_inc + atomic_dec intrinsics These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074	2016-04-12 14:05:04 +00:00
Matt Arsenault	21ecfe43ba	AMDGPU: Remove trailing whitespace llvm-svn: 266073	2016-04-12 14:04:54 +00:00
Tom Stellard	0ffdf65eaa	Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute" This reverts commit r263720. Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute. llvm-svn: 265992	2016-04-11 20:38:40 +00:00
Jan Vesely	43b7b5b846	AMDGPU/SI: Implement atomic load/store for i32 and i64 Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709	2016-04-07 19:23:11 +00:00
Tom Stellard	9112758077	AMDGPU/SI: Add latency for export instructions Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18599 llvm-svn: 265708	2016-04-07 18:30:05 +00:00
Tom Stellard	d37630e461	AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678	2016-04-07 14:47:07 +00:00
Valery Pykhtin	e23b6deb01	[AMDGPU] fix readlane/readfirstlane src vgpr operand type. For VGPR_32 operand disassembler expects a VGPR register encoded as 0..255 (enum8 src operand). readfirstlane/readline actually has enum9 operand and this change fixes VGPR_32 to VS_32 (enum9 encoding). Differential Revision: http://reviews.llvm.org/D18696 llvm-svn: 265670	2016-04-07 13:41:51 +00:00
Benjamin Kramer	4fb78518f1	Make helper functions static. NFC. llvm-svn: 265653	2016-04-07 10:10:09 +00:00
Nicolai Haehnle	df3a20cd80	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Sam Kolton	ff90c60a78	[AMDGPU] AsmParser: disable DPP for unsupported instructions. New dpp tests. Fix v_nop_dpp. Summary: 1. Disable DPP encoding for instructions that do not support it: - VOP1: - v_readfirstlane_b32 - v_clrexcp - v_movreld_b32 - v_movrels_b32 - v_movrelsd_b32 - VOP2: - v_madmk_f16/32 - v_madak_f16/32 - VOPC, VINTRP, VOP3 2. Fix DPP for v_nop 3. New DPP tests for VOP1 and VOP2 instructions Reviewers: nhaustov, tstellarAMD, vpykhtin Subscribers: tstellarAMD, arsenm Differential Revision: http://reviews.llvm.org/D18552 llvm-svn: 265538	2016-04-06 13:29:59 +00:00
Matthias Braun	7dc03f060e	RegisterScavenger: Take a reference as enterBasicBlock() argument. Make it obvious that the argument cannot be nullptr. Remove an unnecessary nullptr check in initRegState. llvm-svn: 265511	2016-04-06 02:47:09 +00:00
Konstantin Zhuravlyov	e63e02cb0c	[AMDGPU] Emit linkonce and linkonce_odr symbols Differential Revision: http://reviews.llvm.org/D18726 llvm-svn: 265408	2016-04-05 16:00:58 +00:00
Tom Stellard	354a43c7bc	AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý. Patch by: Vedran Miletić Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: jvesely, scchan, kanarayan, arsenm Differential Revision: http://reviews.llvm.org/D17280 llvm-svn: 265170	2016-04-01 18:27:37 +00:00
Valery Pykhtin	5b3559c1ec	[AMDGPU] fix MADAK/MADMK instructions operand namings to match encoding fields. $vsrc1 -> $src1, $k -> $imm Differential Revision: http://reviews.llvm.org/D18659 llvm-svn: 265141	2016-04-01 13:13:12 +00:00
Sam Kolton	1048fb1818	[AMDGPU] Disassembler: support for DPP Review: http://reviews.llvm.org/D18642 llvm-svn: 265015	2016-03-31 14:15:04 +00:00
Matt Arsenault	2fe4fbc184	AMDGPU: Add frexp_exp intrinsic llvm-svn: 264944	2016-03-30 22:28:52 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
Tom Stellard	1d5e6d4bdc	AMDGPU/SI: Improve MachineSchedModel definition This patch contains a few improvements to the model, including: - Using a single resource with a defined buffers size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %)A Differential Revision: http://reviews.llvm.org/D18453 llvm-svn: 264877	2016-03-30 16:35:13 +00:00
Tom Stellard	0bc954e3bc	AMDGPU/SI: Enable lanemask tracking in misched Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876	2016-03-30 16:35:09 +00:00
Konstantin Zhuravlyov	ecc7cbf611	Test commit access llvm-svn: 264736	2016-03-29 15:15:44 +00:00
Tom Stellard	a76bcc2ea1	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589	2016-03-28 16:10:13 +00:00
Justin Bogner	f2a0d349a6	AMDGPU: Fix a use-after free and a missing break We're erasing MI here, but then immediately using it again inside the `if`. This moves the erase after we're done using it. Doing that reveals a second problem though - this case is missing a break, so we fall through to the default and dereference MI again. This is obviously a bug, though I don't know how to write a test that triggers it - all we do in the error case is print some extra debug output. Both of these issue crash on lots of tests under ASAN with the recycling allocator changes from PR26808 applied. llvm-svn: 264442	2016-03-25 18:33:16 +00:00
Matt Arsenault	8c8fcb2585	AMDGPU: Cost model for basic integer operations This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376	2016-03-25 01:16:40 +00:00
Matt Arsenault	9651813ee0	AMDGPU: Partially implement getArithmeticInstrCost for FP ops llvm-svn: 264374	2016-03-25 01:00:32 +00:00
Matt Arsenault	59767cea79	AMDGPU: TTI: Make insertelement free. We don't want to have a cost to scalarizing operations. llvm-svn: 264364	2016-03-25 00:14:11 +00:00
Tom Stellard	9babad25e5	AMDGPU/SI: Add Polaris support Patch By: Sonny Jiang llvm-svn: 264295	2016-03-24 15:31:05 +00:00
Vasileios Kalintiris	b8a37205d2	Fix sequence point warning. NFC. llvm-svn: 264255	2016-03-24 10:53:28 +00:00
Matt Arsenault	30d37a74da	AMDGPU: Remove atomic inc/dec patterns There is no benefit to these since materializing the constant 1 requires the same number of instructions as materializing uint_max llvm-svn: 264215	2016-03-23 23:23:38 +00:00
Matt Arsenault	0a30e456b4	AMDGPU: Promote alloca should skip volatiles llvm-svn: 264214	2016-03-23 23:17:29 +00:00
Matt Arsenault	f43c2a0b49	AMDGPU: Insert moves of frame index to value operands Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is appplied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less unlikely to happen in real code. llvm-svn: 264200	2016-03-23 21:49:25 +00:00
Valery Pykhtin	c0a77c5064	[AMDGPU] Fix missing assembler predicates. Differential Revision: http://reviews.llvm.org/D18351 llvm-svn: 264137	2016-03-23 04:27:26 +00:00
Tom Stellard	52ecd2d69b	AMDGPU: Cache information about register pressure sets We can statically decide whether or not a register pressure set is for SGPRs or VGPRs, so we don't need to re-compute this information in SIRegisterInfo::getRegPressureSetLimit(). Differential Revision: http://reviews.llvm.org/D14805 llvm-svn: 264126	2016-03-23 01:53:22 +00:00
Nicolai Haehnle	0a33abdfd2	AMDGPU: Fix dangling references introduced by r263982 Fixes Valgrind errors on the test cases that were reported as failing by buildbots. llvm-svn: 264000	2016-03-21 22:54:02 +00:00
Nicolai Haehnle	a56e6b6a53	AMDGPU: Coding style fixes I meant to add these before committing r263982 as per the review, but I forgot to squash. llvm-svn: 263983	2016-03-21 20:39:24 +00:00
Nicolai Haehnle	213e87f2ee	AMDGPU: Add SIWholeQuadMode pass Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982	2016-03-21 20:28:33 +00:00
Tom Stellard	92339e888f	AMDGPU/SI: Fix threshold calculation for branching when exec is zero Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 llvm-svn: 263969	2016-03-21 18:56:58 +00:00
Matt Arsenault	cb38a6bd35	AMDGPU: Remove SignBitIsZero for mubuf scratch offsets These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964	2016-03-21 18:02:18 +00:00
Matt Arsenault	b96b57347a	AMDGPU: Add frexp_mant intrinsic llvm-svn: 263948	2016-03-21 16:11:05 +00:00
Nicolai Haehnle	fa771811b3	AMDGPU: add missing braces around multi-line if block This fixes an issue with rL263658 pointed out by Tom Stellard. llvm-svn: 263823	2016-03-18 20:32:04 +00:00
Nicolai Haehnle	95e8ffd398	AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format Summary: Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before the frontend patches land in Mesa. Eventually, we may want to automatically reduce the size of loads at the LLVM IR level, which requires such overloads, and in some cases Mesa can generate them directly. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18255 llvm-svn: 263792	2016-03-18 16:24:40 +00:00
Nicolai Haehnle	ad63638f6d	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	3003ba00a3	AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790	2016-03-18 16:24:20 +00:00
Sam Kolton	a74cd526e9	[AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3 Review: http://reviews.llvm.org/D18267 llvm-svn: 263789	2016-03-18 15:35:51 +00:00
Changpeng Fang	234fcb81d3	AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute Symmary: ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed. Reviewers arsenm, tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18197 llvm-svn: 263720	2016-03-17 16:43:50 +00:00
Nicolai Haehnle	79cad857a0	AMDGPU: mark atomic instructions as sources of divergence Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719	2016-03-17 16:21:59 +00:00
Nicolai Haehnle	ef160de3e5	AMDGPU: Prevent uniform loops from becoming infinite Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 llvm-svn: 263658	2016-03-16 20:14:33 +00:00
Michel Danzer	302f83ac4e	AMDGPU: Verify instructions in non-debug builds as well And emit an error if it fails. This prevents illegal instructions from getting sent to the GPU, which would potentially result in a hang. This is a candidate for the stable branch(es). Reviewed-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 263627	2016-03-16 09:10:42 +00:00
Michel Danzer	beb79ceb19	AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 263626	2016-03-16 09:10:35 +00:00
Changpeng Fang	01f6062227	AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS Summary: Static LDS size is saved in MachineFunctionInfo::LDSSize, We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize. Reviewers: arsenm tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18064 llvm-svn: 263563	2016-03-15 17:28:44 +00:00
Sanjay Patel	5719584129	[DAG] use isUndef() ; NFCI llvm-svn: 263448	2016-03-14 17:28:46 +00:00
Tom Stellard	331f981cc9	AMDGPU/SI: Handle wait states required for DPP instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 llvm-svn: 263447	2016-03-14 17:05:56 +00:00
Marek Olsak	ed2213e6ef	AMDGPU/SI: Incomplete shader binaries need to finish execution at the end Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 llvm-svn: 263441	2016-03-14 15:57:14 +00:00
Nicolai Haehnle	74127fe8d7	AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 llvm-svn: 263440	2016-03-14 15:37:18 +00:00
Nikolay Haustov	79af6b33e0	[AMDGPU] Assembler: SOP* instruction fixes s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit. s_rfe_b64 has just one destination operand and no source. Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that. Add s_memrealtime test and change comments in smem.s to follow common style. Change test for s_memtime to use non-zero register to make it really test encoding. Add tests for s_buffer_load*. Add tests for SOPC instructions (same for SI and VI) Differential Revision: http://reviews.llvm.org/D18040 llvm-svn: 263420	2016-03-14 11:17:19 +00:00
Valery Pykhtin	0f97f17152	[AMDGPU] AsmParser: Factor out parseRegister. NFC. llvm-svn: 263411	2016-03-14 07:43:42 +00:00
Valery Pykhtin	9e33c7f5d3	[AMDGPU] AsmParser: refactor post push_back vector access. NFC. llvm-svn: 263409	2016-03-14 05:25:44 +00:00
Valery Pykhtin	f91911c3ae	[AMDGPU] AsmParser: remove redundant isReg checks. NFC. llvm-svn: 263407	2016-03-14 05:01:45 +00:00
Valery Pykhtin	a7f480b4e9	[AMDGPU] Fix VOPC instruction operand namings Differential Revision: http://reviews.llvm.org/D17966 llvm-svn: 263242	2016-03-11 14:53:28 +00:00
Nikolay Haustov	6560781c4f	[AMDGPU] Assembler: change v_madmk operands to have same order as mad. The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 llvm-svn: 263212	2016-03-11 09:27:25 +00:00
Matt Arsenault	bafc9dc591	AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca Frontend authors are strongly encouraged to keep allocas in the entry block, so don't bother visiting every instruction in the other blocks of the function. llvm-svn: 263206	2016-03-11 08:20:50 +00:00
Matt Arsenault	6b6a2c37bc	AMDGPU: R600 code splitting cleanup Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. llvm-svn: 263204	2016-03-11 08:00:27 +00:00
Matt Arsenault	9a19c240c0	AMDGPU: Materialize sign bits with bfrev If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. llvm-svn: 263201	2016-03-11 07:42:49 +00:00
Nicolai Haehnle	b142770bfe	AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140	2016-03-10 18:43:50 +00:00
Changpeng Fang	278a5b31a5	AMDGPU/SI: Define S_GETREG Intrinsic Summary: Define s_getreg intrinsic to generate s_getreg instruction to read hardware registers. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17892 llvm-svn: 263124	2016-03-10 16:47:15 +00:00
Valery Pykhtin	a4db224d54	[AMDGPU] Fix SMEM instructions encoding/operand namings Differential Revision: http://reviews.llvm.org/D17651 llvm-svn: 263108	2016-03-10 13:06:08 +00:00
Chad Rosier	c27a18f39f	[TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. http://reviews.llvm.org/D17967 llvm-svn: 263021	2016-03-09 16:00:35 +00:00
Teresa Johnson	e50b23c67f	Fix build error due to unsigned compare >= 0 in r263008 (NFC) Fixes error from building with clang: /usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12: error: comparison of unsigned expression >= 0 is always true [-Werror,-Wtautological-compare] if ((Imm >= 0x000) && (Imm <= 0x0ff)) { ~~~ ^ ~~~~~ llvm-svn: 263014	2016-03-09 14:58:23 +00:00
Sam Kolton	dfa29f7c5b	[AMDGPU] Assembler: Support DPP instructions. Supprot DPP syntax as used in SP3 (except several operands syntax). Added dpp-specific operands in td-files. Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter. Support for VOP2 DPP instructions in td-files. Some tests for DPP instructions. ToDo: - VOP2bInst: - vcc is considered as operand - AsmMatcher doesn't apply mnemonic aliases when parsing operands - v_mac_f32 - v_nop - disable instructions with 64-bit operands - change dpp_ctrl assembler representation to conform sp3 Review: http://reviews.llvm.org/D17804 llvm-svn: 263008	2016-03-09 12:29:31 +00:00
Nikolay Haustov	9b7577ed22	[AMDGPU] Assembler: Support abs() syntax. Support legacy SP3 abs(v1) syntax. InstPrinter still uses \|v1\|. Add tests. Differential Revision: http://reviews.llvm.org/D17887 llvm-svn: 263006	2016-03-09 11:03:21 +00:00
Nikolay Haustov	8e3f099497	[AMDGPU] Assembler: Fix s_setpc_b64 s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to. Differential Revision: http://reviews.llvm.org/D17888 llvm-svn: 263005	2016-03-09 10:56:19 +00:00
Matt Arsenault	c89f2919a4	AMDGPU: Match more med3 integer patterns llvm-svn: 262864	2016-03-07 21:54:48 +00:00
Matt Arsenault	56356c8a9c	AMDGPU: Remove a fixme for ptrrtoint handling llvm-svn: 262854	2016-03-07 21:12:46 +00:00
Matt Arsenault	81d06015c6	AMDGPU: Move function only used by R600 llvm-svn: 262853	2016-03-07 21:10:13 +00:00
Valery Pykhtin	dc11054f20	[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler. Engages code from r262804. Differential Revision: http://reviews.llvm.org/D17151 llvm-svn: 262808	2016-03-06 20:25:36 +00:00
Valery Pykhtin	50cd3c4ec7	fix sanitizer-ppc64be-linux failure for r262804 error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move] http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/930 llvm-svn: 262805	2016-03-06 15:13:54 +00:00
Valery Pykhtin	499a5c6323	[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields Differential Revision: http://reviews.llvm.org/D17150 llvm-svn: 262804	2016-03-06 13:27:13 +00:00
Valery Pykhtin	0c6293da68	[AMDGPU] SOPxx instructions operand naming fixed in td files. dst -> sdst ssrcN -> srcN Differential Revision: http://reviews.llvm.org/D17646 llvm-svn: 262801	2016-03-06 10:31:44 +00:00
Tom Stellard	649b5db557	AMDGPU/SI: Add support for spiling SGPRs to scratch buffer Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732	2016-03-04 18:31:18 +00:00
Tom Stellard	ebef6f9771	AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter Summary: This allows us to use virtual registers when we need extra registers for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex(). Once all the frame indices have been eliminated, the PrologEpilogueInserter does an extra pass over the program to replace all virtual registers with physical ones. This allows us to make more efficient use of our emergency spill slots, so we only need to create one. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17591 llvm-svn: 262728	2016-03-04 18:02:01 +00:00
Sam Kolton	f51f4b8370	Test commit access llvm-svn: 262714	2016-03-04 12:29:14 +00:00
Valery Pykhtin	824e804bf6	test commit llvm-svn: 262709	2016-03-04 10:59:50 +00:00
Nikolay Haustov	5bf46ac150	AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. Initial change by Nicolai H.hnle Differential Revision: http://reviews.llvm.org/D17401 llvm-svn: 262701	2016-03-04 10:39:50 +00:00
Tom Stellard	cc7067a668	AMDGPU: Insert two S_NOP instructions for every high level source statement. Patch by: Konstantin Zhuravlyov Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input. Reviewers: tstellarAMD, arsenm Subscribers: echristo, dblaikie, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17454 llvm-svn: 262579	2016-03-03 03:53:29 +00:00
Tom Stellard	600ca6fd39	AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs Summary: When there were no free SGPRs, we were trying to move this value into some of the reserved registers which was causing a segmentation fault. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17590 llvm-svn: 262577	2016-03-03 03:45:09 +00:00
Matt Arsenault	8226fc4829	AMDGPU: Simplify boolean conditional return statements Patch by Richard Thomson llvm-svn: 262536	2016-03-02 23:00:21 +00:00
Nikolay Haustov	f2fbabe9c1	Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields" Build failure with clang. llvm-svn: 262477	2016-03-02 11:16:56 +00:00
Nikolay Haustov	f0f24628cb	Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler." Build failure with clang. llvm-svn: 262475	2016-03-02 10:54:21 +00:00
Nikolay Haustov	73447a9714	[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler. complementary patch to table-driven amd_kernel_code_t field parser/printer utility. lit tests passed. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17151 llvm-svn: 262474	2016-03-02 10:36:30 +00:00
Nikolay Haustov	6c8c74969a	[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields This is going to be used in .hsatext disassembler and can be used in current assembler parser (lit tests passed on parsing). Code using this helpers isn't included in this patch. Benefits: unified approach fast field name lookup on parsing Later I would like to enhance some of the field naming/syntax using this code. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17150 llvm-svn: 262473	2016-03-02 10:36:25 +00:00
Matt Arsenault	f2dcb4737b	AMDGPU: Fix bug 26659. Fix checking the same instruction twice instead of the second branch that uses vccz. I don't think this matters currently because s_branch_vccnz is always used currently. llvm-svn: 262457	2016-03-02 04:12:39 +00:00
Matt Arsenault	a266bd8760	AMDGPU: Cleanup suggested in bug 23960 llvm-svn: 262456	2016-03-02 04:05:14 +00:00
Matt Arsenault	5de68cbc4c	Bug 20810: Use report_fatal_error instead of unreachable llvm-svn: 262455	2016-03-02 03:33:55 +00:00
Matthias Braun	17cb57995e	TableGen: Check scheduling models for completeness TableGen checks at compiletime that for scheduling models with "CompleteModel = 1" one of the following holds: - Is marked with the hasNoSchedulingInfo flag - The instruction is a subclass of Sched - There are InstRW definitions in the scheduling model Typical steps necessary to complete a model: - Ensure all pseudo instructions that are expanded before machine scheduling (usually everything handled with EmitYYY() functions in XXXTargetLowering). - If a CPU does not support some instructions mark the corresponding resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }". - Add missing scheduling information. Differential Revision: http://reviews.llvm.org/D17747 llvm-svn: 262384	2016-03-01 20:03:21 +00:00
Changpeng Fang	24f035af32	AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics Summary: This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 llvm-svn: 262356	2016-03-01 17:51:23 +00:00
Nikolay Haustov	e309e1415d	[AMDGPU] Remove unused disassembler code. llvm-svn: 262346	2016-03-01 16:02:40 +00:00
Nikolay Haustov	47a115cd41	[AMDGPU] Fix build warnings. llvm-svn: 262338	2016-03-01 14:50:59 +00:00
Nikolay Haustov	ac106add0f	[AMDGPU] Disassembler code refactored + error messages. Idea behind this change is to make code shorter and as much common for all targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out. 64bit instructions decoding added. Error\warning messages on unrecognized instructions operands added, InstPrinter allowed to print invalid operands helping to find invalid/unsupported code. The change is massive and hard to compare with previous version, so it makes sense just to take a look on the new version. As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles >300K binary source of some blas kernel. Previous TODOs were saved whenever possible. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17720 llvm-svn: 262332	2016-03-01 13:57:29 +00:00
Nikolay Haustov	ea8febde04	[TableGen] AsmMatcher: Skip optional operands in the midle of instruction if it is not present Previosy, if actual instruction have one of optional operands then other optional operands listed before this also should be presented. For example instruction v_fract_f32 v0, v1, mul:2 have one optional operand - OMod and do not have optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString: string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod"; Making this work required some hacks (both OMod and Clamp match classes have same PredicateMethod). Now, if MatchInstructionImpl meets formal optional operand that is not presented in actual instruction it skips this formal operand and tries to match current actual operand with next formal. Patch by: Sam Kolton Review: http://reviews.llvm.org/D17568 [AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods With this change you should place optional operands in order specified by asm string: clamp -> omod offset -> glc -> slc -> tfe Fixes for several tests. Depends on D17568 Patch by: Sam Kolton Review: http://reviews.llvm.org/D17644 llvm-svn: 262314	2016-03-01 08:34:43 +00:00
Matt Arsenault	d275fcabcb	AMDGPU: Don't emit build_pair during udivrem legalization Technically you aren't supposed to emit these after type legalization for some reason, and we use vector extracts of bitcasted integers as the canonical way to do this. llvm-svn: 262298	2016-03-01 05:06:05 +00:00
Matt Arsenault	f4dfc1a027	AMDGPU: Don't use estimated stack size when we know the real stack size llvm-svn: 262297	2016-03-01 04:58:20 +00:00
Matt Arsenault	59b8b77405	AMDGPU: Set HasExtractBitInsn This currently does not have the control over the bitwidth, and there are missing optimizations to reduce the integer to 32-bit if it can be. But in most situations we do want the sinking to occur. llvm-svn: 262296	2016-03-01 04:58:17 +00:00
Matt Arsenault	3a61985b2f	AMDGPU: More bits of frame index are known to be zero The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still. llvm-svn: 262153	2016-02-27 20:26:57 +00:00
Duncan P. N. Exon Smith	be8f8c4478	CodeGen: Update LiveIntervalAnalysis API to use MachineInstr&, NFC These parameters aren't expected to be null, so take them by reference. llvm-svn: 262151	2016-02-27 20:14:29 +00:00
Duncan P. N. Exon Smith	5702287809	CodeGen: Update DFAPacketizer API to take MachineInstr&, NFC In all but one case, change the DFAPacketizer API to take MachineInstr& instead of MachineInstr*. In DFAPacketizer::endPacket(), take MachineBasicBlock::iterator. Besides cleaning up the API, this is in search of PR26753. llvm-svn: 262142	2016-02-27 19:09:00 +00:00
Matt Arsenault	9d82ee7526	AMDGPU: Split vi-insts subtarget feature This will be more useful for marking builtins acceptable for which subtargets. llvm-svn: 262121	2016-02-27 08:53:55 +00:00
Matt Arsenault	274d34e725	AMDGPU: Add s_sleep intrinsic llvm-svn: 262120	2016-02-27 08:53:52 +00:00
Matt Arsenault	61738cbcb6	AMDGPU: Implement readcyclecounter This matches the behavior of the HSAIL clock instruction. s_realmemtime is used if the subtarget supports it, and falls back to s_memtime if not. Also introduces new intrinsics for each of s_memtime / s_memrealtime. llvm-svn: 262119	2016-02-27 08:53:46 +00:00
Duncan P. N. Exon Smith	3ac9cc6156	CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC Take MachineInstr by reference instead of by pointer in SlotIndexes and the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are never null, so this cleans up the API a bit. It also incidentally removes a few implicit conversions from MachineInstrBundleIterator to MachineInstr* (see PR26753). At a couple of call sites it was convenient to convert to a range-based for loop over MachineBasicBlock::instr_begin/instr_end, so I added MachineBasicBlock::instrs. llvm-svn: 262115	2016-02-27 06:40:41 +00:00
Nikolay Haustov	2f684f1347	[AMDGPU] Assembler: Basic support for MIMG Add parsing and printing of image operands. Matches legacy sp3 assembler. Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last. Update SITargetLowering for new order. Add basic MC test. Update CodeGen tests. Review: http://reviews.llvm.org/D17574 llvm-svn: 261995	2016-02-26 09:51:05 +00:00
Nikolay Haustov	161a158e5c	[AMDGPU] Disassembler: Support for all VOP1 instructions. Support all instructions with VOP1 encoding with 32 or 64-bit operands for VI subtarget: VGPR_32 and VReg_64 operand register classes VS_32 and VS_64 operand register classes with inline and literal constants Tests for VOP1 instructions. Patch by: skolton Reviewers: arsenm, tstellarAMD Review: http://reviews.llvm.org/D17194 llvm-svn: 261878	2016-02-25 16:09:14 +00:00
Nikolay Haustov	2e4c72977c	[AMDGPU] Assembler: Simplify handling of optional operands Resubmit with index problem fixed. Verified with valgrind. Prepare to support DPP encodings. For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead. Mark more operands as IsOptional = 1 in .td files. Do not add default values for optional operands to OperandVector in AMDGPUAsmParser. Add default values for optional operands during conversion using new helper addOptionalImmOperand. Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one. Separate cvtFlat and cvtFlatAtomic. Fix CNDMASK_B32 definition to have no modifiers. Review: http://reviews.llvm.org/D17445 llvm-svn: 261856	2016-02-25 10:58:54 +00:00
NAKAMURA Takumi	3d3d0f4151	Revert r261742, "[AMDGPU] Assembler: Simplify handling of optional operands" It brought undefined behavior. llvm-svn: 261839	2016-02-25 08:35:27 +00:00
Nikolay Haustov	4f073ca7fa	[AMDGPU] Assembler: Simplify handling of optional operands Prepare to support DPP encodings. For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead. Mark more operands as IsOptional = 1 in .td files. Do not add default values for optional operands to OperandVector in AMDGPUAsmParser. Add default values for optional operands during conversion using new helper addOptionalImmOperand. Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one. Separate cvtFlat and cvtFlatAtomic. Fix CNDMASK_B32 definition to have no modifiers. Review: http://reviews.llvm.org/D17445 Reviewers: tstellarAMD llvm-svn: 261742	2016-02-24 14:22:47 +00:00
Nikolay Haustov	3c728e43aa	[AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields) lit tests passed before and after because it doesn't test the binary representation of amd_kernel_code_t. Patch by: Valery Pykhtin (Valery.Pykhtin@amd.com) Reviewers: arsenm llvm-svn: 261732	2016-02-24 10:54:25 +00:00
Matt Arsenault	cd09961fb3	AMDGPU: Check cheaper condition before SignBitIsZero Don't do an expensive computeKnownBits call when we can do the cheap check for legal offsets first. llvm-svn: 261720	2016-02-24 04:55:29 +00:00
Nikolay Haustov	2a62b3c244	[AMDGPU] Fix operands of S_BFE_U64 and S_BFM_B64 src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64). src0 and src1 of s_bfm_b64 are 32-bit. Update tests. Review: http://reviews.llvm.org/D17480 Reviewers: arsenm llvm-svn: 261621	2016-02-23 09:19:14 +00:00
Duncan P. N. Exon Smith	6307eb5518	CodeGen: TII: Take MachineInstr& in predicate API, NFC Change TargetInstrInfo API to take `MachineInstr&` instead of `MachineInstr*` in the functions related to predicated instructions (I'll try to come back later and get some of the rest). All of these functions require non-null parameters already, so references are more clear. As a bonus, this happens to factor away a host of implicit iterator => pointer conversions. No functionality change intended. llvm-svn: 261605	2016-02-23 02:46:52 +00:00
Duncan P. N. Exon Smith	d84f600653	CodeGen: Bring back MachineBasicBlock::iterator::getInstrIterator()... This is a little embarrassing. When I reverted r261504 (getIterator() => getInstrIterator()) in r261567, I did a `git grep` to see if there were new calls to `getInstrIterator()` that I needed to migrate. There were 10-20 hits, and I blindly did a `sed ...` before calling `ninja check`. However, these were `MachineInstrBundleIterator::getInstrIterator()`, which predated r261567. Perhaps coincidentally, these had an identical name and return type. This commit undoes my careless sed and restores `MachineBasicBlock::iterator::getInstrIterator()`. llvm-svn: 261577	2016-02-22 21:30:15 +00:00
Matt Arsenault	fa67bdbde0	AMDGPU/R600: Implement allowsMisalignedMemoryAccess This avoids some test regressions in a future commit when unaligned operations are expanded when they have custom lowering. llvm-svn: 261570	2016-02-22 21:04:16 +00:00
Duncan P. N. Exon Smith	c5b668deb8	Revert "CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC" This reverts commit r261504, since it's not obvious the new name is better: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160222/334298.html I'll recommit if we get consensus that it's the right direction. llvm-svn: 261567	2016-02-22 20:49:58 +00:00
Tom Stellard	d93a34f714	[AMDGPU][llvm-mc] Support for 32-bit inline literals Patch by: Artem Tamazov Summary: Note: Support for 64-bit inline literals TBD Added: Support of abs/neg modifiers for literals (incomplete; parsing TBD). Added: Some TODO comments. Reworked/clarity: rename isInlineImm() to isInlinableImm() Reworked/robustness: disallow BitsToFloat() with undefined value in isInlinableImm() Reworked/reuse: isSSrc32/64(), isVSrc32/64() Tests added. Reviewers: tstellarAMD, arsenm Subscribers: vpykhtin, nhaustov, SamWot, arsenm Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D17204 llvm-svn: 261559	2016-02-22 19:17:56 +00:00
Tom Stellard	82fc153baa	[AMDGPU] [llvm-mc] [VI] Fix encoding of LDS/GDS instructions. Patch by: Artem Tamazov Summary: Tests added. Reviewers: tstellarAMD, arsenm Subscribers: vpykhtin, SamWot, #llvm-amdgpu-spb Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D17271 llvm-svn: 261558	2016-02-22 19:17:53 +00:00
Duncan P. N. Exon Smith	dc0848c029	CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC Delete MachineInstr::getIterator(), since the term "iterator" is overloaded when talking about MachineInstr. - Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so that ilist_node::getIterator() is still available. - Add it back as MachineInstr::getInstrIterator(). This matches the naming in MachineBasicBlock. - Add MachineInstr::getBundleIterator(). This is explicitly called "bundle" (not matching MachineBasicBlock) to disintinguish it clearly from ilist_node::getIterator(). - Update all calls. Some of these I switched to `auto` to remove boiler-plate, since the new name is clear about the type. There was one call I updated that looked fishy, but it wasn't clear what the right answer was. This was in X86FrameLowering::inlineStackProbe(), added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to leave the behaviour unchanged, but I'll reply to the original commit on the list in a moment. llvm-svn: 261504	2016-02-21 22:58:35 +00:00
Tom Stellard	467b5b9024	AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385	2016-02-20 00:37:25 +00:00
Tom Stellard	2d26fe7aa6	AMDGPU/SI: Fix s_waitcnt insertion for flat instructions Summary: This was broken in r260694 which swapped the address and data operands for flat store instructions. The code in SIInsertWaits assumes that the data operand always comes before the address operand, so we need to add a special case for flat. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17366 llvm-svn: 261330	2016-02-19 15:33:13 +00:00
Nicolai Haehnle	f2c64db55a	AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics Summary: These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory. Differential Revision: http://reviews.llvm.org/D17276 llvm-svn: 261224	2016-02-18 16:44:18 +00:00
Nikolay Haustov	10813a4efa	Test commit access. llvm-svn: 261199	2016-02-18 10:02:12 +00:00
Tom Stellard	e1818af8c5	[AMDGPU] Disassembler: Added basic disassembler for AMDGPU target Changes: - Added disassembler project - Fixed all decoding conflicts in .td files - Added DecoderMethod=“NONE” option to Target.td that allows to disable decoder generation for an instruction. - Created decoding functions for VS_32 and VReg_32 register classes. - Added stubs for decoding all register classes. - Added several tests for disassembler Disassembler only supports: - VI subtarget - VOP1 instruction encoding - 32-bit register operands and inline constants [Valery] One of the point that requires to pay attention to is how decoder conflicts were resolved: - Groups of target instructions were separated by using different DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate approach. - There were conflicts in IMAGE_<> instructions caused by two different reasons: 1. dmask wasn’t specified for the output (fixed) 2. There are image instructions that differ only by the number of the address components but have the same encoding by the HW spec. The actual number of address components is determined by the HW at runtime using image resource descriptor starting from the VGPR encoded in an IMAGE instruction. This means that we should choose only one instruction from conflicting group to be the rule for decoder. I didn’t find the way to disable decoder generation for an arbitrary instruction and therefore made a onelinear fix to tablegen generator that would suppress decoder generation when DecoderMethod is set to “NONE”. This is a change that should be reviewed and submitted first. Otherwise I would need to specify different DecoderNamespace for every instruction in the conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not used in other targets. 3. IMAGE_GATHER decoder generation is for now disabled and to be done later. [/Valery] Patch By: Sam Kolton Differential Revision: http://reviews.llvm.org/D16723 llvm-svn: 261185	2016-02-18 03:42:32 +00:00
Tom Stellard	cc4c8718ed	[AMDGPU] Rename $dst operand to $vdst for VOP instructions. Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler. Reviewers: vpykhtin, tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, nhaustov Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D16920 llvm-svn: 260986	2016-02-16 18:14:56 +00:00
Matt Arsenault	f2ddbf00ed	AMDGPU: Prepare for reducing private element size. Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804	2016-02-13 04:18:53 +00:00
Tom Stellard	4409051d00	AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792	2016-02-13 02:09:49 +00:00
Matt Arsenault	d2759212b8	AMDGPU: Cleanup includes and random macros llvm-svn: 260784	2016-02-13 01:24:08 +00:00
Matt Arsenault	ce56a0ef54	AMDGPU: Add intrinsics for sin/cos These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires. llvm-svn: 260782	2016-02-13 01:19:56 +00:00
Matt Arsenault	79963e80b8	AMDGPU: Rename intrinsic to better match instruction name Also fixes missing f32 test. llvm-svn: 260780	2016-02-13 01:03:00 +00:00
Tom Stellard	607051640c	AMDGPU/SI: Add instruction defs for VOP1 DPP instructions Reviewers: nhaustov, cfang, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17159 llvm-svn: 260774	2016-02-13 00:51:31 +00:00
Matt Arsenault	16f48d7c55	AMDGPU: Fix broken condition causing warning llvm-svn: 260773	2016-02-13 00:36:10 +00:00
Tom Stellard	bc4497b13c	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765	2016-02-12 23:45:29 +00:00
Tom Stellard	46937ca4e7	[AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler Historically, AMD internal sp3 assembler has flat_store* addr, data format. To match existing code and to enable reuse, change LLVM definitions to match. Also update MC and CodeGen tests. Differential Revision: http://reviews.llvm.org/D16927 Patch by: Nikolay Haustov llvm-svn: 260694	2016-02-12 17:57:54 +00:00
Changpeng Fang	e07f1aa8fa	AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass. Summary: It is possible that the loop condition can be a boolean constant (infinite loop, for example). So we sould handle constant condition in annotating a loop. This patch adds this functionality to support annotating constant condition. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D15093 llvm-svn: 260692	2016-02-12 17:11:04 +00:00
Benjamin Kramer	ac5e36f52e	Fix uninitialized memory read. Found by msan. llvm-svn: 260676	2016-02-12 12:37:21 +00:00
Matt Arsenault	296b849163	AMDGPU: Set flat_scratch from flat_scratch_init reg This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658	2016-02-12 06:31:30 +00:00
Matt Arsenault	24ee0785dd	AMDGPU: Set element_size in private resource descriptor Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651	2016-02-12 02:40:47 +00:00
Matt Arsenault	16f7bcb661	AMDGPU: Fix mishandling alignment when scalarizing vector loads/stores I don't think this was causing any real problems, so I'm not sure how to test for this. llvm-svn: 260646	2016-02-12 02:22:21 +00:00
Matt Arsenault	55d49cfe2d	AMDGPU: Initialize SILowerControlFlow llvm-svn: 260645	2016-02-12 02:16:10 +00:00
Matt Arsenault	806dd0a532	AMDGPU: Remove trailing whitespace llvm-svn: 260644	2016-02-12 02:16:07 +00:00
Tom Stellard	1397d49ef5	AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599	2016-02-11 21:45:07 +00:00
Tom Stellard	fa8f204c5b	AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklist Summary: When we split SMRD instructions into two MUBUFs we were adding the users of the newly created MUBUFs to the VALU worklist. However, the only users these instructions had was the REG_SEQUENCE that was inserted by splitSMRD when the original SMRD instruction was split. We need to make sure to add the users of the original SMRD to the VALU worklist before it is split. I have a test case, but it requires one other bug fix, so it will be added in a later commt. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17101 llvm-svn: 260588	2016-02-11 21:14:34 +00:00
Tom Stellard	e993451f5c	[AMDGPU] Fix for "v_div_scale_f64 reg, vcc, ..." parsing Summary: Added support for "VOP3Only" attribute in VOP3bInst encoding. Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns. Added support for multi-dest instructions in AMDGPUAs::cvt*(). Added lit test for "V_DIV_SCALE_F64\|F32 vreg,vcc\|sreg,vreg,vreg,vreg". Reviewers: tstellarAMD, arsenm Subscribers: arsenm, SamWot, nhaustov, vpykhtin Differential Revision: http://reviews.llvm.org/D16995 Patch By: Artem Tamazov llvm-svn: 260560	2016-02-11 18:25:26 +00:00
Matt Arsenault	fcb345f172	AMDGPU: Fix constant bus use check with subregisters If the two operands to an instruction were both subregisters of the same super register, it would incorrectly think this counted as the same constant bus use. This fixes the verifier error in fmin_legacy.ll which was missing -verify-machineinstrs. llvm-svn: 260495	2016-02-11 06:15:39 +00:00
Matt Arsenault	427c548925	AMDGPU: Fix passes depending on dominator tree for no reason llvm-svn: 260494	2016-02-11 06:15:34 +00:00
Matt Arsenault	fe26def35c	AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis llvm-svn: 260491	2016-02-11 05:32:51 +00:00
Matt Arsenault	9524566314	AMDGPU: Split R600 and SI store lowering These were only sharing some somewhat incorrect logic for when to scalarize or split vectors. llvm-svn: 260490	2016-02-11 05:32:46 +00:00
Tom Stellard	a90b9526df	[AMDGPU] Assembler: Fix VOP3 only instructions Separate methods to convert parsed instructions to MCInst: - VOP3 only instructions (always create modifiers as operands in MCInst) - VOP2 instrunctions with modifiers (create modifiers as operands in MCInst when e64 encoding is forced or modifiers are parsed) - VOP2 instructions without modifiers (do not create modifiers as operands in MCInst) - Add VOP3Only flag. Pass HasMods flag to VOP3Common. - Simplify code that deals with modifiers (-1 is now same as 0). This is no longer needed. - Add few tests (more will be added separately). Update error message now correct. Patch By: Nikolay Haustov Differential Revision: http://reviews.llvm.org/D16778 llvm-svn: 260483	2016-02-11 03:28:15 +00:00
Nicolai Haehnle	d791bd07c7	AMDGPU: Release the scavenged offset register during VGPR spill Summary: This fixes a crash where subsequent spills would be unable to scavenge a register. In particular, it fixes a crash in piglit's spec@glsl-1.50@execution@geometry@max-input-components (the test still has a shader that fails to compile because of too many SGPR spills, but at least it doesn't crash any more). This is a candidate for the release branch. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, arsenm Differential Revision: http://reviews.llvm.org/D16558 llvm-svn: 260427	2016-02-10 20:13:58 +00:00
Matt Arsenault	a14364115e	AMDGPU: Fix indentation and variable names llvm-svn: 260399	2016-02-10 18:21:45 +00:00
Matt Arsenault	6dfda9625d	AMDGPU: Split R600 and SI load lowering These weren't actually sharing anything in the common LowerLOAD. llvm-svn: 260398	2016-02-10 18:21:39 +00:00
Ahmed Bougacha	f8dfb47c02	[CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI. llvm-svn: 260316	2016-02-09 22:54:12 +00:00
Tom Stellard	309617645d	AMDGPU/SI: Implement a work-around for smrd corrupting vccz bit Summary: We will hit this once we have enabled uniform branches. The smrd-vccz-bug.ll test will be added with the uniform branch commit. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16725 llvm-svn: 260137	2016-02-08 19:49:20 +00:00
Matt Arsenault	92edab2df9	AMDGPU: Remove bfi and bfm intrinsics Nothing is using them. llvm-svn: 260123	2016-02-08 19:06:01 +00:00
Matt Arsenault	7f83397d72	AMDGPU: Account for LDS alignment The current situation isn't great, because the amount of padding requires is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912	2016-02-05 19:47:29 +00:00
Matt Arsenault	cf84e26fb6	AMDGPU: Preserve alignments on new created globals Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911	2016-02-05 19:47:23 +00:00
Tom Stellard	1242ce9695	AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900	2016-02-05 18:44:57 +00:00
Tom Stellard	5dde1d2eb3	AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16863 llvm-svn: 259897	2016-02-05 18:29:17 +00:00
Tom Stellard	6e1967ef66	AMDGPU/SI: Correctly initialize SIInsertWaits pass Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16724 llvm-svn: 259894	2016-02-05 17:42:38 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	7e747f1a38	AMDGPU: Handle promoting memmove Also add missing tests for the others. llvm-svn: 259558	2016-02-02 20:28:10 +00:00
Matt Arsenault	8b175672cb	AMDGPU: Skip promote alloca with no optimizations llvm-svn: 259551	2016-02-02 19:32:42 +00:00
Matt Arsenault	fb8cdbae0c	AMDGPU: Minor cleanups for AMDGPUPromoteAlloca Mostly convert to use range loops. llvm-svn: 259550	2016-02-02 19:32:35 +00:00
Matt Arsenault	e5737f7cac	AMDGPU: Report AMDGPUPromoteAlloca changed the function llvm-svn: 259547	2016-02-02 19:18:57 +00:00
Matt Arsenault	ad1348459f	AMDGPU: Whitelist handled intrinsics We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546	2016-02-02 19:18:53 +00:00
Matt Arsenault	853a1fc6d9	AMDGPU: Use inbounds when calculating workitem offset When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545	2016-02-02 19:18:48 +00:00
Oliver Stannard	7e7d983a87	Refactor backend diagnostics for unsupported features Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498	2016-02-02 13:52:43 +00:00
Matt Arsenault	e013246462	AMDGPU: Fix emitting invalid workitem intrinsics for HSA The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297	2016-01-30 05:19:45 +00:00
Matt Arsenault	d0799df707	AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr Only the dispatch.ptr intrinsic is supposed to be used now to get the workgroup size, and the read.local.size intrinsics do not work correctly. llvm-svn: 259296	2016-01-30 05:10:59 +00:00
Matt Arsenault	43976df0da	AMDGPU: Add new amdgcn workitem intrinsics These use the correct prefix and follow the HSA naming convention rather than the config register option names. llvm-svn: 259293	2016-01-30 04:25:19 +00:00
Matt Arsenault	295875efda	AMDGPU: Remove 24-bit intrinsics The known bit matching code seems to work reasonably well, so these shouldn't really be needed. llvm-svn: 259180	2016-01-29 10:05:16 +00:00
Matt Arsenault	5b39b34ca5	AMDGPU: Match fmed3 patterns with legacy fmin/fmax llvm-svn: 259090	2016-01-28 20:53:48 +00:00
Matt Arsenault	f639c32739	AMDGPU: Match some med3 patterns llvm-svn: 259089	2016-01-28 20:53:42 +00:00
Matt Arsenault	7293f9895e	AMDGPU: Set DX10Clamp bit llvm-svn: 259088	2016-01-28 20:53:35 +00:00
Tom Stellard	3d2c852958	AMDGPU: waitcnt operand fixes Summary: Allow lgkmcnt up to 0xF (hardware allows that). Fix mask for ExpCnt in AMDGPUInstPrinter. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16314 Patch by: Nikolay Haustov llvm-svn: 259059	2016-01-28 17:13:44 +00:00
Tom Stellard	2ff726272a	AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp Summary: Also delete all the stub functions that are identical to the implementations in TargetInstrInfo.cpp. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16609 llvm-svn: 259054	2016-01-28 16:04:37 +00:00
Oliver Stannard	02fa1c80c4	Revert r259035, it introduces a cyclic library dependency llvm-svn: 259045	2016-01-28 13:19:47 +00:00
Oliver Stannard	b4b092ea1b	Add backend dignostic printer for unsupported features Re-commit of r258951 after fixing layering violation. The related LLVM patch adds a backend diagnostic type for reporting unsupported features, this adds a printer for them to clang. In the case where debug location information is not available, I've changed the printer to report the location as the first line of the function, rather than the closing brace, as the latter does not give the user any information. This also affects optimisation remarks. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 259035	2016-01-28 10:07:27 +00:00
NAKAMURA Takumi	628a7a0aef	Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features" It broke layering violation in LLVMIR. clang r258950 "Add backend dignostic printer for unsupported features" llvm r258951 "Refactor backend diagnostics for unsupported features" llvm-svn: 259016	2016-01-28 04:41:32 +00:00
Oliver Stannard	1e67a9f196	Refactor backend diagnostics for unsupported features The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. The implementation of DiagnosticInfoUnsupported::print must be in lib/Codegen rather than in the existing file in lib/IR/ to avoid introducing a dependency from IR to CodeGen. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 258951	2016-01-27 17:30:33 +00:00
Tom Stellard	6e3b14de62	AMDGPU/SI: Fix commuting of 32-bit VOPC instructions Summary: We didn't have entries in the commuting table for the 32-bit instructions. I don't think we hit this problem now, but we will once uniform branching is enabled. Tests will come in a later commit. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16600 llvm-svn: 258936	2016-01-27 15:53:52 +00:00
Marek Olsak	e86f252209	AMDGPU/SI: Stoney has only 16 LDS banks Summary: This is a candidate for stable, along with all patches that add the "stoney" processor. Reviewers: tstellarAMD Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16485 llvm-svn: 258922	2016-01-27 11:19:45 +00:00
Benjamin Kramer	b3e8a6d2b8	Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs. llvm-svn: 258917	2016-01-27 10:01:28 +00:00
Matt Arsenault	b22828f2fb	AMDGPU: Fix default device handling When no device name is specified, default to kaveri for HSA since SI is not supported and it woud fail. Default to "tahiti" instead of "SI" since these are effectively the same, and tahiti is an actual device. Move default device handling to the TargetMachine rather than the AMDGPUSubtarget. The module ISA version is computed from the device name provided with the target machine, so the attributes printed by the AsmPrinter were inconsistent with those computed in the subtarget. Also remove DevName field from subtarget since it's redundant with getCPU() in the superclass. llvm-svn: 258901	2016-01-27 02:17:49 +00:00
Reid Kleckner	5b4637141e	[llvm-tblgen] Avoid StringMatcher for GCC and MS builtin names This brings the compile time of Function.cpp from ~40s down to ~4s for me locally. It also shaves off about 400KB of object file size in a release+asserts build. I also realized that the AMDGPU backend does not have any GCC builtin names to match, so the extra lookup was a no-op. I removed it to silence a zero-length string table array warning. There should be no functional change here. This change really ends the story of PR11951. llvm-svn: 258897	2016-01-27 01:43:12 +00:00
Reid Kleckner	1c93b4cd7b	[llvm-tblgen] Stop emitting the intrinsic name matching code The AMDGPU backend was the last user of the old StringMatcher recognition code. Move it over to the new lookupLLVMIntrinsicName funciton, which is now improved to handle all of the interesting edge cases exposed by AMDGPU intrinsic names. llvm-svn: 258875	2016-01-26 23:01:21 +00:00
Chris Bieneman	e49730d4ba	Remove autoconf support Summary: This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html "I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened." - Obi Wan Kenobi Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16471 llvm-svn: 258861	2016-01-26 21:29:08 +00:00
Matt Arsenault	bee7575e1a	AMDGPU: Move AMDGPU intrinsics only used by R600 llvm-svn: 258790	2016-01-26 04:49:24 +00:00
Matt Arsenault	382d945d16	AMDGPU: Tidy minor td file issues Make comments and indentation more consistent. Rearrange a few things to be in a more consistent order, such as organizing subtarget features from those describing an actual device property, and those used as options. llvm-svn: 258789	2016-01-26 04:49:22 +00:00
Matt Arsenault	c5f6152911	AMDGPU: Make v32i8/v64i8 illegal types Old intrinsics were forcing these, but they have now all been removed. This fixes large i8 vector operations generally being broken. llvm-svn: 258788	2016-01-26 04:43:48 +00:00
Matt Arsenault	018179fc46	AMDGPU: Remove old sample intrinsics I did my best to try to update all the uses in tests that just happened to use the old ones to the newer intrinsics. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern but I don't think it really matters. llvm-svn: 258787	2016-01-26 04:38:08 +00:00
Matt Arsenault	051d6f9fde	AMDGPU: Add new amdgcn intrinsics for cube instructions More cleanup to try to get all intrinsics using the correct amdgcn prefix that are as close to the instruction as possible. llvm-svn: 258786	2016-01-26 04:29:56 +00:00
Matt Arsenault	9a10cea7fb	AMDGPU: Implement read_register and write_register intrinsics Some of the special intrinsics now that now correspond to a instruction also have special setting of some registers, e.g. llvm.SI.sendmsg sets m0 as well as use s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. For this I'm not sure this is entirely correct because we would want this to be convergent, although it's possible this is already treated sufficently conservatively. llvm-svn: 258785	2016-01-26 04:29:24 +00:00
Matt Arsenault	0c3e2338fe	AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now Also move into backend intrinsics to discourage use of the old name. llvm-svn: 258783	2016-01-26 04:14:16 +00:00
Matt Arsenault	7713162c32	AMDGPU: Remove more unused intrinsics Replace tests with lrp with basic IR expansion llvm-svn: 258612	2016-01-23 05:42:38 +00:00
Matt Arsenault	f75257aaa6	AMDGPU: Move amdgcn intrinsic handling into SITargetLowering llvm-svn: 258608	2016-01-23 05:32:20 +00:00
Matt Arsenault	f1341406bf	AMDGPU: Remove IntrNoMem from llvm.SI.sendmsg This has side effects. llvm-svn: 258607	2016-01-23 05:32:18 +00:00
Matt Arsenault	2a93bb6365	AMDGPU: Remove Feature64BitPtr This is a leftover from AMDIL that doesn't do anything and doesn't belong here. llvm-svn: 258606	2016-01-23 05:32:14 +00:00
Matt Arsenault	10ca39ca8b	AMDGPU: Add new name for barrier intrinsic llvm-svn: 258558	2016-01-22 21:30:43 +00:00
Matt Arsenault	bef34e21c7	AMDGPU: Rename intrinsics to use amdgcn prefix The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatability for now. llvm-svn: 258557	2016-01-22 21:30:34 +00:00
Matt Arsenault	0b783ef076	AMDGPU: Fix crash with invariant markers The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking. llvm-svn: 258537	2016-01-22 19:47:54 +00:00
Matt Arsenault	59bd3014f2	AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix These ones aren't directly emitted by mesa and inserted by a pass. llvm-svn: 258523	2016-01-22 19:00:09 +00:00
Matt Arsenault	bb4ff5f5b6	AMDGPU: Remove unused R600 intrinsics llvm-svn: 258522	2016-01-22 18:52:14 +00:00
Matt Arsenault	7898b90ee1	AMDGPU: Change control flow intrinsics to use amdgcn prefix These aren't supposed to be used outside of the backend, so there aren't any users to worry about. llvm-svn: 258516	2016-01-22 18:42:55 +00:00
Matt Arsenault	8d903029e8	AMDGPU: Don't use separate mulhu/mulhs Pats llvm-svn: 258515	2016-01-22 18:42:49 +00:00
Matt Arsenault	ee0930821a	AMDGPU: Remove random TGSI intrinsic I don't think this was ever used. llvm-svn: 258514	2016-01-22 18:42:44 +00:00
Matt Arsenault	0cbaa1762b	AMDGPU: Remove AMDGPU.fract intrinsic Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x) llvm-svn: 258513	2016-01-22 18:42:38 +00:00
Tom Stellard	de008d338c	AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute Summary: Currently the SI scheduler can be selected via command line option, but it turned out it would be better if it was selectable via a Target Attribute. This patch adds "si-scheduler" attribute to the backend. Reviewers: tstellarAMD, echristo Subscribers: echristo, arsenm Differential Revision: http://reviews.llvm.org/D16192 llvm-svn: 258386	2016-01-21 04:28:34 +00:00
Tom Stellard	d1efda8e9e	AMDGPU/SI: Promote i1 SETCC operations Summary: While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16233 llvm-svn: 258352	2016-01-20 21:48:24 +00:00
Matt Arsenault	7836f895fe	AMDGPU: Fix old comments that mention AMDIL llvm-svn: 258350	2016-01-20 21:22:21 +00:00
Matt Arsenault	7ba334a7d9	AMDGPU: Remove AMDGPU.trunc intrinsic llvm-svn: 258348	2016-01-20 21:05:53 +00:00
Matt Arsenault	15fbe49daf	AMDGPU: Remove AMDIL.fraction intrinsic llvm-svn: 258347	2016-01-20 21:05:49 +00:00
Matt Arsenault	7cccd2672e	AMDGPU: Remove AMDIL.round.nearest intrinsic llvm-svn: 258346	2016-01-20 21:05:40 +00:00
Matt Arsenault	1c9e4ef0df	AMDGPU: Remove abs intrinsic llvm-svn: 258343	2016-01-20 20:58:29 +00:00
Matt Arsenault	f7e6e89718	AMDGPU: Remove min/max intrinsics This removes support for mesa 11.0.x llvm-svn: 258342	2016-01-20 20:50:19 +00:00
Tom Stellard	77a177722f	Correctly initialize SIAnnotateControlFlow Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16304 llvm-svn: 258319	2016-01-20 15:48:27 +00:00
Matthias Braun	5d458617aa	RegisterPressure: Make liveness tracking subregister aware Differential Revision: http://reviews.llvm.org/D14968 llvm-svn: 258258	2016-01-20 00:23:26 +00:00
Tom Stellard	2e045bbc5f	AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256	2016-01-20 00:13:22 +00:00
Matt Arsenault	33e3ecee0c	AMDGPU: Reduce 64-bit SRAs llvm-svn: 258096	2016-01-18 22:09:04 +00:00
Matt Arsenault	6e3a45193a	AMDGPU: Split 64-bit and of constant up This breaks the tests that were meant for testing 64-bit inline immediates, so move those to shl where they won't be broken up. This should be repeated for the other related bit ops. llvm-svn: 258095	2016-01-18 22:01:13 +00:00
Matt Arsenault	3cbbc10488	AMDGPU: Generalize shl combine Reduce 64-bit shl with constant > 32. We already special cased this for the == 32 case, but this also works for any >= 32 constant. llvm-svn: 258092	2016-01-18 21:55:14 +00:00
Matt Arsenault	80edab99ff	AMDGPU: Reduce 64-bit lshr by constant to 32-bit 64-bit shifts are very slow on some subtargets. llvm-svn: 258090	2016-01-18 21:43:36 +00:00
Matt Arsenault	e83690c1cc	AMDGPU: Add subtarget feature for instruction rates llvm-svn: 258085	2016-01-18 21:13:50 +00:00
Manuel Jacob	5f6eaac611	GlobalValue: use getValueType() instead of getType()->getPointerElementType(). Reviewers: mjacob Subscribers: jholewinski, arsenm, dsanders, dblaikie Patch by Eduard Burtescu. Differential Revision: http://reviews.llvm.org/D16260 llvm-svn: 257999	2016-01-16 20:30:46 +00:00
Rui Ueyama	da00f2fdf4	Update to use new name alignTo(). llvm-svn: 257804	2016-01-14 21:06:47 +00:00
Rafael Espindola	8340f94df1	Convert a few assert failures into proper errors. Fixes PR25944. llvm-svn: 257697	2016-01-13 22:56:57 +00:00
Changpeng Fang	c16be00313	AMDGPU/SI: Update ISA version for FIJI llvm-svn: 257666	2016-01-13 20:39:25 +00:00
Hans Wennborg	81efb6b418	Fix struct/class mismatch for MachineSchedContext llvm-svn: 257648	2016-01-13 18:59:45 +00:00
Marek Olsak	46dadbfab2	AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 llvm-svn: 257625	2016-01-13 17:23:20 +00:00
Marek Olsak	3c0ebc71f1	AMDGPU/SI: Remove ending s_endpgm from non-void functions Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16035 llvm-svn: 257623	2016-01-13 17:23:12 +00:00
Marek Olsak	8e9cc63bfb	AMDGPU/SI: Add s_waitcnt at the end of non-void functions Summary: v2: Make ReturnsVoid private, so that I can another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 llvm-svn: 257622	2016-01-13 17:23:09 +00:00
Marek Olsak	8a0f335ad6	AMDGPU/SI: Add support for non-void functions Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored. v2: don't do this for compute shaders Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16033 llvm-svn: 257621	2016-01-13 17:23:04 +00:00
Nicolai Haehnle	02c3291566	AMDGPU/SI: Add SI Machine Scheduler Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609	2016-01-13 16:10:10 +00:00
Marek Olsak	4e99b6ec01	AMDGPU/SI: Allow more shader inputs Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16032 llvm-svn: 257593	2016-01-13 11:46:48 +00:00
Marek Olsak	b6c8c3d165	AMDGPU/SI: Allow any number of PS inputs Summary: With the ability to concatenate shader binaries, the limit of 15 no longer applies. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16031 llvm-svn: 257592	2016-01-13 11:46:10 +00:00

... 14 15 16 17 18 ...

1912 Commits