llvm-project

Commit Graph

Author	SHA1	Message	Date
Hiroshi Inoue	372ffa15cb	[NFC] fix trivial typos in comments "the the" -> "the", "we we" -> "we", etc llvm-svn: 330006	2018-04-13 11:37:06 +00:00
Marek Olsak	a9a58fa236	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764	2018-04-10 22:48:23 +00:00
Alex Shlyapnikov	79f2c720b5	Revert "AMDGPU: enable 128-bit for local addr space under an option" This reverts commit r329591. It breaks various bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516 http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374 http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251 ... llvm-svn: 329610	2018-04-09 19:47:38 +00:00
Marek Olsak	52b033b827	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329591	2018-04-09 16:56:32 +00:00
Nicolai Haehnle	2f5a73820c	AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166	2018-04-04 10:58:54 +00:00
Farhana Aleen	e80aeac0f2	[AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3. Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3. Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D45219 llvm-svn: 329131	2018-04-03 23:00:30 +00:00
Farhana Aleen	936947349a	Revert "MSG" This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de. This was committed by mistake. llvm-svn: 329119	2018-04-03 21:51:45 +00:00
Farhana Aleen	3ab409dc86	MSG llvm-svn: 329114	2018-04-03 21:20:39 +00:00
Nicolai Haehnle	5d0d30304c	AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics Summary: Avoids having to list all intrinsics manually. This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44937 llvm-svn: 328938	2018-04-01 17:09:07 +00:00
Matt Arsenault	6c041a3cab	AMDGPU: Fix selection error on constant loads with < 4 byte alignment llvm-svn: 328818	2018-03-29 19:59:28 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
Matt Arsenault	0a0c871f60	AMDGPU: Fix crash when MachinePointerInfo invalid The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this. llvm-svn: 328652	2018-03-27 18:39:45 +00:00
Matt Arsenault	e9f3679031	AMDGPU: Fix FP restore from being reordered with stack ops In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Before it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to prevent this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better. llvm-svn: 328650	2018-03-27 18:38:51 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
Farhana Aleen	c6c9dc8773	[AMDGPU] Supported ds_write_b128 generation. Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726	2018-03-16 18:12:00 +00:00
Farhana Aleen	a7cb31123c	[AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widens the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear, compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153	2018-03-09 17:41:39 +00:00
Farhana Aleen	89196642f7	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910	2018-03-07 17:09:18 +00:00
Farhana Aleen	347d12b4ce	Revert "[AMDGPU] Widened vector length for global/constant address space." This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907	2018-03-07 16:55:27 +00:00
Farhana Aleen	0d03d0588d	[AMDGPU] Widened vector length for global/constant address space. llvm-svn: 326904	2018-03-07 16:29:05 +00:00
Craig Topper	80d3bb3b4b	[TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have. There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run. A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does. llvm-svn: 326832	2018-03-06 19:44:52 +00:00
Alexander Timofeev	2e5eeceeb7	Pass Divergence Analysis data to Selection DAG to drive divergence dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703	2018-03-05 15:12:21 +00:00
Jan Vesely	b283ea0f0f	AMDGPU/GCN: Promote i16 ctpop i16 capable ASICs do not support i16 operands for this instruction. Add tablegen pattern to merge chained i16 additions. Differential Revision: https://reviews.llvm.org/D43985 llvm-svn: 326535	2018-03-02 02:50:22 +00:00
Changpeng Fang	da38b5fd49	AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction. Summary: In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect vgpr. In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D43297 llvm-svn: 325355	2018-02-16 16:31:30 +00:00
Stanislav Mekhanoshin	c078ca92eb	[AMDGPU] Remove non-temporal flag from argument loads Kernel arguments are likely read by all workitems and should not bypass cache. Fixes performance hit in sub-dword argument loads. Differential Revision: https://reviews.llvm.org/D43249 llvm-svn: 325146	2018-02-14 18:05:14 +00:00
Matt Arsenault	923712b6b5	Reapply "AMDGPU: Add 32-bit constant address space" This reverts r324494 and reapplies r324487. llvm-svn: 324747	2018-02-09 16:57:57 +00:00
Matt Arsenault	bcf7bec4b8	AMDGPU: Fix layering issue Move utility function that depends on codegen. Fixes build with r324487 reapplied. llvm-svn: 324746	2018-02-09 16:57:48 +00:00
Rafael Espindola	f4e3f3e31c	Revert "AMDGPU: Add 32-bit constant address space" This reverts commit r324487. It broke clang tests. llvm-svn: 324494	2018-02-07 18:09:35 +00:00
Marek Olsak	871c30e540	AMDGPU: Add 32-bit constant address space Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487	2018-02-07 16:01:00 +00:00
Marek Olsak	13e4741275	AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908	2018-01-31 20:18:04 +00:00
Daniil Fukalov	6e1dc68117	[AMDGPU] fix LDS f32 intrinsics - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516	2018-01-26 11:09:38 +00:00
Changpeng Fang	4737e892de	AMDGPU/SI: Add d16 support for image intrinsics. Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 llvm-svn: 322903	2018-01-18 22:08:53 +00:00
Daniil Fukalov	d5fca554e2	[AMDGPU] add LDS f32 intrinsics added llvm.amdgcn.atomic.{add\|min\|max}.f32 intrinsics to allow generate ds_{add\|min\|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656	2018-01-17 14:05:05 +00:00
Changpeng Fang	44dfa1de3b	AMDGPU/SI: Add d16 support for buffer intrinsics. Differential Revision: https://reviews.llvm.org/D38906 Reviewers: Matt and Brian. llvm-svn: 322402	2018-01-12 21:12:19 +00:00
Matt Arsenault	e19bc2ee0f	AMDGPU: Use unique PSVs for buffer resources Also fixes using the wrong memory type for some intrinsics when custom lowering them. llvm-svn: 321557	2017-12-29 17:18:21 +00:00
Matt Arsenault	905f3518ba	AMDGPU: Implement getTgtMemIntrinsic for images Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. llvm-svn: 321555	2017-12-29 17:18:14 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
Matt Arsenault	1117133687	DAG: Expose all MMO flags in getTgtMemIntrinsic Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferencable set, but currently there is no way to do that directly during the initial intrinsic lowering. llvm-svn: 320746	2017-12-14 21:39:51 +00:00
Matt Arsenault	cad7fa857c	AMDGPU: Partially fix disassembly of MIMG instructions Stores failed to decode at all since they didn't have a DecoderNamespace set. Loads worked, but did not change the register width displayed to match the number of enabled channels. The number of printed registers for vaddr is still wrong, but I don't think that's encoded in the instruction so there's not much we can do about that. Image atomics are still broken. MIMG is the same encoding for SI/VI, but the image atomic classes are split up into encoding specific versions unlike every other MIMG instruction. They have isAsmParserOnly set on them for some reason. dmask is also special for these, so we probably should not have it as an explicit operand as it is now. llvm-svn: 320614	2017-12-13 21:07:51 +00:00
Matt Arsenault	856777d8c9	AMDGPU: image_getlod and image_getresinfo do not read memory llvm-svn: 320187	2017-12-08 20:00:57 +00:00
Matt Arsenault	ecad0d5364	AMDGPU: Preserve MMO in adjustWritemask Follow up to r319705. Currently the MMO is produced after this in the custom inserter, so this doesn't change anything yet. llvm-svn: 320186	2017-12-08 20:00:45 +00:00
Matt Arsenault	68f0505263	AMDGPU: Fix creating invalid copy when adjusting dmask Move the entire optimization to one place. Before it was possible to adjust dmask without changing the register class of the output instruction, since they were done in separate places. Fix all lane sizes and move all of the optimization into the DAG folding. llvm-svn: 319705	2017-12-04 22:18:27 +00:00
Matt Arsenault	84445dd13c	AMDGPU: Use gfx9 carry-less add/sub instructions llvm-svn: 319491	2017-11-30 22:51:26 +00:00
Matt Arsenault	b655fa9ce2	DAG: Add nuw when splitting loads and stores The object can't straddle the address space wrap around, so I think it's OK to assume any offsets added to the base object pointer can't overflow. Similar logic already appears to be applied in SelectionDAGBuilder when lowering aggregate returns. llvm-svn: 319272	2017-11-29 01:25:12 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Yaxun Liu	c596226604	[AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes flat load instead of buffer load. This patch fixes that. Differential Revision: https://reviews.llvm.org/D40040 llvm-svn: 318844	2017-11-22 16:13:35 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Matt Arsenault	301162c4fe	AMDGPU: Replace i64 add/sub lowering Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340	2017-11-15 21:51:43 +00:00
Matt Arsenault	45b98189bd	AMDGPU: Don't use MUBUF vaddr if address may overflow Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive. llvm-svn: 318240	2017-11-15 00:45:43 +00:00
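Commit 80d3bb3b4b (Craig Topper) justifies its rename by the ordering of the four DAG combine stages. What follows is a minimal Python model, not LLVM code. The stage names are paraphrased from that commit message, and both predicate names are hypothetical. The model shows that checking `Level == AfterLegalizeDAG` differs from "any combine that runs after vector-op legalization" only at the third stage, which does not always run.

```python
from enum import IntEnum


# Hypothetical model of the four DAG combine stages described in commit
# 80d3bb3b4b; only the ordering matters here, not LLVM's actual values.
class CombineLevel(IntEnum):
    BEFORE_LEGALIZE_TYPES = 0
    AFTER_LEGALIZE_TYPES = 1
    AFTER_LEGALIZE_VECTOR_OPS = 2  # third combine; does not always run
    AFTER_LEGALIZE_DAG = 3         # fourth and last combine


def is_after_legalize_dag(level: CombineLevel) -> bool:
    """What the renamed predicate actually checks: the last stage only."""
    return level == CombineLevel.AFTER_LEGALIZE_DAG


def is_at_or_after_legalize_vector_ops(level: CombineLevel) -> bool:
    """What the old name suggested: any combine after vector-op legalization."""
    return level >= CombineLevel.AFTER_LEGALIZE_VECTOR_OPS


if __name__ == "__main__":
    for level in CombineLevel:
        print(level.name, is_after_legalize_dag(level),
              is_at_or_after_legalize_vector_ops(level))
```

The two predicates agree at every stage except AFTER_LEGALIZE_VECTOR_OPS. That mismatch is why the old name was misleading.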

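Commit b283ea0f0f (Jan Vesely) promotes i16 ctpop because i16-capable ASICs do not accept i16 operands for that instruction. The Python sketch below shows why promotion preserves the result, assuming the operand is zero-extended to i32. Sign extension would be wrong, because it adds set bits for negative inputs. The `bcnt_add` helper is a hypothetical model of a count-and-accumulate operation (popcount of one operand plus another). It assumes the chained-addition pattern folds adds into such an accumulator and is not the actual TableGen pattern.

```python
def ctpop32(x: int) -> int:
    """Population count of a 32-bit value."""
    return bin(x & 0xFFFFFFFF).count("1")


def ctpop_i16_promoted(x: int) -> int:
    """i16 ctpop computed as: zero-extend to i32, count, truncate to i16."""
    widened = x & 0xFFFF                # zero-extension: upper 16 bits are 0
    return ctpop32(widened) & 0xFFFF    # truncate the i32 result back to i16


def bcnt_add(x: int, acc: int) -> int:
    """Hypothetical count-and-accumulate: popcount(x) + acc, modulo 2**32."""
    return (ctpop32(x) + acc) & 0xFFFFFFFF


# Chained additions ctpop(a) + ctpop(b) + c collapse into nested accumulates.
a, b, c = 0x00F0, 0x8001, 5
assert bcnt_add(a, bcnt_add(b, c)) == ctpop32(a) + ctpop32(b) + c
```

For the i16 bit pattern 0x8000, zero extension gives 0x00008000 (one set bit). Sign extension would give 0xFFFF8000 (17 set bits), so the choice of extension matters.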

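Commit 301162c4fe (Matt Arsenault) lowers i64 add/sub with VOP3 add/addc. That is a 32-bit add producing a carry, followed by a 32-bit add that consumes it. The Python below models only the add case, treating values as unsigned 64-bit integers. The helper names are hypothetical and are not AMDGPU instruction mnemonics.

```python
MASK32 = (1 << 32) - 1


def add_co_u32(a: int, b: int) -> tuple[int, int]:
    """32-bit add returning (sum, carry_out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def addc_u32(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """32-bit add that also consumes a carry-in bit."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add_i64(x: int, y: int) -> int:
    """Model an i64 add split into low and high 32-bit halves."""
    lo, carry = add_co_u32(x, y)                 # low halves, produce carry
    hi, _ = addc_u32(x >> 32, y >> 32, carry)    # high halves, consume carry
    return (hi << 32) | lo                       # final carry dropped: wraps mod 2**64
```

For example, `add_i64(0xFFFFFFFF, 1)` carries out of the low half and yields 0x100000000.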
416 Commits