llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	7f83397d72	AMDGPU: Account for LDS alignment The current situation isn't great, because the amount of padding required is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912	2016-02-05 19:47:29 +00:00
Matt Arsenault	cf84e26fb6	AMDGPU: Preserve alignments on new created globals Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911	2016-02-05 19:47:23 +00:00
Tom Stellard	1242ce9695	AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900	2016-02-05 18:44:57 +00:00
Tom Stellard	5dde1d2eb3	AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16863 llvm-svn: 259897	2016-02-05 18:29:17 +00:00
Tom Stellard	6e1967ef66	AMDGPU/SI: Correctly initialize SIInsertWaits pass Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16724 llvm-svn: 259894	2016-02-05 17:42:38 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	7e747f1a38	AMDGPU: Handle promoting memmove Also add missing tests for the others. llvm-svn: 259558	2016-02-02 20:28:10 +00:00
Matt Arsenault	8b175672cb	AMDGPU: Skip promote alloca with no optimizations llvm-svn: 259551	2016-02-02 19:32:42 +00:00
Matt Arsenault	fb8cdbae0c	AMDGPU: Minor cleanups for AMDGPUPromoteAlloca Mostly convert to use range loops. llvm-svn: 259550	2016-02-02 19:32:35 +00:00
Matt Arsenault	e5737f7cac	AMDGPU: Report AMDGPUPromoteAlloca changed the function llvm-svn: 259547	2016-02-02 19:18:57 +00:00
Matt Arsenault	ad1348459f	AMDGPU: Whitelist handled intrinsics We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546	2016-02-02 19:18:53 +00:00
Matt Arsenault	853a1fc6d9	AMDGPU: Use inbounds when calculating workitem offset When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545	2016-02-02 19:18:48 +00:00
Oliver Stannard	7e7d983a87	Refactor backend diagnostics for unsupported features Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498	2016-02-02 13:52:43 +00:00
Matt Arsenault	e013246462	AMDGPU: Fix emitting invalid workitem intrinsics for HSA The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297	2016-01-30 05:19:45 +00:00
Matt Arsenault	d0799df707	AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr Only the dispatch.ptr intrinsic is supposed to be used now to get the workgroup size, and the read.local.size intrinsics do not work correctly. llvm-svn: 259296	2016-01-30 05:10:59 +00:00
Matt Arsenault	43976df0da	AMDGPU: Add new amdgcn workitem intrinsics These use the correct prefix and follow the HSA naming convention rather than the config register option names. llvm-svn: 259293	2016-01-30 04:25:19 +00:00
Matt Arsenault	295875efda	AMDGPU: Remove 24-bit intrinsics The known bit matching code seems to work reasonably well, so these shouldn't really be needed. llvm-svn: 259180	2016-01-29 10:05:16 +00:00
Matt Arsenault	5b39b34ca5	AMDGPU: Match fmed3 patterns with legacy fmin/fmax llvm-svn: 259090	2016-01-28 20:53:48 +00:00
Matt Arsenault	f639c32739	AMDGPU: Match some med3 patterns llvm-svn: 259089	2016-01-28 20:53:42 +00:00
Matt Arsenault	7293f9895e	AMDGPU: Set DX10Clamp bit llvm-svn: 259088	2016-01-28 20:53:35 +00:00
Tom Stellard	3d2c852958	AMDGPU: waitcnt operand fixes Summary: Allow lgkmcnt up to 0xF (hardware allows that). Fix mask for ExpCnt in AMDGPUInstPrinter. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16314 Patch by: Nikolay Haustov llvm-svn: 259059	2016-01-28 17:13:44 +00:00
Tom Stellard	2ff726272a	AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp Summary: Also delete all the stub functions that are identical to the implementations in TargetInstrInfo.cpp. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16609 llvm-svn: 259054	2016-01-28 16:04:37 +00:00
Oliver Stannard	02fa1c80c4	Revert r259035, it introduces a cyclic library dependency llvm-svn: 259045	2016-01-28 13:19:47 +00:00
Oliver Stannard	b4b092ea1b	Add backend diagnostic printer for unsupported features Re-commit of r258951 after fixing layering violation. The related LLVM patch adds a backend diagnostic type for reporting unsupported features, this adds a printer for them to clang. In the case where debug location information is not available, I've changed the printer to report the location as the first line of the function, rather than the closing brace, as the latter does not give the user any information. This also affects optimisation remarks. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 259035	2016-01-28 10:07:27 +00:00
NAKAMURA Takumi	628a7a0aef	Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features" It introduced a layering violation in LLVMIR. clang r258950 "Add backend diagnostic printer for unsupported features" llvm r258951 "Refactor backend diagnostics for unsupported features" llvm-svn: 259016	2016-01-28 04:41:32 +00:00
Oliver Stannard	1e67a9f196	Refactor backend diagnostics for unsupported features The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. The implementation of DiagnosticInfoUnsupported::print must be in lib/Codegen rather than in the existing file in lib/IR/ to avoid introducing a dependency from IR to CodeGen. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 258951	2016-01-27 17:30:33 +00:00
Tom Stellard	6e3b14de62	AMDGPU/SI: Fix commuting of 32-bit VOPC instructions Summary: We didn't have entries in the commuting table for the 32-bit instructions. I don't think we hit this problem now, but we will once uniform branching is enabled. Tests will come in a later commit. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16600 llvm-svn: 258936	2016-01-27 15:53:52 +00:00
Marek Olsak	e86f252209	AMDGPU/SI: Stoney has only 16 LDS banks Summary: This is a candidate for stable, along with all patches that add the "stoney" processor. Reviewers: tstellarAMD Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16485 llvm-svn: 258922	2016-01-27 11:19:45 +00:00
Benjamin Kramer	b3e8a6d2b8	Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs. llvm-svn: 258917	2016-01-27 10:01:28 +00:00
Matt Arsenault	b22828f2fb	AMDGPU: Fix default device handling When no device name is specified, default to kaveri for HSA since SI is not supported and it would fail. Default to "tahiti" instead of "SI" since these are effectively the same, and tahiti is an actual device. Move default device handling to the TargetMachine rather than the AMDGPUSubtarget. The module ISA version is computed from the device name provided with the target machine, so the attributes printed by the AsmPrinter were inconsistent with those computed in the subtarget. Also remove DevName field from subtarget since it's redundant with getCPU() in the superclass. llvm-svn: 258901	2016-01-27 02:17:49 +00:00
Reid Kleckner	5b4637141e	[llvm-tblgen] Avoid StringMatcher for GCC and MS builtin names This brings the compile time of Function.cpp from ~40s down to ~4s for me locally. It also shaves off about 400KB of object file size in a release+asserts build. I also realized that the AMDGPU backend does not have any GCC builtin names to match, so the extra lookup was a no-op. I removed it to silence a zero-length string table array warning. There should be no functional change here. This change really ends the story of PR11951. llvm-svn: 258897	2016-01-27 01:43:12 +00:00
Reid Kleckner	1c93b4cd7b	[llvm-tblgen] Stop emitting the intrinsic name matching code The AMDGPU backend was the last user of the old StringMatcher recognition code. Move it over to the new lookupLLVMIntrinsicName function, which is now improved to handle all of the interesting edge cases exposed by AMDGPU intrinsic names. llvm-svn: 258875	2016-01-26 23:01:21 +00:00
Chris Bieneman	e49730d4ba	Remove autoconf support Summary: This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html "I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened." - Obi Wan Kenobi Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16471 llvm-svn: 258861	2016-01-26 21:29:08 +00:00
Matt Arsenault	bee7575e1a	AMDGPU: Move AMDGPU intrinsics only used by R600 llvm-svn: 258790	2016-01-26 04:49:24 +00:00
Matt Arsenault	382d945d16	AMDGPU: Tidy minor td file issues Make comments and indentation more consistent. Rearrange a few things to be in a more consistent order, such as organizing subtarget features from those describing an actual device property, and those used as options. llvm-svn: 258789	2016-01-26 04:49:22 +00:00
Matt Arsenault	c5f6152911	AMDGPU: Make v32i8/v64i8 illegal types Old intrinsics were forcing these, but they have now all been removed. This fixes large i8 vector operations generally being broken. llvm-svn: 258788	2016-01-26 04:43:48 +00:00
Matt Arsenault	018179fc46	AMDGPU: Remove old sample intrinsics I did my best to try to update all the uses in tests that just happened to use the old ones to the newer intrinsics. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern but I don't think it really matters. llvm-svn: 258787	2016-01-26 04:38:08 +00:00
Matt Arsenault	051d6f9fde	AMDGPU: Add new amdgcn intrinsics for cube instructions More cleanup to try to get all intrinsics using the correct amdgcn prefix that are as close to the instruction as possible. llvm-svn: 258786	2016-01-26 04:29:56 +00:00
Matt Arsenault	9a10cea7fb	AMDGPU: Implement read_register and write_register intrinsics Some of the special intrinsics that now correspond to an instruction also have special setting of some registers, e.g. llvm.SI.sendmsg sets m0 as well as using s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. For this I'm not sure this is entirely correct because we would want this to be convergent, although it's possible this is already treated sufficiently conservatively. llvm-svn: 258785	2016-01-26 04:29:24 +00:00
Matt Arsenault	0c3e2338fe	AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now Also move into backend intrinsics to discourage use of the old name. llvm-svn: 258783	2016-01-26 04:14:16 +00:00
Matt Arsenault	7713162c32	AMDGPU: Remove more unused intrinsics Replace tests with lrp with basic IR expansion llvm-svn: 258612	2016-01-23 05:42:38 +00:00
Matt Arsenault	f75257aaa6	AMDGPU: Move amdgcn intrinsic handling into SITargetLowering llvm-svn: 258608	2016-01-23 05:32:20 +00:00
Matt Arsenault	f1341406bf	AMDGPU: Remove IntrNoMem from llvm.SI.sendmsg This has side effects. llvm-svn: 258607	2016-01-23 05:32:18 +00:00
Matt Arsenault	2a93bb6365	AMDGPU: Remove Feature64BitPtr This is a leftover from AMDIL that doesn't do anything and doesn't belong here. llvm-svn: 258606	2016-01-23 05:32:14 +00:00
Matt Arsenault	10ca39ca8b	AMDGPU: Add new name for barrier intrinsic llvm-svn: 258558	2016-01-22 21:30:43 +00:00
Matt Arsenault	bef34e21c7	AMDGPU: Rename intrinsics to use amdgcn prefix The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatibility for now. llvm-svn: 258557	2016-01-22 21:30:34 +00:00
Matt Arsenault	0b783ef076	AMDGPU: Fix crash with invariant markers The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking. llvm-svn: 258537	2016-01-22 19:47:54 +00:00
Matt Arsenault	59bd3014f2	AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix These ones aren't directly emitted by mesa and inserted by a pass. llvm-svn: 258523	2016-01-22 19:00:09 +00:00
Matt Arsenault	bb4ff5f5b6	AMDGPU: Remove unused R600 intrinsics llvm-svn: 258522	2016-01-22 18:52:14 +00:00
Matt Arsenault	7898b90ee1	AMDGPU: Change control flow intrinsics to use amdgcn prefix These aren't supposed to be used outside of the backend, so there aren't any users to worry about. llvm-svn: 258516	2016-01-22 18:42:55 +00:00
Matt Arsenault	8d903029e8	AMDGPU: Don't use separate mulhu/mulhs Pats llvm-svn: 258515	2016-01-22 18:42:49 +00:00
Matt Arsenault	ee0930821a	AMDGPU: Remove random TGSI intrinsic I don't think this was ever used. llvm-svn: 258514	2016-01-22 18:42:44 +00:00
Matt Arsenault	0cbaa1762b	AMDGPU: Remove AMDGPU.fract intrinsic Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x) llvm-svn: 258513	2016-01-22 18:42:38 +00:00
Tom Stellard	de008d338c	AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute Summary: Currently the SI scheduler can be selected via command line option, but it turned out it would be better if it was selectable via a Target Attribute. This patch adds "si-scheduler" attribute to the backend. Reviewers: tstellarAMD, echristo Subscribers: echristo, arsenm Differential Revision: http://reviews.llvm.org/D16192 llvm-svn: 258386	2016-01-21 04:28:34 +00:00
Tom Stellard	d1efda8e9e	AMDGPU/SI: Promote i1 SETCC operations Summary: While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16233 llvm-svn: 258352	2016-01-20 21:48:24 +00:00
Matt Arsenault	7836f895fe	AMDGPU: Fix old comments that mention AMDIL llvm-svn: 258350	2016-01-20 21:22:21 +00:00
Matt Arsenault	7ba334a7d9	AMDGPU: Remove AMDGPU.trunc intrinsic llvm-svn: 258348	2016-01-20 21:05:53 +00:00
Matt Arsenault	15fbe49daf	AMDGPU: Remove AMDIL.fraction intrinsic llvm-svn: 258347	2016-01-20 21:05:49 +00:00
Matt Arsenault	7cccd2672e	AMDGPU: Remove AMDIL.round.nearest intrinsic llvm-svn: 258346	2016-01-20 21:05:40 +00:00
Matt Arsenault	1c9e4ef0df	AMDGPU: Remove abs intrinsic llvm-svn: 258343	2016-01-20 20:58:29 +00:00
Matt Arsenault	f7e6e89718	AMDGPU: Remove min/max intrinsics This removes support for mesa 11.0.x llvm-svn: 258342	2016-01-20 20:50:19 +00:00
Tom Stellard	77a177722f	Correctly initialize SIAnnotateControlFlow Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16304 llvm-svn: 258319	2016-01-20 15:48:27 +00:00
Matthias Braun	5d458617aa	RegisterPressure: Make liveness tracking subregister aware Differential Revision: http://reviews.llvm.org/D14968 llvm-svn: 258258	2016-01-20 00:23:26 +00:00
Tom Stellard	2e045bbc5f	AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256	2016-01-20 00:13:22 +00:00
Matt Arsenault	33e3ecee0c	AMDGPU: Reduce 64-bit SRAs llvm-svn: 258096	2016-01-18 22:09:04 +00:00
Matt Arsenault	6e3a45193a	AMDGPU: Split 64-bit and of constant up This breaks the tests that were meant for testing 64-bit inline immediates, so move those to shl where they won't be broken up. This should be repeated for the other related bit ops. llvm-svn: 258095	2016-01-18 22:01:13 +00:00
Matt Arsenault	3cbbc10488	AMDGPU: Generalize shl combine Reduce 64-bit shl with constant > 32. We already special cased this for the == 32 case, but this also works for any >= 32 constant. llvm-svn: 258092	2016-01-18 21:55:14 +00:00
Matt Arsenault	80edab99ff	AMDGPU: Reduce 64-bit lshr by constant to 32-bit 64-bit shifts are very slow on some subtargets. llvm-svn: 258090	2016-01-18 21:43:36 +00:00
Matt Arsenault	e83690c1cc	AMDGPU: Add subtarget feature for instruction rates llvm-svn: 258085	2016-01-18 21:13:50 +00:00
Manuel Jacob	5f6eaac611	GlobalValue: use getValueType() instead of getType()->getPointerElementType(). Reviewers: mjacob Subscribers: jholewinski, arsenm, dsanders, dblaikie Patch by Eduard Burtescu. Differential Revision: http://reviews.llvm.org/D16260 llvm-svn: 257999	2016-01-16 20:30:46 +00:00
Rui Ueyama	da00f2fdf4	Update to use new name alignTo(). llvm-svn: 257804	2016-01-14 21:06:47 +00:00
Rafael Espindola	8340f94df1	Convert a few assert failures into proper errors. Fixes PR25944. llvm-svn: 257697	2016-01-13 22:56:57 +00:00
Changpeng Fang	c16be00313	AMDGPU/SI: Update ISA version for FIJI llvm-svn: 257666	2016-01-13 20:39:25 +00:00
Hans Wennborg	81efb6b418	Fix struct/class mismatch for MachineSchedContext llvm-svn: 257648	2016-01-13 18:59:45 +00:00
Marek Olsak	46dadbfab2	AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 llvm-svn: 257625	2016-01-13 17:23:20 +00:00
Marek Olsak	3c0ebc71f1	AMDGPU/SI: Remove ending s_endpgm from non-void functions Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16035 llvm-svn: 257623	2016-01-13 17:23:12 +00:00
Marek Olsak	8e9cc63bfb	AMDGPU/SI: Add s_waitcnt at the end of non-void functions Summary: v2: Make ReturnsVoid private, so that I can another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 llvm-svn: 257622	2016-01-13 17:23:09 +00:00
Marek Olsak	8a0f335ad6	AMDGPU/SI: Add support for non-void functions Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored. v2: don't do this for compute shaders Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16033 llvm-svn: 257621	2016-01-13 17:23:04 +00:00
Nicolai Haehnle	02c3291566	AMDGPU/SI: Add SI Machine Scheduler Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609	2016-01-13 16:10:10 +00:00
Marek Olsak	4e99b6ec01	AMDGPU/SI: Allow more shader inputs Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16032 llvm-svn: 257593	2016-01-13 11:46:48 +00:00
Marek Olsak	b6c8c3d165	AMDGPU/SI: Allow any number of PS inputs Summary: With the ability to concatenate shader binaries, the limit of 15 no longer applies. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16031 llvm-svn: 257592	2016-01-13 11:46:10 +00:00
Marek Olsak	fccabaf57e	AMDGPU/SI: Add new target attribute InitialPSInputAddr Summary: This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM. The register assigns VGPR locations to PS inputs, while the ENA register determines whether or not they are loaded. Mesa needs to set some inputs as not-movable, so that a pixel shader prolog binary appended at the beginning can assume where some inputs are. v2: Make PSInputAddr private, because there is never enough silly getters and setters for people to read. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16030 llvm-svn: 257591	2016-01-13 11:45:36 +00:00
Marek Olsak	926c56f50c	AMDGPU/SI: Fix a bug in SIFoldOperands Summary: ret.ll will contain a test for this Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16029 llvm-svn: 257590	2016-01-13 11:44:29 +00:00
Tom Stellard	f421837250	AMDGPU: Emit note directive for HSA even if there are no functions Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16010 llvm-svn: 257488	2016-01-12 17:18:17 +00:00
Matt Arsenault	5e0bdb8b95	AMDGPU: Implement {{s\|u}}int_to_fp i64 -> f32 The old lowering for uint_to_fp failed opencl conformance. It might be OK for fast math mode, but I'm not sure. llvm-svn: 257393	2016-01-11 22:01:48 +00:00
Matt Arsenault	800fecf9de	AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA target It might be better to let this be a select failure instead. llvm-svn: 257386	2016-01-11 21:18:33 +00:00
Matt Arsenault	5319b0add5	AMDGPU: Fix ctlz combine for sub 32-bit types llvm-svn: 257353	2016-01-11 17:02:06 +00:00
Matt Arsenault	de5fbe9c60	AMDGPU: Pattern match ffbh pattern to instruction. The hardware instruction's output on 0 is -1 rather than 32. Eliminate a test and select to -1. This removes an extra instruction from the compatibility function with HSAIL's firstbit instruction. llvm-svn: 257352	2016-01-11 17:02:00 +00:00
Matt Arsenault	f058d67643	AMDGPU: Custom lower i64 ctlz llvm-svn: 257348	2016-01-11 16:50:29 +00:00
Matt Arsenault	5ca3c72c5a	LegalizeDAG: Expand ctlz with ctlz_zero_undef if legal llvm-svn: 257345	2016-01-11 16:37:46 +00:00
Matt Arsenault	02d45dfeda	AMDGPU: Remove dead target dag combine llvm-svn: 257344	2016-01-11 16:37:40 +00:00
Tom Stellard	4c4c72db48	AMDGPU/SI: Emit global variable sizes when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15952 llvm-svn: 257173	2016-01-08 14:50:28 +00:00
Tom Stellard	ad8f5e8111	AMDGPU: Emit functions sizes Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15951 llvm-svn: 257172	2016-01-08 14:50:23 +00:00
Nicolai Haehnle	82fc962c20	AMDGPU/SI: Fold operands with sub-registers Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074	2016-01-07 17:10:29 +00:00
Nicolai Haehnle	3c05d6d3b5	AMDGPU/SI: xnack_mask is always reserved on VI Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073	2016-01-07 17:10:20 +00:00
Nicolai Haehnle	a61e5a8d4e	AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader Summary: This is admittedly something that you could only run into by manually playing around with shader assembly because the SITypeWriter pass is skipped for compute. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15902 llvm-svn: 256980	2016-01-06 22:01:04 +00:00
Nicolai Haehnle	6035504ab3	AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland Due to the SGPR init bug, every program claims to use the same number of SGPRs anyway, so there's no point in trying to shift those registers down from their initial spot of reservation. Add a test that uses VGPR spilling and blocks most SGPRs from being used for the scratch resource register. Previously, this would run into an assertion. Differential Revision: http://reviews.llvm.org/D15724 llvm-svn: 256870	2016-01-05 20:42:49 +00:00
Matt Arsenault	905042774d	AMDGPU: Remove redundant let mayLoad = 1 This is already set on the SMRD format class. llvm-svn: 256813	2016-01-05 04:50:28 +00:00
Tom Stellard	5cd09ade38	AMDGPU/SI: Select non-uniform constant addrspace loads to flat instructions for HSA Summary: This fixes a regression caused by r256282. Reviewers: arsenm, cfang Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15736 llvm-svn: 256810	2016-01-05 03:40:16 +00:00
Tom Stellard	2c82ee60c3	AMDGPU/SI: Consolidate FLAT patterns Summary: We had two sets of identical FLAT patterns, one inside the HasFlatAddressSpace predicate and one inside the useFlatForGlobal predicate. This patch merges these sets into a single pattern under the isCIVI predicate. The reason we can remove the predicates is that when MUBUF instructions are legal, the instruction selector will prefer selecting those over FLAT instructions because MUBUF patterns have a higher complexity score. So, in this case having patterns for FLAT instructions will have no effect. This change also simplifies the process for forcing global address space loads to use FLAT instructions, since we now only have to disable the MUBUF patterns instead of having to disable the MUBUF patterns and enable the FLAT patterns. Reviewers: arsenm, cfang Subscribers: llvm-commits llvm-svn: 256807	2016-01-05 02:26:37 +00:00
Nicolai Haehnle	5b50497617	AMDGPU: add +xnack feature Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794	2016-01-04 23:35:53 +00:00
Tom Stellard	3da5672755	AMDGPU/SI: Move VI SMEM pattern back into VIInstructions.td Summary: This was accidentally moved to CIInstructions.td in r256282 Reviewers: cfang, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15763 llvm-svn: 256775	2016-01-04 20:23:10 +00:00
Nicolai Haehnle	e705aadd67	AMDGPU: Avoid assertions after SGPR spilling failed Summary: The comment explains it: emitError does not necessarily exit the compilation process, and then using NoRegister leads to assertions later on. This generates incorrect code, of course, but the user should know to not use the result when an error has been emitted. It would be nice to have a test-case for this inside the LLVM repository, but llc exits on error. shader-db tests trigger the underlying issue at least on Tonga. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15826 llvm-svn: 256757	2016-01-04 15:50:01 +00:00
Craig Topper	daf2e3ff7a	Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC llvm-svn: 256427	2015-12-25 22:10:01 +00:00
Matt Arsenault	4339b3ff35	AMDGPU: Fix getRegisterBitWidth for vectors llvm-svn: 256362	2015-12-24 05:14:55 +00:00
Tom Stellard	5ebdfbe562	AMDGPU/SI: Fix encoding of flat instructions on VI Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15735 llvm-svn: 256360	2015-12-24 03:18:18 +00:00
Tom Stellard	668f793049	AMDGPU/SI: Remove non-existent flat instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15734 llvm-svn: 256357	2015-12-24 02:41:55 +00:00
Changpeng Fang	b41574a961	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason, executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282	2015-12-22 20:55:23 +00:00
Rafael Espindola	4b0d24c00a	Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA" This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275	2015-12-22 19:46:44 +00:00
Changpeng Fang	9b8a9be058	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason, executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273	2015-12-22 19:32:28 +00:00
Tom Stellard	2b65ed306d	AMDGPU/SI: Fix encoding for FLAT_SCRATCH registers on VI Summary: These registers have different encodings on CI and VI, so we add pseudo FLAT_SCRATCH registers to be used before MC, and subtarget-specific registers to be used by the MC layer. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15661 llvm-svn: 256178	2015-12-21 18:44:27 +00:00
Tom Stellard	9da8620cdb	AMDGPU/SI: Change assembly name for flat scratch registers to flat_scratch This matches what the assembler accepts. llvm-svn: 256177	2015-12-21 18:44:21 +00:00
Tom Stellard	ffc1a5aef7	AMDGPU/SI: Fix implementation of isSourceOfDivergence() for graphics shaders Summary: The analysis of shader inputs was completely wrong. We were passing the wrong index to AttributeSet::hasAttribute() and the logic for which inputs were in SGPRs was wrong too. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15608 llvm-svn: 256082	2015-12-19 02:54:15 +00:00
Matt Arsenault	2aed6ca1d3	AMDGPU: Switch barrier intrinsics to using convergent noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075	2015-12-19 01:46:41 +00:00
Nicolai Haehnle	6bcf8b2890	AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysReg Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15629 llvm-svn: 256073	2015-12-19 01:36:26 +00:00
Nicolai Haehnle	dd58705af6	AMDGPU: fix overlapping copies in copyPhysReg Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072	2015-12-19 01:16:06 +00:00
Changpeng Fang	c9963936e7	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256022	2015-12-18 20:04:28 +00:00
Changpeng Fang	ef735b74c1	Revert "AMDGPU/SI: Test commit" This reverts commit a493cb636e0152ad28210934a47c6c44b1437193. llvm-svn: 256021	2015-12-18 20:04:26 +00:00
Changpeng Fang	7fdf674c2e	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256020	2015-12-18 19:57:41 +00:00
Tom Stellard	caaa3aa07c	AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908	2015-12-17 17:05:09 +00:00
Nicolai Haehnle	87323da6eb	AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906	2015-12-17 16:46:42 +00:00
Matt Arsenault	e05ff15186	AMDGPU: Override getCFInstrCost The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796	2015-12-16 18:37:19 +00:00
Tom Stellard	7750f4ed9e	AMDGPU/SI: Set the code object work group segment size when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15493 llvm-svn: 255702	2015-12-15 23:15:25 +00:00
Tom Stellard	a495307e5e	AMDGPU/SI: Set the code objects private segment size when targeting HSA. Summary: I'm not sure how things worked before without this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15492 llvm-svn: 255692	2015-12-15 22:55:30 +00:00
Tom Stellard	29dd05e92f	AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15426 llvm-svn: 255689	2015-12-15 22:39:36 +00:00
Tom Stellard	a6f24c6565	AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions Summary: We were previously selecting all constant loads to SMRD instructions and legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopies pass. This new solution is simpler and also generates much better code, because the instruction selector is able to take advantage of all the MUBUF addressing modes that the legalization pass wasn't able to. We also no longer need to generate v_add_* instructions when we have a uniform pointer and a non-uniform offset, as this is now folded into the MUBUF instruction during instruction selection. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15425 llvm-svn: 255672	2015-12-15 20:55:55 +00:00
Tom Stellard	dbe374b2c5	AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence() Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15476 llvm-svn: 255661	2015-12-15 18:04:38 +00:00
Tom Stellard	8f307217c3	AMDGPU/SI: Fix bitcast between v2f32 and f64 The radeonsi fp64 support can hit these now that some redundant bitcasts are folded. Patch by: Michel Dänzer Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 255657	2015-12-15 17:11:17 +00:00
Tom Stellard	43f52df0b5	AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsics Summary: These are meant to be used instead of the llvm.SI.tid intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15475 llvm-svn: 255652	2015-12-15 17:02:52 +00:00
Tom Stellard	ad7d03daa6	AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsics Summary: These are meant to be used instead of the llvm.SI.fs.interp intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15474 llvm-svn: 255651	2015-12-15 17:02:49 +00:00
Tom Stellard	ac00eb5470	AMDGPU/SI: Add getShaderType() function to Utils/ Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15424 llvm-svn: 255650	2015-12-15 16:26:16 +00:00
Krzysztof Parzyszek	dac7102874	[Packetizer] Add AliasAnalysis as a parameter to the packetizer This will make the dependence graph more accurate if an alias analysis is provided. If nullptr is specified in its place, the behavior will remain as it is currently. llvm-svn: 255540	2015-12-14 20:35:13 +00:00
Krzysztof Parzyszek	d44a1fd506	Add "const" to function arguments in DFAPacketizer llvm-svn: 255526	2015-12-14 18:54:44 +00:00
Matt Arsenault	d079285e05	AMDGPU: Use generic bitreverse intrinsic Also fix bug in vector legalization for bitreverse. llvm-svn: 255512	2015-12-14 17:25:38 +00:00
Matt Arsenault	52a52a564b	AMDGPU: Fix splitting vector loads with existing offsets If the original MMO had an offset, it was dropped. Also use the correct alignment after adding the new offset. llvm-svn: 255508	2015-12-14 16:59:40 +00:00
Cong Hou	c106989fd5	Normalize MBB's successors' probabilities in several locations. This patch adds some missing calls to MBB::normalizeSuccProbs() in several locations where it should be called. Those places are found by checking if the sum of successors' probabilities is approximately one in the MachineBlockPlacement pass with some instrumented code (not in this patch). Differential revision: http://reviews.llvm.org/D15259 llvm-svn: 255455	2015-12-13 09:26:17 +00:00
Matt Arsenault	fbd9bbfda3	Start replacing vector_extract/vector_insert with extractelt/insertelt These are redundant pairs of nodes defined for INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT. insertelement/extractelement are slightly closer to the corresponding C++ node name, and has stricter type checking so prefer it. Update targets to only use these nodes where it is trivial to do so. AArch64, ARM, and Mips all have various type errors on simple replacement, so they will need work to fix. Example from AArch64: def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8), (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>; Which is trying to do sext_inreg i8, i8. llvm-svn: 255359	2015-12-11 19:20:16 +00:00
Tom Stellard	c2d654322b	AMDGPU/SI: Fix warning introduced by r255204 llvm-svn: 255205	2015-12-10 03:10:46 +00:00
Tom Stellard	c93fc11f36	AMDGPU/SI: Emit constant arrays in the .text section Summary: This allows us to remove the END_OF_TEXT_LABEL hack we had been using and simplifies the fixups used to compute the address of constant arrays. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15257 llvm-svn: 255204	2015-12-10 02:13:01 +00:00
Tom Stellard	b3c3bda512	AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraints Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs. Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15342 llvm-svn: 255203	2015-12-10 02:12:53 +00:00
Tom Stellard	9760f03757	AMDGPU/SI: Emit constant arrays in the .hsrodata_readonly_agent section Summary: This is done only when targeting HSA. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13807 llvm-svn: 254587	2015-12-03 03:34:32 +00:00
Tom Stellard	00f2f91af4	AMDGPU/SI: Correctly emit agent global segment variables when targeting HSA Differential Revision: http://reviews.llvm.org/D14508 llvm-svn: 254540	2015-12-02 19:47:57 +00:00
Tom Stellard	e928533dae	AMDGPU: Fix msan test failure llvm-svn: 254527	2015-12-02 18:35:23 +00:00
Tom Stellard	e3b5aeaf83	AMDGPU/SI: Don't emit group segment global variables Summary: Only global or readonly segment variables should appear in object files. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15111 llvm-svn: 254519	2015-12-02 17:00:42 +00:00
Matt Arsenault	592d068198	AMDGPU: Error on addrspacecasts that aren't actually implemented llvm-svn: 254469	2015-12-01 23:04:05 +00:00
Matt Arsenault	f9bfeafd00	AMDGPU: Implement isNoopAddrSpaceCast llvm-svn: 254468	2015-12-01 23:04:00 +00:00
Matt Arsenault	3b15967008	AMDGPU: Disallow flat_scr in SI assembler llvm-svn: 254459	2015-12-01 20:31:08 +00:00
Matt Arsenault	856d1928a8	AMDGPU: Optimize VOP2 operand legalization Don't use commuteInstruction, and don't commute if doing so will not improve legality. Skip the more complex checks for literal operands and constant bus restrictions, which are not a concern for VOP2 instructions because src1 does not accept SGPRs or constants and few implicitly read vcc. This gets called quite a few times and the attempts at commuting are a significant fraction of the time spent in SIFixSGPRCopies, so it's somewhat worthwhile to optimize. With this patch and others leading up to it, this reduces the compile time of SIFixSGPRCopies on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system. llvm-svn: 254452	2015-12-01 19:57:17 +00:00
Matt Arsenault	e830f5427b	AMDGPU: Report extractelement as free in cost model The cost for scalarized operations is computed as N * (scalar operation cost + 1 extractelement + 1 insertelement). This partially fixes inflating the cost of scalarized operations since every operation is scalarized and free. I don't think we want any cost associated with scalarization, but for now insertelement is still counted. I'm not sure if we should pretend that insertelement is also free, or add a way to compute a custom scalarization cost. llvm-svn: 254438	2015-12-01 19:08:39 +00:00
Tom Stellard	38b7cbe3e0	AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now dead Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15050 llvm-svn: 254427	2015-12-01 17:45:22 +00:00
Tom Stellard	ff63c25753	AMDGPU: Use the default strings for data emission directives Summary: This makes the assembly output look nicer and there is no reason to have custom strings for these. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D14671 llvm-svn: 254426	2015-12-01 17:45:17 +00:00
Cong Hou	d97c100dc4	Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces. (This is the second attempt to submit this patch. The first caused two assertion failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687) The patch in http://reviews.llvm.org/D13745 is broken into four parts: 1. New interfaces without functional changes (http://reviews.llvm.org/D13908). 2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361). 3. Use new interfaces in all other passes. 4. Remove old interfaces. This patch is 3+4 above. In this patch, MBB won't provide weight-based interfaces any more, which are totally replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. We allow unknown probabilities but don't allow using it together with known probabilities in successor list. That is to say, we either have a list of successors with all known probabilities, or all unknown probabilities. In the latter case, we assume each successor has 1/N probability where N is the number of successors. An assertion checks if the user is attempting to add a successor with the disallowed mixed use as stated above. This can help us catch many misuses. All uses of weight-based interfaces are now updated to use probability-based ones. Differential revision: http://reviews.llvm.org/D14973 llvm-svn: 254377	2015-12-01 05:29:22 +00:00
Hans Wennborg	1dbaf67537	Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces." and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction." Asserts were firing in Chromium builds. See PR25687. llvm-svn: 254366	2015-12-01 03:49:42 +00:00
Matt Arsenault	456fdfcdc2	Squelch unused variable warning in SIRegisterInfo.cpp. Patch by Justin Lebar llvm-svn: 254362	2015-12-01 02:14:33 +00:00
Cong Hou	fa1917c673	Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces. The patch in http://reviews.llvm.org/D13745 is broken into four parts: 1. New interfaces without functional changes (http://reviews.llvm.org/D13908). 2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361). 3. Use new interfaces in all other passes. 4. Remove old interfaces. This patch is 3+4 above. In this patch, MBB won't provide weight-based interfaces any more, which are totally replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. We allow unknown probabilities but don't allow using it together with known probabilities in successor list. That is to say, we either have a list of successors with all known probabilities, or all unknown probabilities. In the latter case, we assume each successor has 1/N probability where N is the number of successors. An assertion checks if the user is attempting to add a successor with the disallowed mixed use as stated above. This can help us catch many misuses. All uses of weight-based interfaces are now updated to use probability-based ones. Differential revision: http://reviews.llvm.org/D14973 llvm-svn: 254348	2015-12-01 00:02:51 +00:00
Matt Arsenault	ada6cf1b22	AMDGPU: Fix unused function llvm-svn: 254333	2015-11-30 21:32:10 +00:00
Matt Arsenault	41003af292	AMDGPU: Error if too many user SGPRs used llvm-svn: 254332	2015-11-30 21:16:07 +00:00
Matt Arsenault	26f8f3db39	AMDGPU: Rework how private buffer passed for HSA If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed in and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the prologue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331	2015-11-30 21:16:03 +00:00
Matt Arsenault	ac234b604d	AMDGPU: Rename enums to be consistent with HSA code object terminology llvm-svn: 254330	2015-11-30 21:15:57 +00:00
Matt Arsenault	0e3d38937e	AMDGPU: Remove SIPrepareScratchRegs It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329	2015-11-30 21:15:53 +00:00
Matt Arsenault	ff6da2fe89	AMDGPU: Use assert zext for workgroup sizes llvm-svn: 254328	2015-11-30 21:15:45 +00:00
Matt Arsenault	ea03cf2fa1	AMDGPU: Don't reserve SCRATCH_PTR input register This hasn't been doing anything since using relocations was added. llvm-svn: 254304	2015-11-30 15:46:47 +00:00
Tom Stellard	48f29f21ee	AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsic Summary: This returns a pointer to the dispatch packet, which can be used to load information about the kernel dispatch. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D14898 llvm-svn: 254116	2015-11-26 00:43:29 +00:00
Marek Olsak	7ed6b2f414	AMDGPU/SI: select S_ABS_I32 when possible (v2) v2: added more tests, moved the SALU->VALU conversion to a separate function It looks like it's not possible to get subregisters in the S_ABS lowering code, and I don't feel like guessing without testing what the correct code would look like. llvm-svn: 254095	2015-11-25 21:22:45 +00:00
Matt Arsenault	49affb8462	AMDGPU: Check feature attributes in SIMachineFunctionInfo llvm-svn: 254091	2015-11-25 20:55:12 +00:00
Matt Arsenault	61001bbc03	AMDGPU: Make v2i64/v2f64 legal types. They can be loaded and stored, so count them as legal. This is mostly to fix a number of common cases for load/store merging. llvm-svn: 254086	2015-11-25 19:58:34 +00:00
Artyom Skrobov	314ee04268	Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC) Summary: Many target lowerings copy-paste the code to test SDValues for known constants. This code can instead be shared in SelectionDAG.cpp, and reused in the targets. Reviewers: MatzeB, andreadb, tstellarAMD Subscribers: arsenm, jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D14945 llvm-svn: 254085	2015-11-25 19:41:11 +00:00
Matt Arsenault	ff05da806c	AMDGPU: Split LDS vector loads If properly aligned this could allow using ds_read_b64. llvm-svn: 253975	2015-11-24 12:18:54 +00:00
Matt Arsenault	4d801cd357	AMDGPU: Split x8 and x16 vector loads instead of scalarize The one regression in the builtin tests is in the read2 test which now (again) has many extra copies, but this should be solved once the pass is replaced with a DAG combine. llvm-svn: 253974	2015-11-24 12:05:03 +00:00
Pete Cooper	67cf9a723b	Revert "Change memcpy/memset/memmove to have dest and source alignments." This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543	2015-11-19 05:56:52 +00:00
Pete Cooper	72bc23ef02	Change memcpy/memset/memmove to have dest and source alignments. Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511	2015-11-18 22:17:24 +00:00
Akira Hatanaka	b11ef0897c	Reduce the size of MCRelaxableFragment. MCRelaxableFragment previously kept a copy of MCSubtargetInfo and MCInst to enable re-encoding the MCInst later during relaxation. A copy of MCSubtargetInfo (instead of a reference or pointer) was needed because the feature bits could be modified by the parser. This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment with a constant reference to MCSubtargetInfo. The copies of MCSubtargetInfo are kept in MCContext, and the target parsers are now responsible for asking MCContext to provide a copy whenever the feature bits of MCSubtargetInfo have to be toggled. With this patch, I saw a 4% reduction in peak memory usage when I compiled verify-uselistorder.lto.bc using llc. rdar://problem/21736951 Differential Revision: http://reviews.llvm.org/D14346 llvm-svn: 253127	2015-11-14 06:35:56 +00:00
Akira Hatanaka	bd9fc28444	[MCTargetAsmParser] Move the member variables that reference MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a member function getSTI. This is done in preparation for making changes to shrink the size of MCRelaxableFragment. (see http://reviews.llvm.org/D14346). llvm-svn: 253124	2015-11-14 05:20:05 +00:00
Tom Stellard	afd6e2f3c3	AMDGPU: Add stony support Patch by: Alex Deucher llvm-svn: 253053	2015-11-13 17:06:32 +00:00
Tom Stellard	0967c91e0c	Revert "Remove unnecessary call to getAllocatableRegClass" This reverts commit r252565. This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU: Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64" This reverts commit r252674. llvm-svn: 252956	2015-11-12 21:43:25 +00:00
Matt Arsenault	8246d4aead	AMDGPU: Print more fields in comments llvm-svn: 252677	2015-11-11 00:27:46 +00:00
Matt Arsenault	61cb6fa848	AMDGPU: Remove dead code llvm-svn: 252675	2015-11-11 00:01:36 +00:00
Matt Arsenault	6690d7de39	AMDGPU: Set isAllocatable = 0 on VS_32/VS_64 llvm-svn: 252674	2015-11-11 00:01:32 +00:00
Tom Stellard	41b7e63040	AMDGPU/SI: Refactor VOP[12C] tablegen definitions Summary: Pass the VOPProfile object all the through to *_m multiclasses. This will allow us to do more simplifications in the future. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13437 llvm-svn: 252339	2015-11-06 20:56:18 +00:00
Matt Arsenault	f59e538937	AMDGPU: Cleanup includes llvm-svn: 252328	2015-11-06 18:23:00 +00:00
Matt Arsenault	0c90e9501e	AMDGPU: Create emergency stack slots during frame lowering Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327	2015-11-06 18:17:45 +00:00
Matt Arsenault	08f14de244	AMDGPU: Remove unused scratch resource operands The SGPR spill pseudos don't actually use them. llvm-svn: 252324	2015-11-06 18:07:53 +00:00
Matt Arsenault	3931948bb6	AMDGPU: Add pass to detect used kernel features Mark kernels that use certain features that require user SGPRs to support with kernel attributes. We need to know before instruction selection begins because it impacts the kernel calling convention lowering. For now this only detects the workitem intrinsics. llvm-svn: 252323	2015-11-06 18:01:57 +00:00
Matt Arsenault	4dc7a5a5c6	AMDGPU: Fix hardcoded alignment of spill. Instead of forcing 4 alignment when spilled, set register class alignments. llvm-svn: 252322	2015-11-06 17:54:47 +00:00
Matt Arsenault	623e6fd466	AMDGPU: Hack for VS_32 register pressure For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321	2015-11-06 17:54:43 +00:00
Tom Stellard	1e1b05db24	AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13804 llvm-svn: 252291	2015-11-06 11:45:14 +00:00
Matt Arsenault	5b22dfa65d	AMDGPU: Also track whether SGPRs were spilled llvm-svn: 252145	2015-11-05 05:27:10 +00:00
Matt Arsenault	d41c0dbff0	AMDGPU: Print number user SGPRs This doesn't quite match how SC prints it, which doesn't put it in a comment. llvm-svn: 252144	2015-11-05 05:27:07 +00:00
Matt Arsenault	68802d3177	AMDGPU: Disallow s[102:103] on VI in assembler llvm-svn: 252142	2015-11-05 03:11:27 +00:00
Matt Arsenault	a40450cba2	AMDGPU: Fix assert when legalizing atomic operands The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement. llvm-svn: 252140	2015-11-05 02:46:56 +00:00
Matt Arsenault	bed42a7320	AMDGPU: Make addr64 atomic operand order consistent vaddr comes before srsrc in every other MUBUF instruction, and is the order it is printed. llvm-svn: 252139	2015-11-05 02:46:53 +00:00
Matt Arsenault	6c2e200d38	AMDGPU: Fix typo llvm-svn: 252116	2015-11-05 01:03:08 +00:00
Matt Arsenault	aac9b49325	AMDGPU: Make flat_scratch name consistent The printed name and the parsed assembler names weren't the same. I'm not sure which name SC prints these as, but I think it's this one. llvm-svn: 252010	2015-11-03 22:50:34 +00:00
Matt Arsenault	967c2f5dee	AMDGPU: Fix asserts on invalid register ranges If the requested SGPR was not actually aligned, it was accepted and rounded down instead of rejected. Also fix an assert if the range is an invalid size. llvm-svn: 252009	2015-11-03 22:50:32 +00:00
Matt Arsenault	3473c72aab	AMDGPU: Fix off by one error in register parsing If trying to use one past the end, this would assert. llvm-svn: 252008	2015-11-03 22:50:27 +00:00
Matt Arsenault	e8ed13d946	AMDGPU: s[102:103] is unavailable on VI llvm-svn: 252000	2015-11-03 22:39:52 +00:00
Matt Arsenault	192b282bf3	AMDGPU: Define correct number of SGPRs There are actually 104 so 2 were missing. More assembler tests with high register number tuples will be included in later patches. llvm-svn: 251999	2015-11-03 22:39:50 +00:00
Matt Arsenault	6c0674112a	AMDGPU: Make findUsedSGPR more readable Add more comments etc. llvm-svn: 251996	2015-11-03 22:30:15 +00:00
Matt Arsenault	782c03bb7e	AMDGPU: Initialize SIFixSGPRCopies so -print-after works llvm-svn: 251995	2015-11-03 22:30:13 +00:00
Matt Arsenault	d9d659aa23	AMDGPU: Alphabetize includes llvm-svn: 251994	2015-11-03 22:30:08 +00:00
Matthias Braun	93563e7032	ScheduleDAGInstrs: Remove IsPostRA flag; NFC ScheduleDAGInstrs doesn't behave differently before or after register allocation. It was only used in a method of MachineSchedulerBase which behaved differently in MachineScheduler/PostMachineScheduler. Change this to let MachineScheduler/PostMachineScheduler just pass in a parameter to that function. The order of the LiveIntervals* and bool RemoveKillFlags parameters has been switched to make out-of-tree code fail instead of unintentionally passing a value intended for the IsPostRA flag to the (previously following and default initialized) RemoveKillFlags. Differential Revision: http://reviews.llvm.org/D14245 llvm-svn: 251883	2015-11-03 01:53:29 +00:00
Matt Arsenault	f1aebbf33a	AMDGPU: Stop assuming vreg for build_vector This was causing a variety of test failures when v2i64 is added as a legal type. SIFixSGPRCopies should correctly handle the case of vector inputs to a scalar reg_sequence, so this isn't necessary anymore. This was hiding some deficiencies in how reg_sequence is handled later, but this shouldn't be a problem anymore since the register class copy of a reg_sequence is now done before the reg_sequence. llvm-svn: 251860	2015-11-02 23:30:48 +00:00
Matt Arsenault	d48da14269	AMDGPU: Error on graphics shaders with HSA I've found myself pointlessly debugging problems from running graphics tests with an HSA triple a few times, so stop this from happening again. llvm-svn: 251858	2015-11-02 23:23:02 +00:00
Matt Arsenault	0de924b76d	AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE Make the REG_SEQUENCE be a VGPR, and do the register class copy first. llvm-svn: 251855	2015-11-02 23:15:42 +00:00
Marek Olsak	6f6d318e16	AMDGPU/SI: handle undef for llvm.SI.packf16 llvm-svn: 251632	2015-10-29 15:29:09 +00:00
Marek Olsak	74d084f466	AMDGPU/SI: use S_OR for fneg (fabs f32) llvm-svn: 251631	2015-10-29 15:29:05 +00:00
Marek Olsak	f924dd6f3c	AMDGPU/SI: use S_AND for i1 trunc llvm-svn: 251630	2015-10-29 15:05:03 +00:00
Matt Arsenault	2ea0a23f18	AMDGPU: Print modifiers when dumping AMDGPUOperand llvm-svn: 251160	2015-10-24 00:12:56 +00:00
Matt Arsenault	382557ec72	AMDGPU: Fix parsing of 32-bit literals with sign bit set llvm-svn: 251132	2015-10-23 18:07:58 +00:00
Matt Arsenault	391be09ef3	AMDGPU: Fix adding redundant m0 uses BuildMI already adds these since they are defined correctly now. llvm-svn: 250961	2015-10-21 22:37:51 +00:00
Matt Arsenault	e8c0891e42	AMDGPU: Fix verifier error in SIFoldOperands There may be other use operands that also need their kill flags cleared. This happens in a few tests when SIFoldOperands is moved after PeepholeOptimizer. PeepholeOptimizer rewrites cases that look like: %vreg0 = ... %vreg1 = COPY %vreg0 use %vreg1<kill> %vreg2 = COPY %vreg0 use %vreg2<kill> to use the earlier source to %vreg0 = ... use %vreg0 use %vreg0 Currently SIFoldOperands sees the copied registers, so there is only one use. So far I haven't managed to come up with a test that currently has multiple uses of a foldable VGPR -> VGPR copy. llvm-svn: 250960	2015-10-21 22:37:50 +00:00
Matt Arsenault	b6fd98c7d9	AMDGPU: Split DiagnosticInfoUnsupported into its own file llvm-svn: 250959	2015-10-21 22:37:46 +00:00
Matt Arsenault	6005fcbe12	AMDGPU: Simplify VOP3 operand legalization. This was checking for a variety of situations that should never happen. This saves a tiny bit of compile time. We should not be selecting instructions with invalid operands in the first place. Most of the time, for registers, copies are inserted to the correct operand register class. For VOP3, since all operand types are supported and literal constants never are, we just need to verify the constant bus requirements (all immediates should be legal inline ones). The only possibly tricky case to maybe worry about is when legalizing operands in moveToVALU with s_add_i32 and similar instructions. If the original s_add_i32 had a literal constant and we need to replace it with v_add_i32_e64 we would have an unsupported literal operand. However, I don't think we should worry about that because SIFoldOperands should handle folding literal constant operands into the SALU instructions based on the uses. At SIFoldOperands time, the legality and profitability of operand types is a bit different. llvm-svn: 250951	2015-10-21 21:51:02 +00:00
Matt Arsenault	e223cebd10	AMDGPU: Fix not checking implicit operands in verifyInstruction When verifying constant bus restrictions, this wasn't catching uses in implicit operands. llvm-svn: 250948	2015-10-21 21:15:01 +00:00
Matt Arsenault	3add6439d0	AMDGPU: Add MachineInstr overloads for instruction format tests llvm-svn: 250797	2015-10-20 04:35:43 +00:00
Matt Arsenault	8f18917a90	AMDGPU: Stop reserving v[254:255] This wasn't doing anything useful. They weren't explicitly used anywhere, and the RegScavenger ignores reserved registers. This for some reason caused a random scheduling change in the test. Getting the check lines to pass is too frustrating, and there's probably not too much value in checking the vector case's operands N times. llvm-svn: 250794	2015-10-20 03:59:58 +00:00
Craig Topper	2626094fa1	Make a bunch of static arrays const. llvm-svn: 250642	2015-10-18 05:15:34 +00:00
Artyom Skrobov	63471330d2	Don't pretend AMDGPU backend knows how to custom-lower UDIVREM for vector types; it can't Reviewers: arsenm, jvesely, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13734 llvm-svn: 250384	2015-10-15 09:18:47 +00:00
Duncan P. N. Exon Smith	a73371a9b7	AMDGPU: Remove implicit ilist iterator conversions, NFC One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new one. Previously, bundle iterators and single-instruction iterators could be compared to each other (comparing on underlying pointers). I changed a comparison from using `MBB->end()` to using `MBB->instr_end()`, since both end iterators should point at the same place anyway. I don't think the implicit conversion between the two iterator types is a good idea since it's fairly easy to accidentally compare to the wrong thing (they aren't always end iterators). Otherwise I would have just added the conversion. Even with that, there should be no functionality change here. llvm-svn: 250218	2015-10-13 20:07:10 +00:00
Matt Arsenault	f0d9e47da2	AMDGPU: Refactor isVGPRToSGPRCopy It should now correctly handle physical registers and make it easier to identify the other direction. llvm-svn: 250132	2015-10-13 00:07:54 +00:00
Matt Arsenault	61dc235f20	DAGCombiner: Combine extract_vector_elt from build_vector This basic combine was surprisingly missing. AMDGPU legalizes many operations in terms of 32-bit vector components, so not doing this results in many extra copies and subregister extracts that need to be cleaned up later. InstCombine already does this for the hasOneUse case. The target hook is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn from a vector materialize repeated immediate instruction to a constant vector load with more scalar copies from it. llvm-svn: 250129	2015-10-12 23:59:50 +00:00
Matt Arsenault	8c0ef8b36d	AMDGPU: Register some more passes so -print-before works llvm-svn: 250071	2015-10-12 17:43:59 +00:00
Justin Bogner	468c998031	CodeGen: print and verify after TargetPassConfig::insertPass by default In r224059, we started verifying after addPass, but missed doing so on insertPass. There isn't a good reason for the discrepancy, and skipping the verifier in these cases causes bugs. This also exposes a verifier error that was introduced in r249087, but the verifier doesn't run until after the register coalescer, when the issue happens to have been resolved. I've skipped the verifier after SIFixSGPRLiveRangesID to avoid the failures for now and will follow up with Matt for a proper fix. llvm-svn: 249643	2015-10-08 00:36:22 +00:00
Matt Arsenault	fc0ad42516	AMDGPU: Fix missing implicit m0 uses on movrel instructions llvm-svn: 249577	2015-10-07 17:46:32 +00:00
Matt Arsenault	10e6a61892	AMDGPU: Add comment for VOP2b operand class Because of the constant bus requirement, it is never legal to use a literal constant for these instructions despite the encoding allowing it. This was already doing the right thing, but note why. llvm-svn: 249500	2015-10-07 01:36:00 +00:00
Matt Arsenault	187276fa94	AMDGPU: Properly register passes llvm-svn: 249495	2015-10-07 00:42:53 +00:00
Matt Arsenault	284192730a	AMDGPU: Use explicit register size indirect pseudos This stops using an unknown reg class operand. Currently build_vector selection has a broken looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use. With the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users. Without this, when removed v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs. llvm-svn: 249494	2015-10-07 00:42:51 +00:00
Matt Arsenault	922b7bf808	AMDGPU: Remove inferRegClassFromUses / inferRegClassFromDefs I'm not sure why this would be necessary, and no tests fail with them removed. Looking at the uses is suspect as well because the use reg classes will likely change when the users are moved as a result of moving this instruction. llvm-svn: 249493	2015-10-07 00:42:31 +00:00
Tom Stellard	0fbf899c0f	AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments() Summary: We currently ignore the calling convention, so there is no real reason to assert on the calling convention of functions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13367 llvm-svn: 249468	2015-10-06 21:16:34 +00:00
Tom Stellard	88e0b25181	AMDGPU/SI: Add 64-bit versions of v_nop and v_clrexcp Summary: The assembly printing of these is still missing the encoding size suffix, but this will be fixed in a later commit. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13436 llvm-svn: 249424	2015-10-06 15:57:53 +00:00
Tom Stellard	d585cd85a3	AMDGPU/SI: Add a helper for creating aliases for the _e32 instructions Summary: We are currently only using these aliases for VOPC instructions, but this helper will make it easier to use them everywhere. These aliases allow for the automatic matching of instructions with forced 32-bit encoding. Eventually, we should be able to remove the custom C++ logic we have for this in the assembler. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13396 llvm-svn: 249330	2015-10-05 17:57:39 +00:00
Tom Stellard	dc9088a10e	AMDGPU/SI: Remove unused tablegen multiclass Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13395 llvm-svn: 249221	2015-10-03 00:29:50 +00:00
Matt Arsenault	d092a068ba	AMDGPU/SI: Add verifier check for exec reads Make sure we aren't accidentally not setting these in the instruction definitions. llvm-svn: 249170	2015-10-02 18:58:37 +00:00
Matt Arsenault	b733f00510	AMDGPU: Fix unused variable warning in release build llvm-svn: 249091	2015-10-01 22:40:35 +00:00
Matt Arsenault	b87fc22915	AMDGPU: Move SIFixSGPRLiveRanges to be a regalloc pass Replace LiveInterval usage with LiveVariables. LiveIntervals computes far more information than is needed for this pass which just needs to find if an SGPR is live out of the defining block. LiveIntervals are not usually available that early, requiring computing them twice which is very expensive. The extra run of LiveIntervals/LiveVariables/SlotIndexes was costing in total about 5% of compile time. Continuing to use LiveIntervals is problematic. It seems there is an option (early-live-intervals) to run the analysis about where it should go to avoid recomputing LiveVariables, but it seems to be completely broken with subreg liveness enabled. There are also problems from trying to recompute LiveIntervals since this seems to undo LiveVariables and clearing kill flags, causing TwoAddressInstructions to make bad decisions. Insert the pass right after live variables and preserve it. The tricky case to worry about might be phis since LiveVariables doesn't count a register as live out if in the successor block it is only used in a phi, but I don't think this is a concern right now because SIFixSGPRCopies replaces SGPR phis. llvm-svn: 249087	2015-10-01 22:10:03 +00:00
Matt Arsenault	d2c7589f93	AMDGPU: Merge if and switch llvm-svn: 249082	2015-10-01 21:51:59 +00:00
Matt Arsenault	db7f0ef367	AMDGPU: Remove dead code There's no point in checking VReg_1 because all uses of it should already have been removed by SILowerI1Copies. llvm-svn: 249081	2015-10-01 21:51:57 +00:00
Matt Arsenault	d1d499aa56	AMDGPU: Make SIInsertWaits about a factor of 4 faster This was the slowest target custom pass and was spending 80% of the time in getMinimalPhysRegClass which was called for every register operand. Try to use the statically known register class when possible from the instruction's MCOperandInfo. There are a few pseudo instructions which are not well behaved with unknown register classes which still require the expensive physical register class search. There are a few other possibilities for making this even faster, such as not inspecting implicit operands. For now those are checked because it is technically possible to have a scalar load into exec or vcc which can be implicitly used. llvm-svn: 249079	2015-10-01 21:43:15 +00:00
Tom Stellard	e9f8b24985	AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass Summary: Instead of asserting when the kernel metadata is different than we expect, we should just skip lowering that function. This fixes assertion failures with OpenCL argument metadata from older LLVM releases. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13356 llvm-svn: 249073	2015-10-01 21:16:05 +00:00
Tom Stellard	e0e582c9aa	AMDGPU: Add MEM_RAT STORE_TYPED. v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042	2015-10-01 17:51:34 +00:00
Tom Stellard	c0f0fba2c4	AMDGPU: Factor out EOP query. v2: Fix brace placement and capitalization (Matt). Patch by: Zoltan Gilian llvm-svn: 249041	2015-10-01 17:51:29 +00:00
Tom Stellard	1f0e7bbc5b	AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init order Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12451 llvm-svn: 248978	2015-10-01 02:02:46 +00:00
Marek Olsak	d1a69a2839	AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set to prevent setting a huge stride, because DATA_FORMAT has a different meaning if ADD_TID_ENABLE is set. This is a candidate for stable llvm 3.7. Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 248858	2015-09-29 23:37:32 +00:00
Matt Arsenault	ba6aae785a	AMDGPU: Factor switch into separate function llvm-svn: 248742	2015-09-28 20:54:57 +00:00
Matt Arsenault	73aa8f687a	AMDGPU: Fix splitting x16 SMRD loads When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741	2015-09-28 20:54:52 +00:00
Matt Arsenault	e5d042cd56	AMDGPU: Fix moving SMRD loads with literal offsets on CI llvm-svn: 248740	2015-09-28 20:54:46 +00:00
Matt Arsenault	dd49c5fc1b	AMDGPU: Fix splitting SMRD with large offset The splitting of > 4 dword SMRD instructions if using an offset in an SGPR instead of an immediate was not setting the destination register, resulting in an instruction missing an operand which would assert later. Test will be included in a following commit which fixes a related issue. llvm-svn: 248739	2015-09-28 20:54:42 +00:00
Andrew Kaylor	16c4da03d5	Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing. Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com) Differential Revision: http://reviews.llvm.org/D11370 llvm-svn: 248735	2015-09-28 20:33:22 +00:00
Matt Arsenault	1d36b717a5	AMDGPU: Remove hasPostISelHook from most instructions Since this is only needed for VOP3 and a few other special case instructions, stop setting it on everything. llvm-svn: 248657	2015-09-26 05:06:48 +00:00
Matt Arsenault	f32481372c	AMDGPU: Switch over reg class size instead of checking all super classes This gets isSGPRClass out of my profile of SIFixSGPRCopies. llvm-svn: 248656	2015-09-26 04:59:04 +00:00
Matt Arsenault	6e28010215	AMDGPU: Don't handle invalid reg classes in helper functions No tests hit these and it would be better to have checks like this explicit where they are used. llvm-svn: 248655	2015-09-26 04:53:30 +00:00
Saleem Abdulrasool	9174623b2d	AMDGPU: address -Winconsistent-missing-override Add missing override. NFC. llvm-svn: 248652	2015-09-26 04:34:52 +00:00
Matt Arsenault	8e1ddf84fe	AMDGPU: Set CopyCost of register classes These require multiple mov instructions to copy, but the default value is that 1 instruction is needed. I'm not sure if this actually changes anything. llvm-svn: 248651	2015-09-26 04:09:34 +00:00
Matt Arsenault	e98a074c42	AMDGPU: VOP3b definition cleanups llvm-svn: 248647	2015-09-26 02:25:48 +00:00
Matt Arsenault	86095b8dec	AMDGPU: Fix sched model for VOP2b instructions Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646	2015-09-26 02:25:45 +00:00
Matt Arsenault	e229c0c45e	AMDGPU: Construct new buffer instruction when moving SMRD It's easier to understand creating a full instruction than the current situation where sometimes a new instruction is created and sometimes it is awkwardly mutated in place. llvm-svn: 248627	2015-09-25 22:21:19 +00:00
Tom Stellard	e135ffd554	AMDGPU/SI: Use .hsatext section instead of .text for HSA Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619	2015-09-25 21:41:28 +00:00
Matt Arsenault	f743b838cb	AMDGPU: Make getNamedOperandIdx declaration readonly This matches how it is defined in the generated implementation. llvm-svn: 248598	2015-09-25 18:09:15 +00:00
Matt Arsenault	0a10900070	AMDGPU: Disable some passes that are not meaningful Don't run passes related to stack maps, garbage collection, exceptions since these aren't useful for GPUs. There might be a few more to turn off that I'm less sure about (e.g. ShrinkWrapping) or I'm not sure how to disable (SafeStack and StackProtector) llvm-svn: 248591	2015-09-25 17:41:20 +00:00
Matt Arsenault	4bf43d4e68	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	0cb8517dc6	AMDGPU: Fix recomputing dominator tree unnecessarily SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands. llvm-svn: 248587	2015-09-25 17:21:28 +00:00
Matt Arsenault	2d6fdb8495	AMDGPU: Re-justify workaround and fix worked around problem When buffer resource descriptors were built, the upper two components of the descriptor were first composed into a 64-bit register because legalizeOperands assumed all operands had the same register class. Fix that problem, but keep the workaround. I'm not sure anything is actually emitting such a REG_SEQUENCE now. If multiple resource descriptors are set up with different base pointers, this is copied with a single s_mov_b64. We probably should fix this better by recognizing a pair of s_mov_b32 later, but for now delete the dead code. llvm-svn: 248585	2015-09-25 17:08:42 +00:00
Matt Arsenault	3ad55ec946	AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources This avoids needing to re-legalize the new REG_SEQUENCE. llvm-svn: 248584	2015-09-25 17:08:40 +00:00
Matt Arsenault	6525aa3529	AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos This was only set on the final _si/_vi version, but not on the pseudos most of codegen sees. No test since these instructions aren't used yet. llvm-svn: 248583	2015-09-25 16:58:27 +00:00
Matt Arsenault	5f70436c49	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
Matt Arsenault	8aa9973696	AMDGPU: Remove unused includes llvm-svn: 248553	2015-09-25 00:28:43 +00:00
Matt Arsenault	e66621b306	AMDGPU: Add s_dcache_* instructions llvm-svn: 248533	2015-09-24 19:52:27 +00:00
Matt Arsenault	d6adfb401c	AMDGPU: Add cache invalidation instructions. These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532	2015-09-24 19:52:21 +00:00
Matt Arsenault	68d938649e	Introduce target hook for optimizing register copies Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478	2015-09-24 08:36:14 +00:00
Matt Arsenault	e068f9a263	AMDGPU: Return after instruction is processed. llvm-svn: 248476	2015-09-24 07:51:28 +00:00
Matt Arsenault	708586faa2	AMDGPU: Remove another unnecessary check from commuteInstruction llvm-svn: 248475	2015-09-24 07:51:25 +00:00
Matt Arsenault	fa242960fc	AMDGPU: Add readonly to InstrMapping functions llvm-svn: 248474	2015-09-24 07:51:23 +00:00
Matt Arsenault	cab64f1c75	AMDGPU: Fix printing trailing whitespace for mubuf atomics llvm-svn: 248472	2015-09-24 07:51:17 +00:00
Matt Arsenault	c8e2ce4046	AMDGPU: Reduce number of copies emitted Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467	2015-09-24 07:16:37 +00:00
NAKAMURA Takumi	0a7d0ad95f	Untabify. llvm-svn: 248264	2015-09-22 11:15:07 +00:00
NAKAMURA Takumi	a9cb538a74	Reformat blank lines. llvm-svn: 248263	2015-09-22 11:14:39 +00:00
NAKAMURA Takumi	84965031a7	Reformat comment lines. llvm-svn: 248262	2015-09-22 11:14:12 +00:00
Matt Arsenault	f11e7489e1	AMDGPU: Remove unnecessary check If the instruction doesn't have enough operands, it either shouldn't be marked as isCommutable or is malformed. llvm-svn: 248242	2015-09-22 04:17:45 +00:00
Matt Arsenault	85441dd724	AMDGPU: Move copy handling under switch like other instructions llvm-svn: 248172	2015-09-21 16:27:22 +00:00
Craig Topper	0013be16ff	Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type extra times. NFC llvm-svn: 248140	2015-09-21 05:32:41 +00:00
Craig Topper	4e9b03d6f9	Don't pass StringRefs around by const reference. Pass by value instead per coding standards. NFC llvm-svn: 248136	2015-09-21 00:18:00 +00:00
Matt Arsenault	1fafdc82d6	AMDGPU: Remove dead code getCFGStructurizerRegClass is not used for SI, so move it into R600 specific stuff. llvm-svn: 248087	2015-09-19 06:41:10 +00:00
Eric Christopher	a4e5d3cf8e	constify the Function parameter to the TTI creation callback and propagate to all callers/users/etc. llvm-svn: 247864	2015-09-16 23:38:13 +00:00
Sanjay Patel	a260701bbb	propagate fast-math-flags on DAG nodes After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815	2015-09-16 16:31:21 +00:00
Daniel Sanders	50f17235dd	Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Eric has replied and has demanded the patch be reverted. llvm-svn: 247702	2015-09-15 16:17:27 +00:00
Daniel Sanders	153010c52d	Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Thanks go to Pavel Labath for fixing LLDB for me. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247692	2015-09-15 14:08:28 +00:00
Daniel Sanders	c40de48041	Revert r247684 - Replace Triple with a new TargetTuple ... LLDB needs to be updated in the same commit. llvm-svn: 247686	2015-09-15 13:46:21 +00:00
Daniel Sanders	18d4b0dab7	Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247683	2015-09-15 13:17:40 +00:00
Bruce Mitchener	e9ffb45b60	Fix typos. Summary: This fixes a variety of typos in docs, code and headers. Subscribers: jholewinski, sanjoy, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12626 llvm-svn: 247495	2015-09-12 01:17:08 +00:00
Cong Hou	c536bd9e73	Pass BranchProbability/BlockMass by value instead of const& as they are small. NFC. llvm-svn: 247357	2015-09-10 23:10:42 +00:00
Matt Arsenault	e0b44040aa	AMDGPU: Simplify debug printing llvm-svn: 247345	2015-09-10 21:51:19 +00:00
Matt Arsenault	57116cce19	AMDGPU: Use StringRef value llvm-svn: 247344	2015-09-10 21:51:15 +00:00
Matt Arsenault	80f766a032	AMDGPU/SI: Fix more cases of losing exec operands llvm-svn: 247230	2015-09-10 01:23:28 +00:00
Matt Arsenault	ad46e0c1ab	AMDGPU/SI: Fix creating v_mov_b32s without exec uses This will be caught by existing tests with a verifier check to be added in a future commit. llvm-svn: 247229	2015-09-10 01:06:06 +00:00
Matt Arsenault	ef67d76869	AMDGPU: Extract full 64-bit subregister and use subregs Instead of extracting both 32-bit components from the 128-bit register. This produces fewer copies and is easier for the copy peephole optimizer to understand and see the actual uses as extracts from a reg_sequence. This avoids needing to handle subregister composing in the PeepholeOptimizer's ValueTracker for this case. llvm-svn: 247162	2015-09-09 17:03:29 +00:00
Matt Arsenault	b5541fb098	AMDGPU: Remove unused multiclass argument llvm-svn: 247161	2015-09-09 17:03:18 +00:00
Tom Stellard	9a197676b1	AMDGPU/SI: Fold operands through REG_SEQUENCE instructions Summary: This helps mostly when we use add instructions for address calculations that contain immediates. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12256 llvm-svn: 247157	2015-09-09 15:43:26 +00:00
Matt Arsenault	d768737454	AMDGPU: Fix not encoding src2 of VOP3b instructions Broken by r247074. Should include an assembler test, but the assembler is currently broken for VOP3b apparently. llvm-svn: 247123	2015-09-09 08:39:49 +00:00
Matt Arsenault	acd68b58ae	SelectionDAG: Support Expand of f16 extloads Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113	2015-09-09 01:12:27 +00:00
Matt Arsenault	86d336e91b	AMDGPU/SI: Fix input vcc operand for VOP2b instructions Adds vcc to output string input for e32. Allows option of using e64 encoding with assembler. Also fixes these instructions not implicitly reading exec. llvm-svn: 247074	2015-09-08 21:15:00 +00:00
Matt Arsenault	8ac35cd031	AMDGPU: Mark s_barrier as a high latency instruction These were marked as WriteSALU, which is low latency. I'm guessing at the value to use, but it should probably be considered the highest latency instruction. I'm not sure this has any actual effect since hasSideEffects probably is preventing any moving of these. llvm-svn: 247060	2015-09-08 19:54:32 +00:00
Matt Arsenault	8fb810a1d2	AMDGPU: Fix s_barrier flags This should be convergent. This is not a barrier in the isBarrier sense, nor hasCtrlDep. llvm-svn: 247059	2015-09-08 19:54:25 +00:00
Matt Arsenault	966a94f861	AMDGPU: Handle sub of constant for DS offset folding sub C, x -> add (sub 0, x), C for DS offsets. This is mostly to fix regressions that show up when SeparateConstOffsetFromGEP is enabled. llvm-svn: 247054	2015-09-08 19:34:22 +00:00
Sanjay Patel	ce74db9d8d	check for fastness before merging in DAGCombiner::MergeConsecutiveStores() Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771	2015-09-03 15:03:19 +00:00
Matt Arsenault	51d2d0f668	AMDGPU: Fix adding redundant implicit operands These are already added during the MachineInstr construction, so this was adding the implicit registers twice. llvm-svn: 246525	2015-09-01 02:02:21 +00:00
Matt Arsenault	e4d0c142e8	AMDGPU: Add sdst operand to VOP2b instructions The VOP3 encoding of these allows any SGPR pair for the i1 output, but this was forced before to always use vcc. This doesn't yet try to use this, but does add the operand to the definitions so the main change is adding vcc to the output of the VOP2 encoding. llvm-svn: 246358	2015-08-29 07:16:50 +00:00
Matt Arsenault	9a32cd3d3b	AMDGPU: Set mem operands for spill instructions llvm-svn: 246357	2015-08-29 06:48:57 +00:00
Matt Arsenault	5c004a7c61	AMDGPU: Fix dropping mem operands when moving to VALU Without a memory operand, mayLoad or mayStore instructions are treated as hasUnorderedMemRef, which results in much worse scheduling. We really should have a verifier check that any non-side effecting mayLoad or mayStore has a memory operand. There are a few instructions (interp and images) which I'm not sure what / where to add these. llvm-svn: 246356	2015-08-29 06:48:46 +00:00
Tom Stellard	eea72ccbf2	AMDGPU/SI: Fix some invalid assumptions when folding 64-bit immediates Summary: We were assuming that if the use operand had a sub-register that the immediate was 64-bits, but this was breaking the case of folding a 64-bit immediate into another 64-bit instruction. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12255 llvm-svn: 246354	2015-08-29 01:58:21 +00:00
Tom Stellard	b8ce14c4c3	AMDGPU/SI: Factor operand folding code into its own function Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12254 llvm-svn: 246353	2015-08-28 23:45:19 +00:00
Matt Arsenault	8a067121f8	AMDGPU: Delete dead code There is no context where s_mov_b64 is emitted and could potentially be moved to the VALU. It is currently only emitted for materializing immediates, which can't be dependent on vector sources. The immediate splitting is already done when selecting constants. I'm not sure what contexts if any the register splitting would have been used before. Also clean up using s_mov_b64 in place of v_mov_b64_pseudo, although this isn't required and just skips the extra step of eliminating the copy from the SReg_64. llvm-svn: 246080	2015-08-26 20:48:08 +00:00
Matt Arsenault	5e7f95e567	AMDGPU: Don't reprocess instructions when splitting i64 bcnt llvm-svn: 246079	2015-08-26 20:48:04 +00:00
Matt Arsenault	445833cc91	AMDGPU: Fix not moving users of s_bfe_i64 to VALU This wouldn't propagate to users of the original BFE and would hit a verifier error. llvm-svn: 246078	2015-08-26 20:47:58 +00:00
Matt Arsenault	f003c38e1e	AMDGPU: Don't create intermediate SALU instructions When splitting 64-bit operations, create the correct VALU instructions immediately. This was splitting things like s_or_b64 into the two s_or_b32s and then pushing the new instructions onto the worklist. There's no reason we need to do this intermediate step. llvm-svn: 246077	2015-08-26 20:47:50 +00:00
Matt Arsenault	602a16d3db	AMDGPU/SI: Report SIFixSGPRLiveRanges changed function llvm-svn: 246056	2015-08-26 19:12:03 +00:00
Matt Arsenault	bd66061db7	AMDGPU: Make sure to reserve super registers I think this could potentially have broken if one of the super registers were allocated that contain v254/v255. llvm-svn: 246051	2015-08-26 18:54:50 +00:00
Matt Arsenault	19c5488015	AMDGPU: Produce error on dynamic_stackalloc llvm-svn: 246048	2015-08-26 18:37:13 +00:00
Matt Arsenault	0a3ac1be43	AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes. llvm-svn: 245775	2015-08-22 00:54:31 +00:00
Matt Arsenault	e8df879948	AMDGPU: Improve accuracy of instruction rates for some FP instructions llvm-svn: 245774	2015-08-22 00:50:41 +00:00
Matt Arsenault	33010103b7	AMDGPU: Use DFS to avoid second loop over function llvm-svn: 245772	2015-08-22 00:43:38 +00:00
Matt Arsenault	c8d8e4ed76	AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges llvm-svn: 245769	2015-08-22 00:19:34 +00:00
Matt Arsenault	aba29d6ab1	AMDGPU: Improve debug printing in SIFixSGPRLiveRanges llvm-svn: 245768	2015-08-22 00:19:25 +00:00
Matt Arsenault	6adf07a92e	AMDGPU: Move CI instructions into CIInstructions.td There are still a couple of CI patterns left in SIInstructions. llvm-svn: 245767	2015-08-22 00:16:34 +00:00
Matt Arsenault	f56872dc30	AMDGPU: Minor cleanups to help with f16 support The main change is inverting the condition for the operand class classes so that VT.Size == 16 uses VGPR_32 instead of 64. llvm-svn: 245764	2015-08-21 23:49:51 +00:00
Tom Stellard	bd8a0856e2	AMDGPU/SI: Better handle s_wait insertion We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction. Here's an example of a subtle perf reduction this patch solves: This is without the patch: buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen s_load_dwordx4 s[44:47], s[8:9], 0xc s_waitcnt lgkmcnt(0) buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen s_load_dwordx4 s[48:51], s[8:9], 0x10 s_waitcnt vmcnt(1) buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen The s_waitcnt vmcnt(1) is useless. The reason it is added is because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished. Internally, 3 counters (for VM, EXP and LGKM) are updated after every instruction. For example buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the counters stored for s[44:47] also require waiting for the previous buffer_load_format_xyzw before the register can be used. Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them. Patch by: Axel Davy Differential Revision: http://reviews.llvm.org/D11883 llvm-svn: 245755	2015-08-21 22:47:27 +00:00
Michael Kuperstein	dcdab4cd3a	[TLI] Refactor "is integer division cheap" queries. This removes the isPow2SDivCheap() query, as it is not currently used in any meaningful way. isIntDivCheap() no longer relies on a state variable (as all in-tree target set it to false), but the interface allows querying based on the type optimization level. NFC. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245430	2015-08-19 11:17:59 +00:00
Matthias Braun	d55bcf2646	MachineRegisterInfo: Introduce isPhysRegUsed() This method checks whether a physical register or any of its aliases are used in the function. Using this function in SIRegisterInfo::findUnusedReg() should also fix this reported failure: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html http://reviews.llvm.org/rL242173#inline-533 The report doesn't come with a testcase and I don't know enough about AMDGPU to create one myself. llvm-svn: 245329	2015-08-18 18:54:27 +00:00
Yaron Keren	178c465223	Add missing include guard. llvm-svn: 245173	2015-08-16 07:55:08 +00:00
Matt Arsenault	588732bd6e	AMDGPU/SI: Only look at live out SGPR defs When trying to fix SGPR live ranges, skip defs that are killed in the same block as the def. I don't think we need to worry about these cases as long as the live ranges of the SGPRs in dominating blocks are correct. This reduces the number of elements the second loop over the function needs to look at, and makes it generally easier to understand. The second loop also only considers if the live range is live in to a block, which logically means it must have been live out from another. llvm-svn: 245150	2015-08-15 02:58:49 +00:00
James Y Knight	5567bafe93	Remove redundant TargetFrameLowering::getFrameIndexOffset virtual function. This was the same as getFrameIndexReference, but without the FrameReg output. Differential Revision: http://reviews.llvm.org/D12042 llvm-svn: 245148	2015-08-15 02:32:35 +00:00
Matt Arsenault	297ae311ce	AMDGPU/SI: Fix printing useless info with amdhsa The comments at the bottom would all report 0 if amdhsa was used. llvm-svn: 245135	2015-08-15 00:12:39 +00:00
Matt Arsenault	0259a7aa41	AMDGPU/SI: Update LiveVariables This is simple but won't work if/when this pass is moved to be post-SSA. llvm-svn: 245134	2015-08-15 00:12:37 +00:00
Matt Arsenault	670ba46efe	AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRanges Does not mark SlotIndexes as reserved, although I think that might be OK. LiveVariables still need to be handled. llvm-svn: 245133	2015-08-15 00:12:35 +00:00
Matt Arsenault	b75233235c	AMDGPU: Remove unnecessary assert These shouldn't ever be null. The number of successors was already asserted to be 2. llvm-svn: 245132	2015-08-15 00:12:32 +00:00
Matt Arsenault	4275c29a02	AMDGPU/SI: Make comments more precise. True branch instructions do behave as expected with liveness. Avoid the phrasing "branch decision is based on a value in an SGPR" because this could be misleading. A VALU compare instruction's result is still based on an SGPR, even though that condition may be divergent. llvm-svn: 245131	2015-08-15 00:12:30 +00:00
Tom Stellard	bef1094ee7	AMDGPU/SI: Add missing spill class The compiler was failing to spill for some shaders. Patch By: Axel Davy llvm-svn: 245087	2015-08-14 19:46:05 +00:00
Simon Pilgrim	7218251861	[AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect. Differential Revision: http://reviews.llvm.org/D12007 llvm-svn: 244960	2015-08-13 21:40:02 +00:00
Yaron Keren	556b21aa10	Remove and forbid raw_svector_ostream::flush() calls. After r244870 flush() will only compare two null pointers and return, doing nothing but wasting run time. The call is not required any more as the stream and its SmallString are always in sync. Thanks to David Blaikie for reviewing. llvm-svn: 244928	2015-08-13 18:12:56 +00:00
Matt Arsenault	c574686529	AMDGPU: Fix assert on dbg_value instructions llvm-svn: 244728	2015-08-12 09:04:44 +00:00
Alex Lorenz	e40c8a2b26	PseudoSourceValue: Replace global manager with a manager in a machine function. This commit removes the global manager variable which is responsible for storing and allocating pseudo source values and instead it introduces a new manager class named 'PseudoSourceValueManager'. Machine functions now own an instance of the pseudo source value manager class. This commit also modifies the 'get...' methods in the 'MachinePointerInfo' class to construct pseudo source values using the instance of the pseudo source value manager object from the machine function. This commit updates calls to the 'get...' methods from the 'MachinePointerInfo' class in a lot of different files because those calls now need to pass in a reference to a machine function to those methods. This change will make it easier to serialize pseudo source values as it will enable me to transform the mips specific MipsCallEntry PseudoSourceValue subclass into two target independent subclasses. Reviewers: Akira Hatanaka llvm-svn: 244693	2015-08-11 23:09:45 +00:00
Benjamin Kramer	df005cbe19	Fix some comment typos. llvm-svn: 244402	2015-08-08 18:27:36 +00:00
Tom Stellard	30cf77457d	AMDGPU/SI: Another attempt to fix Windows bots broken by r244372 llvm-svn: 244383	2015-08-08 01:11:07 +00:00
Matt Arsenault	cbd753761a	AMDGPU: Implement AMDGPUOperand::print() llvm-svn: 244381	2015-08-08 00:41:51 +00:00
Matt Arsenault	4635915504	AMDGPU/SI: Remove VCCReg llvm-svn: 244380	2015-08-08 00:41:48 +00:00
Matt Arsenault	6942d1a034	AMDGPU/SI: Remove source uses of VCCReg llvm-svn: 244379	2015-08-08 00:41:45 +00:00
Tom Stellard	fc70950bf2	AMDGPU/SI: Attempt to fix Windows bots broken by r244372 llvm-svn: 244376	2015-08-08 00:17:59 +00:00
Tom Stellard	fd25395c72	AMDGPU: Add pass to lower OpenCL image and sampler arguments. The pass adds new kernel arguments for image attributes, and resolves calls to dummy attribute and resource id getter functions. Patch by: Zoltan Gilian llvm-svn: 244372	2015-08-07 23:19:30 +00:00
Tom Stellard	8ebad11ee9	AMDGPU/SI: Use InstAlias instead of MnemonicAlias for VOPC instructions Summary: With InstAlias, we don't need to print the _e32 portion of the mnemonic when we print the $dst operand. This change makes it possible to include vcc in the asm string when we switch VOPC over to having implicit vcc defs. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11813 llvm-svn: 244362	2015-08-07 22:00:56 +00:00
Matt Arsenault	711b390a7c	AMDGPU: Assume SMRD access for constant address space Since r243294 these are selected to SMRD and moved later if required. llvm-svn: 244354	2015-08-07 20:18:34 +00:00
Tom Stellard	c8733e805e	AMDGPU/SI: Use correct encoding of vopc for VI in the assembler Summary: We were using the SI encoding for VI. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11812 llvm-svn: 244332	2015-08-07 16:45:33 +00:00
Tom Stellard	85656cabfb	AMDGPU/SI: v_mac_legacy_f32 does not exist on VI Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11810 llvm-svn: 244322	2015-08-07 15:34:30 +00:00
Tom Stellard	11f19f78f0	AMDGPU/SI: Remove unused outs parameter from VOPC TableGen classes Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11809 llvm-svn: 244321	2015-08-07 15:34:27 +00:00
Tom Stellard	d488605ed3	AMDGPU/SI: Add Fiji support Patch by: Alex Deucher llvm-svn: 244255	2015-08-06 19:43:02 +00:00
Tom Stellard	217361c33f	AMDGPU/SI: Add support for 32-bit immediate SMRD offsets on CI Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11604 llvm-svn: 244254	2015-08-06 19:28:38 +00:00
Tom Stellard	dee26a2876	AMDGPU/SI: Use ComplexPatterns for SMRD addressing modes Summary: This allows us to consolidate several of the TableGen patterns. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11602 llvm-svn: 244253	2015-08-06 19:28:30 +00:00
Matt Arsenault	95f0606e62	AMDGPU/SI: Remove EXECReg For the same reasons as the other physical registers. llvm-svn: 244062	2015-08-05 16:42:57 +00:00
Matt Arsenault	4c0487bff6	AMDGPU: Remove SCCReg. These should be handled as a physical register rather than a virtual register class with one member. llvm-svn: 244061	2015-08-05 16:42:54 +00:00
Craig Topper	e3dcce9700	De-constify pointers to Type since they can't be modified. NFC This was already done in most places a while ago. This just fixes the ones that crept in over time. llvm-svn: 243842	2015-08-01 22:20:21 +00:00
Alex Lorenz	b4d0d6a345	AMDGPU/SI: Add implicit register operands in the correct order. This commit fixes a bug in the class 'SIInstrInfo' where the implicit register machine operands were added to a machine instruction in an incorrect order - the implicit uses were added before the implicit defs. I found this bug while working on moving the implicit register operand verification code from the MIR parser to the machine verifier. This commit also makes the method 'addImplicitDefUseOperands' in the machine instruction class public so that it can be reused in the 'SIInstrInfo' class. Reviewers: Matt Arsenault Differential Revision: http://reviews.llvm.org/D11689 llvm-svn: 243799	2015-07-31 23:30:09 +00:00
Matt Arsenault	e1ce344b5a	AMDGPU: Fix v16i32 to v16i8 truncstore llvm-svn: 243731	2015-07-31 04:12:04 +00:00
Matt Arsenault	ba01337942	AMDGPU/SI: Set DwarfRegNum This requires a fix in tablegen for the cast<int> from bits<16> to work in the list initializer. llvm-svn: 243723	2015-07-31 01:12:10 +00:00
Tom Stellard	82325598c3	AMDGPU/SI: Remove unused pattern for f32 constant loads Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11603 llvm-svn: 243719	2015-07-31 01:02:32 +00:00
Matt Arsenault	7a0c3a92c0	AMDGPU: Set SubRegIndex size and offset I'm not sure what reasons the comment here could have had for not setting these. Without these set, there is an assertion hit during DWARF emission. llvm-svn: 243661	2015-07-30 17:03:11 +00:00
Matt Arsenault	b39e858356	AMDGPU: Fix unreachable when emitting binary debug info Copy implementation of applyFixup from AArch64 with AArch64 bits ripped out. Tests will be included with a later commit. Several other problems must be fixed before binary debug info emission will work. llvm-svn: 243660	2015-07-30 17:03:08 +00:00
Tom Stellard	4229aa942d	AMDGPU/SI: Simplify moveSMRDToVALU() Summary: Replace the switch on instruction opcode with a switch on register size. This way we don't need to update the switch statement when we add new SMRD variants. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11601 llvm-svn: 243652	2015-07-30 16:20:42 +00:00
Tom Stellard	9d74076065	AMDGPU/SI: Remove isTriviallyReMaterializable() function from SIInstrInfo Summary: This function is never called. isReallyTriviallyReMaterializable() is the function that should be implemented instead. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11620 llvm-svn: 243651	2015-07-30 16:20:40 +00:00
Nick Lewycky	c3890d2969	Fix typo "fuction" noticed in comments in AssumptionCache.h, and also all the other files that have the same typo. All comments, no functionality change! (Merely a "fuctionality" change.) Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h. llvm-svn: 243585	2015-07-29 22:32:47 +00:00
Alex Lorenz	d8a1e542ab	Fix broken ArrayRef conversion from r243497. llvm-svn: 243501	2015-07-28 23:34:27 +00:00
Alex Lorenz	ef5c196fb0	MIR Serialization: Serialize the target index machine operands. Reviewers: Duncan P. N. Exon Smith llvm-svn: 243497	2015-07-28 23:02:45 +00:00
Matt Arsenault	7227cc1a48	AMDGPU: Don't try to use LDS/vector for private if pointer value stored If the pointer is the store's value operand, this would produce a broken module. Make sure the use is actually for the pointer operand. llvm-svn: 243462	2015-07-28 18:47:00 +00:00
Matt Arsenault	fdcd39a8ad	AMDGPU: Fix crash if called function is a bitcast getCalledFunction() is null, so this would crash. Replace crash with an error on unsupported call. llvm-svn: 243461	2015-07-28 18:29:14 +00:00
Matt Arsenault	916cea5682	AMDGPU: Fix return type of getImplicitParameterOffset. Patch by Zoltan Gilian <zoltan.gilian@gmail.com> llvm-svn: 243459	2015-07-28 18:09:55 +00:00
Colin LeMahieu	fe2c8b8015	[llvm-mc] Pushing plumbing through for --fatal-warnings flag. llvm-svn: 243334	2015-07-27 21:56:53 +00:00
Marek Olsak	93df060871	AMDGPU: don't match vgpr loads for constant loads Author: Dave Airlie <airlied@redhat.com> In order to implement indirect sampler loads, we don't want to match on a VGPR load but an SGPR one for constants, as we cannot feed VGPRs to the sampler, only SGPRs. This should be applicable for llvm 3.7 as well. llvm-svn: 243294	2015-07-27 18:16:08 +00:00
Marek Olsak	1354b87695	AMDGPU/SI: Fix the V_FRACT_F64 SI bug workaround This is a candidate for 3.7. llvm-svn: 243263	2015-07-27 11:37:42 +00:00
Chandler Carruth	96ada25bf3	[PM/AA] Remove all of the dead AliasAnalysis pointers being threaded through APIs that are no longer necessary now that the update API has been removed. This will make changes to the AA interfaces significantly less disruptive (I hope). Either way, it seems like a really nice cleanup. llvm-svn: 242882	2015-07-22 09:52:54 +00:00
Matt Arsenault	f849bb49cc	AMDGPU: Set isMoveImm on s_movk_i32 llvm-svn: 242747	2015-07-21 00:40:08 +00:00
Tom Stellard	70580f83cc	AMDGPU/SI: Add VI patterns to select FLAT instructions for global memory ops Summary: The MUBUF addr64 bit has been removed on VI, so we must use FLAT instructions when the pointer is stored in VGPRs. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11067 llvm-svn: 242673	2015-07-20 14:28:41 +00:00
Simon Pilgrim	ba51d116c4	Remove TargetInstrInfo::canFoldMemoryOperand canFoldMemoryOperand is not actually used anywhere in the codebase - all existing users instead call foldMemoryOperand directly when they wish to fold and can correctly deduce what they need from the return value. This patch removes the canFoldMemoryOperand base function and the target implementations; only x86 had a real (bit-rotted) implementation, although AMDGPU had a preparatory stub that had never needed to be completed. Differential Revision: http://reviews.llvm.org/D11331 llvm-svn: 242638	2015-07-19 10:50:53 +00:00
Tom Stellard	78655fcfdc	AMDGPU/SI: Negative offsets aren't allowed in MUBUF's vaddr operand Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11226 llvm-svn: 242434	2015-07-16 19:40:09 +00:00
Tom Stellard	c98ee20328	AMDGPU/SI: Use AssertZext node to mask high bit for scratch offsets Summary: We can safely assume that the high bit of scratch offsets will never be set, because this would require at least 128 GB of GPU memory. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11225 llvm-svn: 242433	2015-07-16 19:40:07 +00:00
Tom Stellard	ff5efb8c03	AMDGPU/R600: Remove unused variable This fixes a warning introduced by r242410. llvm-svn: 242412	2015-07-16 16:13:34 +00:00
Tom Stellard	1d46fb2d2f	AMDGPU/R600: Replace llvm_unreachable() call with LLVMContext::emitError() Summary: This fixes an issue on MIPS where the infinite-loop-evergreen.ll test was failing to terminate. Fixes PR24147. Reviewers: arsenm, dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11260 llvm-svn: 242410	2015-07-16 15:38:29 +00:00
Mehdi Amini	e029eae634	Add missing break in switch case in R600ISelLowering Summary: Caught by Coverity. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11120 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 242388	2015-07-16 06:23:12 +00:00
Pete Cooper	65c69407c8	Add allnodes() iterator range to SelectionDAG. NFC. SelectionDAG already had begin/end methods for iterating over all the nodes, but didn't define an iterator_range for use in foreach loops. This adds such a method and uses it in some of the eligible places throughout the backends. llvm-svn: 242212	2015-07-14 22:10:54 +00:00
Matt Arsenault	24692118ba	AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32) This can be done only with moves which theoretically will optimize better later. Although this transform increases the instruction count, it should be code size / cycle count neutral in the worst VALU case. It also seems to slightly improve a couple of testcases due to other DAG combines this exposes. This is probably slightly worse for the SALU case, so it might be better to handle this during moveToVALU, although then you lose some simplifications like the load width reducing in the simple testcase. llvm-svn: 242177	2015-07-14 18:20:33 +00:00
Matt Arsenault	84db5d97b0	AMDGPU/SI: Fix read2 merging into a super register. If the read2 produced was supposed to be writing into a super register, it would use the wrong subregister indices. Fix this by inserting copies, so we only ever write to a vreg_64. Run the register coalescer again to clean this up, although this isn't ideal and often does result in an extra move. Also remove the assert that offset1 > offset0. There isn't a real reason to not allow this other than a minor convenience in the compiler, and it doesn't seem worth the effort of avoiding it. llvm-svn: 242174	2015-07-14 17:57:36 +00:00
Matthias Braun	9912bb817c	MachineRegisterInfo: Remove UsedPhysReg infrastructure We have a detailed def/use lists for every physical register in MachineRegisterInfo anyway, so there is little use in maintaining an additional bitset of which ones are used. Removing it frees us from extra book keeping. This simplifies VirtRegMap. Differential Revision: http://reviews.llvm.org/D10911 llvm-svn: 242173	2015-07-14 17:52:07 +00:00
Tom Stellard	e48fe2a27a	AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11061 llvm-svn: 242146	2015-07-14 14:15:03 +00:00
Matt Arsenault	ca95d44110	AMDGPU: Minor cleanups to always inline pass llvm-svn: 242053	2015-07-13 19:08:36 +00:00
Tom Stellard	db5a11f698	AMDGPU/SI: Select mad patterns to v_mac_f32 The two-address instruction pass will convert these back to v_mad_f32 if necessary. Differential Revision: http://reviews.llvm.org/D11060 llvm-svn: 242038	2015-07-13 15:47:57 +00:00
Matt Arsenault	cf13d18730	AMDGPU: Fix chains for memory ops dependent on argument loads Most loads and stores are derived from pointers derived from a kernel argument load inserted during argument lowering. This was just using the EntryToken chain for the argument loads, and any users of these loads were also on the EntryToken chain. Return the chain of the lowered argument load so that dependent loads end up on the correct chain. No test since I'm not aware of any case where this actually broke. llvm-svn: 241960	2015-07-10 22:51:36 +00:00
Duncan P. N. Exon Smith	754e21f244	MC: Remove MCSubtargetInfo() default constructor Force all creators of `MCSubtargetInfo` to immediately initialize it, merging the default constructor and the initializer into an initializing constructor. Besides cleaning up the code a little, this makes it clear that the initializer is never called again later. Out-of-tree backends need a trivial change: instead of calling: auto *X = new MCSubtargetInfo(); InitXYZMCSubtargetInfo(X, ...); return X; they should call: return createXYZMCSubtargetInfoImpl(...); There's no real functionality change here. llvm-svn: 241957	2015-07-10 22:43:42 +00:00
Matt Arsenault	0d5197380c	AMDGPU: Use requested chain when lowering arguments No test since I'm not aware of any case where this will end up being a different chain. llvm-svn: 241954	2015-07-10 22:28:41 +00:00
Tom Stellard	dcb9f0907f	AMDGPU: Add helper function for implicit parameter offsets. Patch by: Zoltan Gilian llvm-svn: 241861	2015-07-09 21:20:37 +00:00
Matt Arsenault	8b03e6c164	AMDGPU/R600: Return correct chain when lowering loads The other LowerLOAD should be returning the correct chain. llvm-svn: 241839	2015-07-09 18:47:03 +00:00
Tom Stellard	ab6e9c0f94	AMDGPU/SI: The SIShrinkInstructions pass should only fold immediates with one use This is covered by existing testcases and will be exposed by a future commit. llvm-svn: 241817	2015-07-09 16:30:36 +00:00
Tom Stellard	9ebf7ca2f0	AMDGPU/SI: Fix crash on physical registers in SIInstrInfo::isOperandLegal() No test case for this. I ran into it while working on some improvements to SIShrinkInstructions.cpp. llvm-svn: 241816	2015-07-09 16:30:27 +00:00
Mehdi Amini	eaabc51e78	Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user A documentation for this function would be nice by the way. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241807	2015-07-09 15:12:23 +00:00
Mehdi Amini	a749f2ad47	Remove getDataLayout() from TargetLowering Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11042 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241779	2015-07-09 02:09:52 +00:00
Mehdi Amini	0cdec1e2ab	Make isLegalAddressingMode() take DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11040 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241778	2015-07-09 02:09:40 +00:00
Mehdi Amini	9639d650bb	Make TargetLowering::getShiftAmountTy() take DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11037 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241776	2015-07-09 02:09:20 +00:00
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() take DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Mehdi Amini	5010ebf181	Make TargetTransformInfo keep a reference to the Module DataLayout DataLayout is no longer optional. It was initialized with or without a DataLayout, and the DataLayout when supplied could have been the one from the TargetMachine. Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11021 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241774	2015-07-09 02:08:42 +00:00
Matt Arsenault	db7781c6e9	AMDGPU: Run SIInsertWaits as pre-emit pass Running this after the scheduler enables scheduling waits later so other ALU instructions can run while this would be waiting. When combined with enabling the post-RA scheduler, this gives about a ~20% improvement on sgemm. llvm-svn: 241473	2015-07-06 17:02:20 +00:00
Daniel Sanders	f423f5627c	Change the last few internal StringRef triples into Triple objects. Summary: This concludes the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. At this point, the StringRef-form of GNU Triples should only be used in the public API (including IR serialization) and a couple objects that directly interact with the API (most notably the Module class). The next step is to replace these Triple objects with the TargetTuple object that will represent our authoritative/unambiguous internal equivalent to GNU Triples. Reviewers: rengolin Subscribers: llvm-commits, jholewinski, ted, rengolin Differential Revision: http://reviews.llvm.org/D10962 llvm-svn: 241472	2015-07-06 16:56:07 +00:00
Matt Arsenault	706f930b72	AMDGPU/SI: Add debugging subtarget feature for DS offsets We don't have a good way to detect most situations where DS offsets are usable on SI, so add an option to force using them even if unsafe for debugging performance problems. llvm-svn: 241462	2015-07-06 16:01:58 +00:00
Benjamin Kramer	9bfb627a0e	[TargetLowering] StringRefize asm constraint getters. There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411	2015-07-05 19:29:18 +00:00
Matt Arsenault	24e33d10a0	AMDGPU: Fix indentation of switch llvm-svn: 241380	2015-07-03 23:33:38 +00:00
Ranjeet Singh	86ecbb7b54	Reverting r241058 because it's causing buildbot failures. llvm-svn: 241061	2015-06-30 12:32:53 +00:00
Ranjeet Singh	5b119091a1	There are a few places where subtarget features are still represented by uint64_t, this patch replaces these usages with the FeatureBitset (std::bitset) type. Differential Revision: http://reviews.llvm.org/D10542 llvm-svn: 241058	2015-06-30 11:30:42 +00:00
Matt Arsenault	8ebce8f12b	AMDGPU/SI: Fix extra space when printing v_div_fmas_* llvm-svn: 240911	2015-06-28 18:16:14 +00:00
Tom Stellard	4694ed0a14	AMDGPU/SI: Use correct resource descriptors for VI on HSA Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D10777 llvm-svn: 240841	2015-06-26 21:58:42 +00:00
Tom Stellard	ff7416ba06	AMDGPU/SI: Update amd_kernel_code_t definition and add assembler support Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10772 llvm-svn: 240839	2015-06-26 21:58:31 +00:00
Tom Stellard	833ae4fadd	AMDGPU/SI: Remove unused variable This should fix some bots that were broken by r240831. llvm-svn: 240838	2015-06-26 21:58:26 +00:00
Tom Stellard	91efe9cebe	AMDGPU/SI: Set ELF OS/ABI to ELFOSABI_AMDGPU_HSA Reviewers: arsenm, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10708 llvm-svn: 240832	2015-06-26 21:15:11 +00:00
Tom Stellard	347ac79b15	AMDGPU/SI: Add hsa code object directives Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10757 llvm-svn: 240831	2015-06-26 21:15:07 +00:00
Tom Stellard	b5798b09d3	AMDGPU/SI: There are no implicit kernel args in the amdhsa ABI Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10706 llvm-svn: 240830	2015-06-26 21:15:03 +00:00
Tom Stellard	f151a45ccd	AMDGPU/SI: Emit amd_kernel_code_t in EmitFunctionBodyStart() Summary: This way the function symbol points to the start of amd_kernel_code_t rather than the start of the function. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10705 llvm-svn: 240829	2015-06-26 21:14:58 +00:00
Marek Olsak	cfbdba2d0b	AMDGPU: really don't commute REV opcodes if the target variant doesn't exist If pseudoToMCOpcode failed, we would return the original opcode, so operands would be swapped, but the instruction would remain the same. It resulted in LSHLREV a, b ---> LSHLREV b, a. This fixes Glamor text rendering and piglit/arb_sample_shading-builtin-gl-sample-mask on VI. This is a candidate for stable branches. v2: the test was simplified by Tom Stellard llvm-svn: 240824	2015-06-26 20:29:10 +00:00
Benjamin Kramer	e61cbd1f3a	Replace copy-pasted debug value skipping with MBB::getLastNonDebugInstr No functional change intended. llvm-svn: 240639	2015-06-25 13:28:24 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Matt Arsenault	0b554ed364	AMDGPU: Use getAsInteger instead of atoi llvm-svn: 240365	2015-06-23 02:05:55 +00:00
Tom Stellard	f0296cee9b	R600/SI: Use ELF64 format instead of ELF32 Reviewers: arsenm, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10392 llvm-svn: 240331	2015-06-22 21:03:54 +00:00
Tom Stellard	3aed34e947	R600: Use EM_AMDGPU for the ELF Machine type Reviewers: arsenm, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10390 llvm-svn: 240330	2015-06-22 21:03:52 +00:00
Alexander Kornienko	70bc5f1398	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Eric Christopher	572e03a396	Fix "the the" in comments. llvm-svn: 240112	2015-06-19 01:53:21 +00:00
Matt Arsenault	417c93e3c1	AMDGPU: Change unreachable into reported error llvm-svn: 239943	2015-06-17 20:55:25 +00:00
Sanjoy Das	b666ea369c	[TargetInstrInfo] Rename getLdStBaseRegImmOfs and implement for x86. Summary: TargetInstrInfo::getLdStBaseRegImmOfs to TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86. The implementation only handles a few easy cases now and will be made more sophisticated in the future. This is NFCI: the only user of `getLdStBaseRegImmOfs` (now `getMemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion` is disabled for x86. Reviewers: reames, ab, MatzeB, atrick Reviewed By: MatzeB, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10199 llvm-svn: 239741	2015-06-15 18:44:14 +00:00
Tom Stellard	104ad064df	AMDGPU: s/R600/AMDGPU/ in the Makefiles Now the library names in the Makefiles match the library names in LLVMBuild.txt. This should hopefully fix the remaining bot failures. llvm-svn: 239661	2015-06-13 05:11:14 +00:00
Tom Stellard	45bb48ea19	R600 -> AMDGPU rename llvm-svn: 239657	2015-06-13 03:28:10 +00:00
Tom Stellard	1be1aa84ec	Revert "AMDGPU: Add core backend files for R600/SI codegen v6" This reverts commit 4ea70107c5e51230e9e60f0bf58a0f74aa4885ea. llvm-svn: 160303	2012-07-16 18:19:53 +00:00
Tom Stellard	151dc338e4	Revert "Target/AMDGPU/R600KernelParameters.cpp: Fix two includes, <llvm/IRBuilder.h> and <llvm/TypeBuilder.h>" This reverts commit 0258a6bdd30802f5cc0e8e57c8e768fde2aef590. llvm-svn: 160299	2012-07-16 18:19:41 +00:00
Tom Stellard	1bd3012505	Revert "Target/AMDGPU: [CMake] Fix dependencies. 1) Add intrinsics_gen. Add AMDGPUCommonTableGen." This reverts commit ebc934ba32ee71abbb8f0f2eb6a0fbaa613ba0d2. llvm-svn: 160298	2012-07-16 18:19:40 +00:00
Tom Stellard	781853e11f	Revert "Target/AMDGPU/R600KernelParameters.cpp: Don't use "and", "or" as conditional operator..." This reverts commit 29f28bc14ad5a907f5dc849f004fafeec0aab33a. llvm-svn: 160297	2012-07-16 18:19:38 +00:00
Tom Stellard	2e007de42d	Revert "Target/AMDGPU/AMDILIntrinsicInfo.cpp: Use llvm_unreachable() in nonreturn function, instead of assert(0)." This reverts commit 4ba4acc1bc2561b944a571edbb6a2dc78e357dfe. llvm-svn: 160296	2012-07-16 18:19:37 +00:00
Tom Stellard	f65e78b2fa	Revert "Target/AMDGPU: Fix includes, or msvc build failed." This reverts commit fef4aa1b16fcf7a472559abbbcf4c1adc9eb5ca6. llvm-svn: 160295	2012-07-16 18:19:32 +00:00
NAKAMURA Takumi	96cc5e5bf9	Target/AMDGPU: Fix includes, or msvc build failed. llvm-svn: 160280	2012-07-16 15:43:50 +00:00
NAKAMURA Takumi	dc4261794f	Target/AMDGPU/AMDILIntrinsicInfo.cpp: Use llvm_unreachable() in nonreturn function, instead of assert(0). llvm-svn: 160279	2012-07-16 15:43:09 +00:00
NAKAMURA Takumi	5f5fd8e545	Target/AMDGPU/R600KernelParameters.cpp: Don't use "and", "or" as conditional operator... llvm-svn: 160278	2012-07-16 15:42:35 +00:00
NAKAMURA Takumi	bb42a5e2cf	Target/AMDGPU: [CMake] Fix dependencies. 1) Add intrinsics_gen. Add AMDGPUCommonTableGen. llvm-svn: 160276	2012-07-16 15:09:11 +00:00
NAKAMURA Takumi	3128d26124	Target/AMDGPU/R600KernelParameters.cpp: Fix two includes, <llvm/IRBuilder.h> and <llvm/TypeBuilder.h> llvm-svn: 160275	2012-07-16 15:08:47 +00:00
Tom Stellard	bcce80fa95	AMDGPU: Add core backend files for R600/SI codegen v6 llvm-svn: 160270	2012-07-16 14:17:08 +00:00

1093 Commits