2b957b5a6f 
								
							 
						 
						
							
							
								
								AMDGPU: Make i64 loads/stores promote to v2i32  
							
							
							
							
							Now that unaligned access expansion no longer attempts
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this was done.
This allows splitting i64 private accesses while
allowing the new add nodes that index the vector components
to be folded with the base pointer arithmetic.
llvm-svn: 268293 
							
						 
						
							2016-05-02 20:07:26 +00:00  
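
As a rough illustration of what promoting an i64 access to v2i32 means, the sketch below models a 64-bit store and load as two 32-bit component accesses at `addr` and `addr + 4`. It is not LLVM code: the function names and the flat `bytearray` memory are hypothetical, and little-endian layout is assumed. The point is that the `+ 4` for the high component is a constant, so it can be combined with whatever constant offset the base pointer already carries.

```python
import struct

def component_addresses(base: int, offset: int) -> list[int]:
    """Addresses of the two 32-bit components of an i64 at base + offset.

    The per-component constant (0 or 4) folds into the existing offset.
    """
    return [base + (offset + 4 * i) for i in range(2)]

def store_i64_as_v2i32(mem: bytearray, addr: int, value: int) -> None:
    """Store a 64-bit value as two little-endian 32-bit words (hypothetical helper)."""
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    struct.pack_into("<I", mem, addr, lo)
    struct.pack_into("<I", mem, addr + 4, hi)

def load_i64_as_v2i32(mem: bytearray, addr: int) -> int:
    """Load two 32-bit words and reassemble the 64-bit value."""
    (lo,) = struct.unpack_from("<I", mem, addr)
    (hi,) = struct.unpack_from("<I", mem, addr + 4)
    return lo | (hi << 32)
```

Splitting the access this way leaves the stored bytes identical to a single 64-bit little-endian store, which is why the two forms are interchangeable.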
				
					
						
							
							
								 
						
							
								3d1c1deb04 
								
							 
						 
						
							
							
								
								AMDGPU: Run SIFoldOperands after PeepholeOptimizer  
							
							
							
							
							PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:
Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)
Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)
Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)
llvm-svn: 266378 
							
						 
						
							2016-04-14 21:58:24 +00:00  
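
The percentages in the shader-db totals above appear to be plain relative changes, `(after - before) / before`, with a zero baseline reported as 0.00 %. The sketch below reproduces that arithmetic under this assumption; `pct_change` is a hypothetical helper, not part of shader-db.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change in percent; a zero baseline is reported as 0.0.

    The zero-baseline rule is inferred from the 'Scratch: 0 -> 1024 (0.00 %)'
    line and is an assumption about how the report was generated.
    """
    if before == 0:
        return 0.0
    return (after - before) / before * 100.0

# Values taken from the totals in the commit message.
vgprs_total = round(pct_change(22118, 21655), 2)
vgprs_affected = round(pct_change(5169, 4706), 2)
```

Read this way, the change removes 463 VGPRs in both views. The drop looks larger among affected shaders only because their baseline is smaller.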
				
					
						
							
							
								 
						
							
								9a19c240c0 
								
							 
						 
						
							
							
								
								AMDGPU: Materialize sign bits with bfrev  
							
							
							
							
							If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.
llvm-svn: 263201 
							
						 
						
							2016-03-11 07:42:49 +00:00  
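
The condition in the bfrev commit can be sketched as a simple check: a 32-bit constant qualifies when it is not itself an inline immediate but its bit-reversal is. The code below is an illustration, not the backend's implementation. It assumes the integer inline immediate range is -16..64 and ignores the floating-point inline constants; all function names are hypothetical.

```python
from typing import Optional

def bit_reverse32(x: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    x &= 0xFFFFFFFF
    return int(f"{x:032b}"[::-1], 2)

def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def is_inline_int(x: int) -> bool:
    """Assumed integer inline immediate range: -16..64 (floats ignored)."""
    return -16 <= to_signed32(x) <= 64

def bfrev_operand(constant: int) -> Optional[int]:
    """Return the inline immediate whose bit-reversal yields `constant`.

    Returns None when the constant is already inline, or when its
    reversal is not inline and a 32-bit literal would still be needed.
    """
    if is_inline_int(constant):
        return None
    reversed_bits = bit_reverse32(constant)
    if is_inline_int(reversed_bits):
        return to_signed32(reversed_bits)
    return None
```

For example, the sign bit `0x80000000` is the reversal of `1`, so it can be produced by reversing an inline `1` instead of embedding a 4-byte literal.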
				
					
						
							
							
								 
						
							
								0de924b76d 
								
							 
						 
						
							
							
								
								AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE  
							
							
							
							
							Make the REG_SEQUENCE be a VGPR, and do the register class
copy first.
llvm-svn: 251855 
							
						 
						
							2015-11-02 23:15:42 +00:00  
				
					
						
							
							
								 
						
							
								45bb48ea19 
								
							 
						 
						
							
							
								
								R600 -> AMDGPU rename  
							
							
							
							
							llvm-svn: 239657 
							
						 
						
							2015-06-13 03:28:10 +00:00