Stephen Lin (4362261b00) 2013-08-15 06:47:53 +00:00
CHECK-LABEL-ify some code gen tests to improve diagnostic experience when tests fail.
llvm-svn: 188447
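
As an illustration of the CHECK-LABEL idiom (a hedged sketch, not taken from
the commit; the functions and RUN line are invented):

  // RUN: %clang_cc1 -triple x86_64-unknown-unknown -emit-llvm -o - %s | FileCheck %s

  // CHECK-LABEL: define{{.*}} i32 @f1(
  // CHECK: ret i32 0
  int f1(void) { return 0; }

  // CHECK-LABEL: define{{.*}} i32 @f2(
  // CHECK: ret i32
  int f2(int x) { return x; }

FileCheck restarts matching at each CHECK-LABEL, so a failing CHECK is
reported against the function it belongs to instead of silently matching
output from an unrelated function.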
						
					 
				
					
						
							
							
								 
Eli Friedman (96fd264cc0) 2013-06-12 00:13:45 +00:00
Make va_arg and argument passing to varargs functions work correctly with
AVX vectors when AVX is turned on.
Fixes <rdar://problem/10513611>.
llvm-svn: 183813
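
A minimal sketch of the pattern this fixes (assuming a target with AVX
enabled, e.g. -mavx; the function name is invented):

  #include <stdarg.h>
  #include <immintrin.h>

  /* Pass a 256-bit AVX vector through "..." and read it back with va_arg. */
  __m256 get_avx(int n, ...) {
    va_list ap;
    va_start(ap, n);
    __m256 v = va_arg(ap, __m256);
    va_end(ap);
    return v;
  }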
						
					 
				
					
						
							
							
								 
Eli Friedman (2761350730) 2013-06-11 01:59:28 +00:00
Fix a very silly mistake in r183590.
llvm-svn: 183720

Eli Friedman (c11c169530) 2013-06-07 23:20:55 +00:00
Fix va_arg on x86-64 for a struct containing a single int128_t.  PR16248
llvm-svn: 183590
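
A hedged sketch of the kind of code PR16248 covers (the names are invented):

  #include <stdarg.h>

  struct S { __int128 x; };

  /* va_arg on a struct wrapping a single 128-bit integer; the struct
     occupies two eightbytes and needed its own handling in the va_arg path. */
  __int128 get(int n, ...) {
    va_list ap;
    va_start(ap, n);
    struct S s = va_arg(ap, struct S);
    va_end(ap);
    return s.x;
  }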
						
					 
				
					
						
							
							
								 
Bill Wendling (48939ced20) 2013-02-15 05:25:49 +00:00
Update testcases due to Attribute sorting improvements.
llvm-svn: 175253

Bill Wendling (85ab57ac5d) 2013-01-31 23:17:12 +00:00
Update the tests.

This update coincides with r174110. That change ordered the attributes
alphabetically.
llvm-svn: 174111

Bill Wendling (9806806f39) 2013-01-29 03:21:00 +00:00
Modify the tests for the (sorted) order that the attributes come out as now.
llvm-svn: 173762

John McCall (c818bbb8b2) 2012-12-07 07:03:17 +00:00
Fix the required args count for variadic blocks.

We were emitting calls to blocks as if all arguments were required --- i.e.
with signature (A,B,C,D,...) rather than (A,B,...).  This patch fixes that
and accounts for the implicit block-context argument as a required argument.
In addition, this patch changes the function type under which we call
unprototyped functions on platforms like x86-64 that guarantee compatibility
of variadic functions with unprototyped function types; previously we would
always call such functions under the LLVM type T (...)*, but now we will
call them under the type T (A,B,C,D,...)*.  This last change should have no
material effect except for making the type conventions more explicit; it
was a side-effect of the most convenient implementation.
llvm-svn: 169588
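
A minimal sketch of a variadic block (assuming Clang with -fblocks; the
names are invented):

  #include <stdarg.h>

  int call_sum(void) {
    int (^sum)(int, ...) = ^int(int n, ...) {
      va_list ap;
      va_start(ap, n);
      int total = 0;
      for (int i = 0; i < n; i++)
        total += va_arg(ap, int);
      va_end(ap);
      return total;
    };
    /* Lowered as a call whose required arguments are (block-context, i32),
       with the remaining ints passed as varargs. */
    return sum(3, 1, 2, 3);  /* 6 */
  }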
						
					 
				
					
						
							
							
								 
Manman Ren (836a93bdb3) 2012-11-28 22:29:41 +00:00
ABI: comments from Eli on r168820.

rdar://12723368
llvm-svn: 168821

Manman Ren (84b921f805) 2012-11-28 22:08:52 +00:00
ABI: modify CreateCoercedLoad and CreateCoercedStore to not use load or
store of the original parameter or return type.

Since we do not accurately represent the data fields of a union, we should
not directly load or store a union type.
As an example, if we have i8,i8,i32,i32 as one field type and i32,i32 as
another field type, the first field type will be chosen to represent the
union. If we load with the union's type, the 3rd byte and the 4th byte will
be skipped.
rdar://12723368
llvm-svn: 168820
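
A hypothetical union matching the description above:

  /* f1 lays out as i8,i8,<2 padding bytes>,i32,i32; f2 as i32,i32.  The
     first field type represents the union, so a load through it would skip
     the two padding bytes -- which are live data when f2 is active. */
  union U {
    struct { char a, b; int c, d; } f1;
    struct { int x, y; } f2;
  };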
						
					 
				
					
						
							
							
								 
Daniel Dunbar (f07b5ec0dc) 2012-03-10 01:03:58 +00:00
IRgen/ABI/x86_64: Avoid passing small structs using byval sometimes.

- We do this when it is easy to determine that the backend will pass them
  on the stack properly by itself.

Currently LLVM codegen is really bad in some cases with byval, for example,
on the test case here (which is derived from Sema code, which likes to pass
SourceLocations around):

  struct s47 { unsigned a; };
  void f47(int,int,int,int,int,int,struct s47);
  void test47(int a, struct s47 b) { f47(a, a, a, a, a, a, b); }

we used to emit code like this:

  ...
  movl	%esi, -8(%rbp)
  movl	-8(%rbp), %ecx
  movl	%ecx, (%rsp)
  ...

to handle moving the struct onto the stack, which is just appalling.
Now we generate:

  movl	%esi, (%rsp)

which seems better, no?
llvm-svn: 152462

Eli Friedman (bfd5addf4c) 2011-12-02 00:11:43 +00:00
When we're passing a vector with an illegal type through memory on x86-64,
use byval so we're sure the backend does the right thing.  Fixes va_arg
with illegal vectors and an obscure ABI mismatch with __m64 vectors.
llvm-svn: 145652
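
A hedged sketch of the va_arg case (names invented; a 32-byte vector is
illegal for the target unless AVX is enabled, so it travels through memory):

  #include <stdarg.h>

  typedef int v8i __attribute__((vector_size(32)));

  v8i get_vec(int n, ...) {
    va_list ap;
    va_start(ap, n);
    v8i v = va_arg(ap, v8i);  /* byval on the caller side keeps the layout
                                 the callee reads here consistent */
    va_end(ap);
    return v;
  }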
						
					 
				
					
						
							
							
								 
Eli Friedman (f37bd2f2f1) 2011-12-01 04:53:19 +00:00
Don't use a varargs convention for calls to unprototyped functions where
one of the arguments is an AVX vector.
llvm-svn: 145574
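
A minimal sketch of the pattern (the names are invented; note the K&R-style
declaration of f, which has no prototype):

  #include <immintrin.h>

  void f();  /* unprototyped */

  void call_it(__m256 v) {
    /* Without a prototype the call could be lowered with the variadic
       convention; this change avoids that when an argument is a 256-bit
       AVX vector. */
    f(v);
  }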
						
					 
				
					
						
							
							
								 
Tanya Lattner (71f1b2dcd4) 2011-11-28 23:18:11 +00:00
Correct the code generation for function arguments of vec3 types on x86_64
when they are greater than 128 bits. This was incorrectly coercing things
like long3 into a double2.

Add test case.
llvm-svn: 145312
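
A C analogue of the OpenCL long3 case (a sketch; the typedef and function
are invented):

  /* 3 x 64-bit lanes = 192 bits of data, wider than 128 bits, so coercing
     the argument to <2 x double> silently dropped the third lane. */
  typedef long long3 __attribute__((ext_vector_type(3)));

  long first_lane(long3 v) { return v.x; }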
						
					 
				
					
						
							
							
								 
Eli Friedman (a1748564b4) 2011-11-18 02:44:19 +00:00
Make va_arg on x86-64 compute alignment the same way as argument passing.

Fixes <rdar://problem/10463281>.
llvm-svn: 144966
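
A hedged sketch of the kind of case this affects (names invented): a
16-byte-aligned aggregate passed in memory has its stack slot rounded up
to its alignment, so va_arg must round overflow_arg_area the same way:

  #include <stdarg.h>

  struct __attribute__((aligned(16))) S { long a, b; };

  long get(int n, ...) {
    va_list ap;
    va_start(ap, n);
    struct S s = va_arg(ap, struct S);
    va_end(ap);
    return s.a;
  }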
						
					 
				
					
						
							
							
								 
John McCall (a5efa7386a) 2011-08-25 23:04:34 +00:00
Track whether an AggValueSlot is potentially aliased, and do not emit call
results into potentially aliased slots.  This allows us to properly mark
indirect return slots as noalias, at the cost of requiring an extra memcpy
when assigning an aggregate call result into an l-value.  It also brings
us into compliance with the x86-64 ABI.
llvm-svn: 138599
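
A sketch of the pattern described above (the functions are invented):

  struct Big { long a[8]; };
  struct Big make_big(void);

  void store_result(struct Big *dst) {
    /* The call's indirect-return slot is a fresh temporary (markable
       noalias); the result is then copied into *dst, which might alias
       memory the callee reads. */
    *dst = make_big();
  }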
						
					 
				
					
						
							
							
								 
Bruno Cardoso Lopes (98154a76fd) 2011-07-13 21:58:55 +00:00
Reapply r134946 with fixes. Tested on Benjamin testcase and other
test-suite failures.
llvm-svn: 135091

Bruno Cardoso Lopes (0aadf83f80) 2011-07-12 22:30:58 +00:00
Revert r134946
llvm-svn: 135004

Chris Lattner (73e3004e75) 2011-07-12 05:53:08 +00:00
fix an unintended behavior change in the type system rewrite, which caused
us to compile stuff like this:

  typedef struct {
    int x, y, z;
  } foo_t;

  foo_t g;

into:

  %"struct.<anonymous>" = type { i32, i32, i32 }

we now get:

  %struct.foo_t = type { i32, i32, i32 }

This doesn't change the behavior of the compiler, but makes the IR much
easier to read.
llvm-svn: 134969

Bruno Cardoso Lopes (75541d00e0) 2011-07-12 01:27:38 +00:00
Do the same as r134946 for arrays. Add more testcases for avx x86_64 arg
passing.
llvm-svn: 134951

Bruno Cardoso Lopes (7a26681092) 2011-07-12 00:30:27 +00:00
Fix one x86_64 abi issue and the test to actually look for the right thing,
which is: { <4 x float>, <4 x float> } should continue to go through memory.
llvm-svn: 134946
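
A hypothetical C aggregate that lowers to { <4 x float>, <4 x float> }:

  #include <xmmintrin.h>

  /* 32 bytes total: larger than two eightbytes under the pre-AVX rules,
     so the struct is classified MEMORY and passed on the stack. */
  struct TwoVec { __m128 lo, hi; };

  __m128 first(struct TwoVec t) { return t.lo; }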
						
					 
				
					
						
							
							
								 
Bruno Cardoso Lopes (21a41bb5ec) 2011-07-11 22:41:29 +00:00
Reapply r134754, which turns out to be working correctly and also add one
more testcase.
llvm-svn: 134934

Chris Lattner (a5f58b05e8) 2011-07-09 17:41:47 +00:00
clang side to match the LLVM IR type system rewrite patch.
llvm-svn: 134831

Bruno Cardoso Lopes (129b4cc9ec) 2011-07-08 22:57:35 +00:00
Revert x86_64 ABI changes until I have time to check the items raised by Eli.
llvm-svn: 134765

Bruno Cardoso Lopes (308d7423a9) 2011-07-08 22:18:40 +00:00
Add support for AVX 256-bit in the x86_64 ABI (as in the 0.99.5 draft)
llvm-svn: 134754

Eli Friedman (1310c68bb0) 2011-07-02 00:57:27 +00:00
Don't use x86_mmx where it isn't necessary.

The start of some work on getting -mno-mmx working the way we want it to.
llvm-svn: 134300

Chris Lattner (44c2b90556) 2011-05-22 23:21:23 +00:00
Fix x86-64 byval passing to specify the alignment even when the code
generator will give it something sufficient.  This is important because
the mid-level optimizer doesn't know what alignment is required otherwise.
llvm-svn: 131879

John McCall (e0fda7377e) 2011-04-21 01:20:55 +00:00
The 0.98 revision of the x86-64 ABI clarified a lot of things, some of
which break strict compatibility with previous compilers.  Implement one
of them and then immediately opt out on Darwin.
llvm-svn: 129899

Chris Lattner (69e683fb35) 2010-08-26 18:13:50 +00:00
vector of long and ulong are also classified as INTEGER in x86-64 abi,
this fixes rdar://8358475 a failure of the gcc.dg/compat/vector_1 abi test.
llvm-svn: 112205
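
A hedged sketch of such a vector (the typedef is invented): an 8-byte,
single-element vector of long, which under this change is classified
INTEGER and so travels in general-purpose rather than SSE registers:

  typedef long v1l __attribute__((vector_size(8)));

  v1l pass_through(v1l x) { return x; }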
						
					 
				
					
						
							
							
								 
Chris Lattner (46830f2fd6) 2010-08-26 18:03:20 +00:00
1 x ulonglong needs to be classified as INTEGER, just like 1 x longlong,
this fixes a miscompilation on the included testcase, rdar://8359248
llvm-svn: 112201

Chris Lattner (51e1cc2fe2) 2010-08-26 06:28:35 +00:00
tame an assertion, fixing rdar://8357396
llvm-svn: 112174

Chris Lattner (9f8b451876) 2010-08-25 23:39:14 +00:00
Finally pass "two floats in a 64-bit unit" as a <2 x float> instead of as
a double in the x86-64 ABI.  This allows us to generate much better code
for certain things, e.g.:

  _Complex float f32(_Complex float A, _Complex float B) {
    return A+B;
  }

Used to compile into (look at the integer silliness!):

  _f32:                                   ## @f32
  ## BB#0:                                ## %entry
  	movd	%xmm1, %rax
  	movd	%eax, %xmm1
  	movd	%xmm0, %rcx
  	movd	%ecx, %xmm0
  	addss	%xmm1, %xmm0
  	movd	%xmm0, %edx
  	shrq	$32, %rax
  	movd	%eax, %xmm0
  	shrq	$32, %rcx
  	movd	%ecx, %xmm1
  	addss	%xmm0, %xmm1
  	movd	%xmm1, %eax
  	shlq	$32, %rax
  	addq	%rdx, %rax
  	movd	%rax, %xmm0
  	ret

Now we get:

  _f32:                                   ## @f32
  	movdqa	%xmm0, %xmm2
  	addss	%xmm1, %xmm2
  	pshufd	$16, %xmm2, %xmm2
  	pshufd	$1, %xmm1, %xmm1
  	pshufd	$1, %xmm0, %xmm0
  	addss	%xmm1, %xmm0
  	pshufd	$16, %xmm0, %xmm1
  	movdqa	%xmm2, %xmm0
  	unpcklps	%xmm1, %xmm0
  	ret

and compile stuff like:

  extern float _Complex ccoshf( float _Complex ) ;
  float _Complex ccosf ( float _Complex z ) {
    float _Complex iz;
    (__real__ iz) = -(__imag__ z);
    (__imag__ iz) = (__real__ z);
    return ccoshf(iz);
  }

into:

  _ccosf:                                 ## @ccosf
  ## BB#0:                                ## %entry
  	pshufd	$1, %xmm0, %xmm1
  	xorps	LCPI4_0(%rip), %xmm1
  	unpcklps	%xmm0, %xmm1
  	movaps	%xmm1, %xmm0
  	jmp	_ccoshf                 ## TAILCALL

instead of:

  _ccosf:                                 ## @ccosf
  ## BB#0:                                ## %entry
  	movd	%xmm0, %rax
  	movq	%rax, %rcx
  	shlq	$32, %rcx
  	shrq	$32, %rax
  	xorl	$-2147483648, %eax      ## imm = 0xFFFFFFFF80000000
  	addq	%rcx, %rax
  	movd	%rax, %xmm0
  	jmp	_ccoshf                 ## TAILCALL

There is still "stuff to be done" here for the struct case, but this
resolves rdar://6379669 - [x86-64 ABI] Pass and return _Complex float /
double efficiently.
llvm-svn: 112111

Chris Lattner (7f4b81af7a) 2010-07-29 18:13:09 +00:00
fix rdar://8251384, another case where we could access beyond the end of
a struct.  This improves the case when the struct being passed contains 3
floats, either due to a struct or array of 3 things.  Before we'd generate
this IR for the testcase:

  define float @bar(double %X.coerce0, double %X.coerce1) nounwind {
  entry:
    %X = alloca %struct.foof, align 8               ; <%struct.foof*> [#uses=2]
    %0 = bitcast %struct.foof* %X to %1*            ; <%1*> [#uses=2]
    %1 = getelementptr %1* %0, i32 0, i32 0         ; <double*> [#uses=1]
    store double %X.coerce0, double* %1
    %2 = getelementptr %1* %0, i32 0, i32 1         ; <double*> [#uses=1]
    store double %X.coerce1, double* %2
    %tmp = getelementptr inbounds %struct.foof* %X, i32 0, i32 2 ; <float*> [#uses=1]
    %tmp1 = load float* %tmp                        ; <float> [#uses=1]
    ret float %tmp1
  }

which compiled (with optimization) to:

  _bar:                                   ## @bar
  ## BB#0:                                ## %entry
  	movd	%xmm1, %rax
  	movd	%eax, %xmm0
  	ret

Now we produce:

  define float @bar(double %X.coerce0, float %X.coerce1) nounwind {
  entry:
    %X = alloca %struct.foof, align 8               ; <%struct.foof*> [#uses=2]
    %0 = bitcast %struct.foof* %X to %0*            ; <%0*> [#uses=2]
    %1 = getelementptr %0* %0, i32 0, i32 0         ; <double*> [#uses=1]
    store double %X.coerce0, double* %1
    %2 = getelementptr %0* %0, i32 0, i32 1         ; <float*> [#uses=1]
    store float %X.coerce1, float* %2
    %tmp = getelementptr inbounds %struct.foof* %X, i32 0, i32 2 ; <float*> [#uses=1]
    %tmp1 = load float* %tmp                        ; <float> [#uses=1]
    ret float %tmp1
  }

and:

  _bar:                                   ## @bar
  ## BB#0:                                ## %entry
  	movaps	%xmm1, %xmm0
  	ret

llvm-svn: 109776

Chris Lattner (3f76342cfc) 2010-07-29 17:34:39 +00:00
handle a case where we could access off the end of a function that Eli
pointed out, rdar://8249586
llvm-svn: 109762

Chris Lattner (44f9c3b3f1) 2010-07-29 17:14:05 +00:00
in release mode, irbuilder doesn't add names to instructions, this will
hopefully fix the osuosl clang-i686-darwin10 builder.
llvm-svn: 109760

Chris Lattner (98076a25ce) 2010-07-29 07:43:55 +00:00
This is a little bit far, but optimize cases like:

  struct a {
    struct c {
      double x;
      int y;
    } x[1];
  };

  void foo(struct a A) {
  }

into:

  define void @foo(double %A.coerce0, i32 %A.coerce1) nounwind {
  entry:
    %A = alloca %struct.a, align 8                  ; <%struct.a*> [#uses=1]
    %0 = bitcast %struct.a* %A to %struct.c*        ; <%struct.c*> [#uses=2]
    %1 = getelementptr %struct.c* %0, i32 0, i32 0  ; <double*> [#uses=1]
    store double %A.coerce0, double* %1
    %2 = getelementptr %struct.c* %0, i32 0, i32 1  ; <i32*> [#uses=1]
    store i32 %A.coerce1, i32* %2

instead of:

  define void @foo(double %A.coerce0, i64 %A.coerce1) nounwind {
  entry:
    %A = alloca %struct.a, align 8                  ; <%struct.a*> [#uses=1]
    %0 = bitcast %struct.a* %A to %0*               ; <%0*> [#uses=2]
    %1 = getelementptr %0* %0, i32 0, i32 0         ; <double*> [#uses=1]
    store double %A.coerce0, double* %1
    %2 = getelementptr %0* %0, i32 0, i32 1         ; <i64*> [#uses=1]
    store i64 %A.coerce1, i64* %2

I only do this now because I never want to look at this code again :)

llvm-svn: 109738

Chris Lattner (c8b7b53a1e) 2010-07-29 07:30:00 +00:00
implement a todo: pass an eight-byte that consists of a small integer +
padding as that small integer.  On code like:

  struct c { double x; int y; };
  void bar(struct c C) { }

This means that we compile to:

  define void @bar(double %C.coerce0, i32 %C.coerce1) nounwind {
  entry:
    %C = alloca %struct.c, align 8                  ; <%struct.c*> [#uses=2]
    %0 = getelementptr %struct.c* %C, i32 0, i32 0  ; <double*> [#uses=1]
    store double %C.coerce0, double* %0
    %1 = getelementptr %struct.c* %C, i32 0, i32 1  ; <i32*> [#uses=1]
    store i32 %C.coerce1, i32* %1

instead of:

  define void @bar(double %C.coerce0, i64 %C.coerce1) nounwind {
  entry:
    %C = alloca %struct.c, align 8                  ; <%struct.c*> [#uses=3]
    %0 = bitcast %struct.c* %C to %0*               ; <%0*> [#uses=2]
    %1 = getelementptr %0* %0, i32 0, i32 0         ; <double*> [#uses=1]
    store double %C.coerce0, double* %1
    %2 = getelementptr %0* %0, i32 0, i32 1         ; <i64*> [#uses=1]
    store i64 %C.coerce1, i64* %2

which gives SRoA heartburn.
This implements rdar://5711709, a nice low number :)
llvm-svn: 109737

Chris Lattner (fe34c1d53e) 2010-07-29 06:26:06 +00:00
Kill off the 'coerce' ABI passing form.  Now 'direct' and 'extend' always
have a "coerce to" type which often matches the default lowering of Clang
type to LLVM IR type, but the coerce case can be handled by making them
not be the same.

This simplifies things and fixes issues where X86-64 abi lowering would
return coerce after making preferred types exactly match up.  This caused
us to compile:

  typedef float v4f32 __attribute__((__vector_size__(16)));
  v4f32 foo(v4f32 X) {
    return X+X;
  }

into this code at -O0:

  define <4 x float> @foo(<4 x float> %X.coerce) nounwind {
  entry:
    %retval = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=2]
    %coerce = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=2]
    %X.addr = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=3]
    store <4 x float> %X.coerce, <4 x float>* %coerce
    %X = load <4 x float>* %coerce                  ; <<4 x float>> [#uses=1]
    store <4 x float> %X, <4 x float>* %X.addr
    %tmp = load <4 x float>* %X.addr                ; <<4 x float>> [#uses=1]
    %tmp1 = load <4 x float>* %X.addr               ; <<4 x float>> [#uses=1]
    %add = fadd <4 x float> %tmp, %tmp1             ; <<4 x float>> [#uses=1]
    store <4 x float> %add, <4 x float>* %retval
    %0 = load <4 x float>* %retval                  ; <<4 x float>> [#uses=1]
    ret <4 x float> %0
  }

Now we get:

  define <4 x float> @foo(<4 x float> %X) nounwind {
  entry:
    %X.addr = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=3]
    store <4 x float> %X, <4 x float>* %X.addr
    %tmp = load <4 x float>* %X.addr                ; <<4 x float>> [#uses=1]
    %tmp1 = load <4 x float>* %X.addr               ; <<4 x float>> [#uses=1]
    %add = fadd <4 x float> %tmp, %tmp1             ; <<4 x float>> [#uses=1]
    ret <4 x float> %add
  }

This implements rdar://8248065
llvm-svn: 109733

Chris Lattner (9fa15c3608) 2010-07-29 05:02:29 +00:00
ignore structs that wrap vectors in IR, the abstraction shouldn't add
penalty.

Before we'd compile the example into something like:

  %coerce.dive2 = getelementptr %struct.v4f32wrapper* %retval, i32 0, i32 0 ; <<4 x float>*> [#uses=1]
  %1 = bitcast <4 x float>* %coerce.dive2 to <2 x double>* ; <<2 x double>*> [#uses=1]
  %2 = load <2 x double>* %1, align 1             ; <<2 x double>> [#uses=1]
  ret <2 x double> %2

Now we produce:

  %coerce.dive2 = getelementptr %struct.v4f32wrapper* %retval, i32 0, i32 0 ; <<4 x float>*> [#uses=1]
  %0 = load <4 x float>* %coerce.dive2, align 1   ; <<4 x float>> [#uses=1]
  ret <4 x float> %0

llvm-svn: 109732

Chris Lattner (4200fe4e50) 2010-07-29 04:56:46 +00:00
move the 'pretty 16-byte vector' inferring code up to be shared with
return values, improving stuff that returns __m128 etc.
llvm-svn: 109731

Chris Lattner (3a44c7e55d) 2010-07-29 04:46:19 +00:00
now that we have CGT around, we can start using preferred types for return
values too.  Instead of compiling something like:

  struct foo {
    int *X;
    float *Y;
  };

  struct foo test(struct foo *P) { return *P; }

to:

  %1 = type { i64, i64 }

  define %1 @test(%struct.foo* %P) nounwind {
  entry:
    %retval = alloca %struct.foo, align 8           ; <%struct.foo*> [#uses=2]
    %P.addr = alloca %struct.foo*, align 8          ; <%struct.foo**> [#uses=2]
    store %struct.foo* %P, %struct.foo** %P.addr
    %tmp = load %struct.foo** %P.addr               ; <%struct.foo*> [#uses=1]
    %tmp1 = bitcast %struct.foo* %retval to i8*     ; <i8*> [#uses=1]
    %tmp2 = bitcast %struct.foo* %tmp to i8*        ; <i8*> [#uses=1]
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 16, i32 8, i1 false)
    %0 = bitcast %struct.foo* %retval to %1*        ; <%1*> [#uses=1]
    %1 = load %1* %0, align 1                       ; <%1> [#uses=1]
    ret %1 %1
  }

We now get the result more type safe, with:

  define %struct.foo @test(%struct.foo* %P) nounwind {
  entry:
    %retval = alloca %struct.foo, align 8           ; <%struct.foo*> [#uses=2]
    %P.addr = alloca %struct.foo*, align 8          ; <%struct.foo**> [#uses=2]
    store %struct.foo* %P, %struct.foo** %P.addr
    %tmp = load %struct.foo** %P.addr               ; <%struct.foo*> [#uses=1]
    %tmp1 = bitcast %struct.foo* %retval to i8*     ; <i8*> [#uses=1]
    %tmp2 = bitcast %struct.foo* %tmp to i8*        ; <i8*> [#uses=1]
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 16, i32 8, i1 false)
    %0 = load %struct.foo* %retval                  ; <%struct.foo> [#uses=1]
    ret %struct.foo %0
  }

That memcpy is completely terrible, but I don't know how to fix it.
llvm-svn: 109729

Chris Lattner (f4ba08aeaf) 2010-07-28 23:47:21 +00:00
pass argument vectors in a type that corresponds to the user type if
possible.  This improves the example to pass <4 x float> instead of
<2 x double> but we still get awful code, and still don't get the return
value right.
llvm-svn: 109700

Chris Lattner (31faff5d58) 2010-07-28 23:06:14 +00:00
use Get8ByteTypeAtOffset for the return value path as well so we don't get
errors similar to PR7714 on the return path.
llvm-svn: 109689

Chris Lattner (4c1e484f39) 2010-07-28 22:15:08 +00:00
fix PR7714 by not referencing off the end of a struct when passed by value
in x86-64 abi.  This also improves codegen as well.  Some refactoring is
needed of this code.
llvm-svn: 109681

Chris Lattner (c401de9998) 2010-07-05 20:21:00 +00:00
in the "coerce" case, the ABI handling code ends up making the alloca for
an argument.  Make sure the argument gets the proper decl alignment, which
may be different than the type alignment.

This fixes PR7567
llvm-svn: 107627

Chris Lattner (22a931e3bb) 2010-06-29 06:01:59 +00:00
Change X86_64ABIInfo to have ASTContext and TargetData ivars to avoid
passing ASTContext down through all the methods it has.

When classifying an argument, or argument piece, as INTEGER, check to see
if we have a pointer at exactly the same offset in the preferred type.  If
so, use that pointer type instead of i64.  This allows us to compile a
function taking a stringref into something like this:

  define i8* @foo(i64 %D.coerce0, i8* %D.coerce1) nounwind ssp {
  entry:
    %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=4]
    %0 = getelementptr %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
    store i64 %D.coerce0, i64* %0
    %1 = getelementptr %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
    store i8* %D.coerce1, i8** %1
    %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
    %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
    %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
    %tmp3 = load i8** %tmp2                         ; <i8*> [#uses=1]
    %add.ptr = getelementptr inbounds i8* %tmp3, i64 %tmp1 ; <i8*> [#uses=1]
    ret i8* %add.ptr
  }

instead of this:

  define i8* @foo(i64 %D.coerce0, i64 %D.coerce1) nounwind ssp {
  entry:
    %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=3]
    %0 = insertvalue %0 undef, i64 %D.coerce0, 0    ; <%0> [#uses=1]
    %1 = insertvalue %0 %0, i64 %D.coerce1, 1       ; <%0> [#uses=1]
    %2 = bitcast %struct.DeclGroup* %D to %0*       ; <%0*> [#uses=1]
    store %0 %1, %0* %2, align 1
    %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
    %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
    %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
    %tmp3 = load i8** %tmp2                         ; <i8*> [#uses=1]
    %add.ptr = getelementptr inbounds i8* %tmp3, i64 %tmp1 ; <i8*> [#uses=1]
    ret i8* %add.ptr
  }

This implements rdar://7375902 - [codegen quality] clang x86-64 ABI
lowering code punishing StringRef
llvm-svn: 107123

Chris Lattner (9e748e9d6e) 2010-06-29 00:14:52 +00:00
add IR names to coerced arguments.
llvm-svn: 107105

Chris Lattner (3dd716c3c3) 2010-06-28 23:44:11 +00:00
Change CGCall to handle the "coerce" case where the coerce-to type is a
FCA to pass each of the elements as individual scalars.  This produces
code fast isel is less likely to reject and is easier on the optimizers.

For example, before we would compile:

  struct DeclGroup { long NumDecls; char * Y; };

  char * foo(DeclGroup D) {
    return D.NumDecls+D.Y;
  }

to:

  %struct.DeclGroup = type { i64, i64 }

  define i64 @_Z3foo9DeclGroup(%struct.DeclGroup) nounwind {
  entry:
    %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=3]
    store %struct.DeclGroup %0, %struct.DeclGroup* %D, align 1
    %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
    %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
    %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i64*> [#uses=1]
    %tmp3 = load i64* %tmp2                         ; <i64> [#uses=1]
    %add = add nsw i64 %tmp1, %tmp3                 ; <i64> [#uses=1]
    ret i64 %add
  }

Now we get:

  %0 = type { i64, i64 }
  %struct.DeclGroup = type { i64, i8* }

  define i8* @_Z3foo9DeclGroup(i64, i64) nounwind {
  entry:
    %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=3]
    %2 = insertvalue %0 undef, i64 %0, 0            ; <%0> [#uses=1]
    %3 = insertvalue %0 %2, i64 %1, 1               ; <%0> [#uses=1]
    %4 = bitcast %struct.DeclGroup* %D to %0*       ; <%0*> [#uses=1]
    store %0 %3, %0* %4, align 1
    %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
    %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
    %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
    %tmp3 = load i8** %tmp2                         ; <i8*> [#uses=1]
    %add.ptr = getelementptr inbounds i8* %tmp3, i64 %tmp1 ; <i8*> [#uses=1]
    ret i8* %add.ptr
  }

Elimination of the FCA inside the function is still-to-come.
llvm-svn: 107099

Chris Lattner (a7d81ab7f3) 2010-06-28 19:56:59 +00:00
X86-64: pass/return structs of float/int as float/i32 instead of
double/i64 to make the code generated for the ABI cleaner.  Passing in
the low part of a double is the same as passing in a float.

For example, we now compile:

  struct DeclGroup { float NumDecls; };
  float foo(DeclGroup D);
  void bar(DeclGroup *D) {
    foo(*D);
  }

into:

  %struct.DeclGroup = type { float }

  define void @_Z3barP9DeclGroup(%struct.DeclGroup* %D) nounwind {
  entry:
    %D.addr = alloca %struct.DeclGroup*, align 8    ; <%struct.DeclGroup**> [#uses=2]
    %agg.tmp = alloca %struct.DeclGroup, align 4    ; <%struct.DeclGroup*> [#uses=2]
    store %struct.DeclGroup* %D, %struct.DeclGroup** %D.addr
    %tmp = load %struct.DeclGroup** %D.addr         ; <%struct.DeclGroup*> [#uses=1]
    %tmp1 = bitcast %struct.DeclGroup* %agg.tmp to i8* ; <i8*> [#uses=1]
    %tmp2 = bitcast %struct.DeclGroup* %tmp to i8*  ; <i8*> [#uses=1]
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 4, i32 4, i1 false)
    %coerce.dive = getelementptr %struct.DeclGroup* %agg.tmp, i32 0, i32 0 ; <float*> [#uses=1]
    %0 = load float* %coerce.dive, align 1          ; <float> [#uses=1]
    %call = call float @_Z3foo9DeclGroup(float %0)  ; <float> [#uses=0]
    ret void
  }

instead of:

  %struct.DeclGroup = type { float }

  define void @_Z3barP9DeclGroup(%struct.DeclGroup* %D) nounwind {
  entry:
    %D.addr = alloca %struct.DeclGroup*, align 8    ; <%struct.DeclGroup**> [#uses=2]
    %agg.tmp = alloca %struct.DeclGroup, align 4    ; <%struct.DeclGroup*> [#uses=2]
    %tmp3 = alloca double                           ; <double*> [#uses=2]
    store %struct.DeclGroup* %D, %struct.DeclGroup** %D.addr
    %tmp = load %struct.DeclGroup** %D.addr         ; <%struct.DeclGroup*> [#uses=1]
    %tmp1 = bitcast %struct.DeclGroup* %agg.tmp to i8* ; <i8*> [#uses=1]
    %tmp2 = bitcast %struct.DeclGroup* %tmp to i8*  ; <i8*> [#uses=1]
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 4, i32 4, i1 false)
    %coerce.dive = getelementptr %struct.DeclGroup* %agg.tmp, i32 0, i32 0 ; <float*> [#uses=1]
    %0 = bitcast double* %tmp3 to float*            ; <float*> [#uses=1]
    %1 = load float* %coerce.dive                   ; <float> [#uses=1]
    store float %1, float* %0, align 1
    %2 = load double* %tmp3                         ; <double> [#uses=1]
    %call = call float @_Z3foo9DeclGroup(double %2) ; <float> [#uses=0]
    ret void
  }

which is this machine code (at -O0):

  __Z3barP9DeclGroup:
  	subq	$24, %rsp
  	movq	%rdi, 16(%rsp)
  	movq	16(%rsp), %rdi
  	leaq	8(%rsp), %rax
  	movl	(%rdi), %ecx
  	movl	%ecx, (%rax)
  	movss	8(%rsp), %xmm0
  	callq	__Z3foo9DeclGroup
  	addq	$24, %rsp
  	ret

vs this:

  __Z3barP9DeclGroup:
  	subq	$24, %rsp
  	movq	%rdi, 16(%rsp)
  	movq	16(%rsp), %rdi
  	leaq	8(%rsp), %rax
  	movl	(%rdi), %ecx
  	movl	%ecx, (%rax)
  	movss	8(%rsp), %xmm0
  	movss	%xmm0, (%rsp)
  	movsd	(%rsp), %xmm0
  	callq	__Z3foo9DeclGroup
  	addq	$24, %rsp
  	ret

At -O3, it is the difference between this now:

  __Z3barP9DeclGroup:
  	movss	(%rdi), %xmm0
  	jmp	__Z3foo9DeclGroup  # TAILCALL

vs this before:

  __Z3barP9DeclGroup:
  	movl	(%rdi), %eax
  	movd	%rax, %xmm0
  	jmp	__Z3foo9DeclGroup  # TAILCALL

llvm-svn: 107048

Chris Lattner (055097f024) 2010-06-27 06:26:04 +00:00
If coercing something from int or pointer type to int or pointer type
(potentially after unwrapping it from a struct) do it without going
through memory.  We now compile:

  struct DeclGroup {
    unsigned NumDecls;
  };

  int foo(DeclGroup D) {
    return D.NumDecls;
  }

into:

  %struct.DeclGroup = type { i32 }

  define i32 @_Z3foo9DeclGroup(i64) nounwind ssp noredzone {
  entry:
    %D = alloca %struct.DeclGroup, align 4          ; <%struct.DeclGroup*> [#uses=2]
    %coerce.dive = getelementptr %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
    %coerce.val.ii = trunc i64 %0 to i32            ; <i32> [#uses=1]
    store i32 %coerce.val.ii, i32* %coerce.dive
    %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
    %tmp1 = load i32* %tmp                          ; <i32> [#uses=1]
    ret i32 %tmp1
  }

instead of:

  %struct.DeclGroup = type { i32 }

  define i32 @_Z3foo9DeclGroup(i64) nounwind ssp noredzone {
  entry:
    %D = alloca %struct.DeclGroup, align 4          ; <%struct.DeclGroup*> [#uses=2]
    %tmp = alloca i64                               ; <i64*> [#uses=2]
    %coerce.dive = getelementptr %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
    store i64 %0, i64* %tmp
    %1 = bitcast i64* %tmp to i32*                  ; <i32*> [#uses=1]
    %2 = load i32* %1, align 1                      ; <i32> [#uses=1]
    store i32 %2, i32* %coerce.dive
    %tmp1 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
    %tmp2 = load i32* %tmp1                         ; <i32> [#uses=1]
    ret i32 %tmp2
  }

... which is quite a bit less terrifying.
llvm-svn: 106975