llvm-project

Author	SHA1	Message	Date
Chris Lattner	07add49a4b	Implement major new fastisel functionality: the matcher can now handle immediates with value constraints on them (when defined as ImmLeaf's). This is particularly important for X86-64, where almost all reg/imm instructions take an i64immSExt32 immediate operand, which has a value constraint. Before this patch we ended up iseling the examples into such amazing code as: movabsq $7, %rax; imulq %rax, %rdi; movq %rdi, %rax; ret. Now we produce: imulq $7, %rdi, %rax; ret. This dramatically shrinks the generated code at -O0 on x86-64. llvm-svn: 129691	2011-04-18 06:22:33 +00:00
Chris Lattner	353fda159d	relax this test to just check that the lock prefix is encoded properly, and to not rely on the register allocator's arbitrary operand choices. llvm-svn: 129690	2011-04-18 06:15:35 +00:00
Chris Lattner	b53ccb8e36	1. merge fast-isel-shift-imm.ll into fast-isel-x86-64.ll 2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts 3. teach tblgen to handle shift immediates that are different sizes than the shifted operands, eliminating some code from the X86 fast isel backend. 4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function instead of FastEmit_ri to simplify code. llvm-svn: 129666	2011-04-17 20:23:29 +00:00
Chris Lattner	eb729d48ff	fix an x86 fast isel issue where we'd completely give up on folding an address when we have a global variable base and an index. Instead, just give up on folding the global variable. Before we'd generate: _test: ## @test ## BB#0: movq _rtx_length@GOTPCREL(%rip), %rax; leaq (%rax), %rax; addq %rdi, %rax; movzbl (%rax), %eax; ret. Now we generate: _test: ## @test ## BB#0: movq _rtx_length@GOTPCREL(%rip), %rax; movzbl (%rax,%rdi), %eax; ret. The difference is even more significant when there is a scale involved. This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64 llvm-svn: 129664	2011-04-17 17:47:38 +00:00
Chris Lattner	4832660b4d	fix an oversight which caused us to compile the testcase (and other less trivial things) into a dummy lea. Before we generated: _test: ## @test movq _G@GOTPCREL(%rip), %rax; leaq (%rax), %rax; ret. Now we produce: _test: ## @test movq _G@GOTPCREL(%rip), %rax; ret. This is part of rdar://9289558 llvm-svn: 129662	2011-04-17 17:12:08 +00:00
Chris Lattner	045c43855c	Fix rdar://9289512 - not folding load into compare at -O0. The basic issue here is that bottom-up isel is matching the branch and compare, and was failing to fold the load into the branch/compare combo. Fixing this (by allowing folding into any instruction of a sequence that is selected) allows us to produce things like: cmpb $0, 52(%rax); je LBB4_2 instead of: movb 52(%rax), %cl; cmpb $0, %cl; je LBB4_2. This makes the generated -O0 code run a bit faster, but also speeds up compile time by putting less pressure on the register allocator and generating less code. This was one of the biggest classes of missing load folding. Implementing this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm) line count. llvm-svn: 129656	2011-04-17 06:35:44 +00:00
Eli Friedman	55f7bf3289	Remove working entry from README. llvm-svn: 129654	2011-04-17 02:36:27 +00:00
Chris Lattner	fba7ca63cc	fix rdar://9289583 - fast isel should handle non-canonical commutative binops, allowing us to fold the immediate into the 'and' in this case: int test1(int i) { return 8&i; } llvm-svn: 129653	2011-04-17 01:16:47 +00:00
Eli Friedman	55b0acd624	PR9055: extend the fix to PR4050 (r70179) to apply to zext and anyext. Returning a new node makes the code try to replace the old node, which in the included testcase is killed by CSE. llvm-svn: 129650	2011-04-16 23:25:34 +00:00
Evan Cheng	b14ce09fca	Fix divmod libcall lowering. Convert to {S|U}DIVREM first and then expand the node to a libcall. rdar://9280991 llvm-svn: 129633	2011-04-16 03:08:26 +00:00
Akira Hatanaka	2cb3aa30dd	Re-enable test o32_cc_vararg.ll. llvm-svn: 129616	2011-04-15 22:23:09 +00:00
Cameron Zwarich	9c65e4d69c	Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate a case involving EOR, so I only added a test for ORR. llvm-svn: 129610	2011-04-15 21:24:38 +00:00
Rafael Espindola	9fef721830	Add this test back for Darwin. llvm-svn: 129607	2011-04-15 21:06:27 +00:00
Cameron Zwarich	0829b3065a	The AND instruction leaves the V flag unmodified, so it falls victim to the same problem as all of the other instructions we fold with CMPs. llvm-svn: 129602	2011-04-15 20:45:00 +00:00
Cameron Zwarich	93eae1571c	Add missing register forms of instructions to the ARM CMP-folding code. This fixes <rdar://problem/9287901>. llvm-svn: 129599	2011-04-15 20:28:28 +00:00
Akira Hatanaka	279169771b	Add pass that expands pseudo instructions into target instructions after register allocation. Define pseudos that get expanded into mtc1 or mfc1 instructions. llvm-svn: 129594	2011-04-15 19:52:08 +00:00
Rafael Espindola	a01cdb0e37	Add 129518 back with a fix for when we are producing eh just because of debug info. Change ELF systems to use CFI for producing the EH tables. This reduces the size of the clang binary in Debug builds from 690MB to 679MB. llvm-svn: 129571	2011-04-15 15:11:06 +00:00
NAKAMURA Takumi	b5e3e9dd27	Revert r129518, "Change ELF systems to use CFI for producing the EH tables. This reduces the" It broke several builds. llvm-svn: 129557	2011-04-15 03:35:57 +00:00
Evan Cheng	12bb05b75b	Fix another fcopysign lowering bug. If src is f64 and destination is f32, don't forget to right shift the source by 32 first. rdar://9287902 llvm-svn: 129556	2011-04-15 01:31:00 +00:00
Michael J. Spencer	30088ba110	Add 3DNow! intrinsics. llvm-svn: 129551	2011-04-15 00:32:41 +00:00
Evan Cheng	44887f9c7e	Follow up on r127913. Fix Thumb revsh isel. rdar://9286766 llvm-svn: 129548	2011-04-14 23:27:44 +00:00
Rafael Espindola	aa2a7cd828	Change ELF systems to use CFI for producing the EH tables. This reduces the size of the clang binary in Debug builds from 690MB to 679MB. llvm-svn: 129518	2011-04-14 15:18:53 +00:00
Andrew Trick	bfbd972b1f	In the pre-RA scheduler, maintain cmp+br proximity. This is done by pushing physical register definitions close to their use, which happens to handle flag definitions if they're not glued to the branch. This seems to be generally a good thing though, so I didn't need to add a target hook yet. The primary motivation is to generate code closer to what people expect and rule out missed opportunity from enabling macro-op fusion. As a side benefit, we get several 2-5% gains on x86 benchmarks. There is one regression: SingleSource/Benchmarks/Shootout/lists slows down by 10%. But this is an independent scheduler bug that will be tracked separately. See rdar://problem/9283108. Incidentally, pre-RA scheduling is only half the solution. Fixing the later passes is tracked by: <rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump Fixes: <rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion llvm-svn: 129508	2011-04-14 05:15:06 +00:00
Bill Wendling	410ec4aad1	As Dan pointed out, movzbl, movsbl, and friends are nicer than their alias (movzx/movsx) because they give more information. Revert that part of the patch. llvm-svn: 129498	2011-04-14 01:46:37 +00:00
Bill Wendling	7e07d6fb69	Have the X86 back-end emit the alias instead of what's being aliased. In most cases, it's much nicer and more informative reading the alias. llvm-svn: 129497	2011-04-14 01:11:51 +00:00
Cameron Zwarich	415b5e8341	Fix a typo in an ARM-specific DAG combine. This fixes <rdar://problem/9278274>. llvm-svn: 129468	2011-04-13 21:01:19 +00:00
Cameron Zwarich	9398197ef1	Fix a regression caused by r102515 where explicit alignment on globals is ignored. There was a test to catch this, but it was just blindly updated in a large change. This fixes another part of <rdar://problem/9275290>. llvm-svn: 129466	2011-04-13 20:36:04 +00:00
Cameron Zwarich	70be27e913	Fix an obvious problem with an alignment computation. AsmPrinter actually does the max itself, so it is not easy to write a test case for this, but I added a test case that would fail if the code in AsmPrinter were removed. llvm-svn: 129432	2011-04-13 09:02:43 +00:00
Cameron Zwarich	cdf59f7016	If a global variable has a specified alignment that is less than the preferred alignment for its type, use the minimum of the specified alignment and the ABI alignment. This fixes <rdar://problem/9275290>. llvm-svn: 129428	2011-04-13 06:03:16 +00:00
Andrew Trick	b53a00d2cb	Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency. Additional fixes: Do something reasonable for subtargets with generic itineraries by handling node latency the same as for an empty itinerary. Now nodes default to unit latency unless an itinerary explicitly specifies a zero cycle stage or it is a TokenFactor chain. Original fixes: UnitsSharePred was a source of randomness in the scheduler: node priority depended on the queue data structure. I rewrote the recent VRegCycle heuristics to completely replace the old heuristic without any randomness. To make the node latency adjustments work, I also needed to do something a little more reasonable with TokenFactor. I gave it zero latency to its consumers and always schedule it as low as possible. llvm-svn: 129421	2011-04-13 00:38:32 +00:00
Bill Wendling	b902f1dd88	Reapply r129401 with patch for clang. llvm-svn: 129419	2011-04-13 00:36:11 +00:00
Eric Christopher	28f4c729f7	Temporarily revert r129408 to see if it brings the bots back. llvm-svn: 129417	2011-04-13 00:20:59 +00:00
Eric Christopher	d829f43c06	Fix a bug where we were counting the alias sets as completely used registers for fast allocation. Fixes rdar://9207598 llvm-svn: 129408	2011-04-12 23:23:14 +00:00
Bill Wendling	dbfde42468	Revert r129401 for now. Clang is using the old way of doing things. llvm-svn: 129403	2011-04-12 22:59:27 +00:00
Bill Wendling	47c24875a1	Remove the unaligned load intrinsics in favor of using native unaligned loads. Now that we have a first-class way to represent unaligned loads, the unaligned load intrinsics are superfluous. First part of <rdar://problem/8460511>. llvm-svn: 129401	2011-04-12 22:46:31 +00:00
Andrew Trick	1b60ad6644	Revert 129383. It causes some targets to hit a scheduler assert. llvm-svn: 129385	2011-04-12 20:14:07 +00:00
Andrew Trick	c5dd24a542	PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency. UnitsSharePred was a source of randomness in the scheduler: node priority depended on the queue data structure. I rewrote the recent VRegCycle heuristics to completely replace the old heuristic without any randomness. To make these heuristic adjustments to node latency work, I also needed to do something a little more reasonable with TokenFactor. I gave it zero latency to its consumers and always schedule it as low as possible. llvm-svn: 129383	2011-04-12 19:54:36 +00:00
Cameron Zwarich	fbcd69b96a	Split a store of a VMOVDRR into two integer stores to avoid mixing NEON and ARM stores of arguments in the same cache line. This fixes the second half of <rdar://problem/8674845>. llvm-svn: 129345	2011-04-12 02:24:17 +00:00
Wesley Peck	1914c39bd4	Add scheduling information for the MBlaze backend. llvm-svn: 129311	2011-04-11 22:31:52 +00:00
Evan Cheng	ef42bea704	Look past copies when determining whether hoisting would end up inserting more copies. rdar://9266679 llvm-svn: 129297	2011-04-11 21:09:18 +00:00
Chris Lattner	214f114aa7	look for the verboten argument slot access in any order, thanks to Frits for pointing this out llvm-svn: 129217	2011-04-09 17:00:34 +00:00
Chris Lattner	af1bccec68	Fix a bug where RecursivelyDeleteTriviallyDeadInstructions could delete the instruction pointed to by CGP's current instruction iterator, leading to a crash on the testcase. This fixes PR9578. llvm-svn: 129200	2011-04-09 07:05:44 +00:00
Chris Lattner	418b1037b0	fix two completely broken tests, which were matching due to PR9629. llvm-svn: 129195	2011-04-09 06:34:38 +00:00
Chris Lattner	ea6afab4b0	remove a bunch of CHECK lines that aren't checking what they thought they were, because alternation was expanding wrong in {{}}'s. llvm-svn: 129194	2011-04-09 06:31:06 +00:00
Chris Lattner	41c80e89f3	have dag combine zap "store undef", which can be formed during call lowering with undef arguments. llvm-svn: 129185	2011-04-09 02:32:02 +00:00
Chris Lattner	1c42a4d159	don't test for codegen of 'store undef' llvm-svn: 129184	2011-04-09 02:31:26 +00:00
Evan Cheng	74d92c1924	Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time. llvm-svn: 129152	2011-04-08 21:37:21 +00:00
Evan Cheng	9a3f2772f0	Add option to emit @llvm.trap as a function call instead of a trap instruction. rdar://9249183. llvm-svn: 129107	2011-04-07 20:31:12 +00:00
Andrew Trick	2ad0b37318	Added a check in the preRA scheduler for potential interference on an induction variable. The preRA scheduler is unaware of induction vars, so we look for potential "virtual register cycles" instead. Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing llvm-svn: 129100	2011-04-07 19:54:57 +00:00
Akira Hatanaka	d6f1c58914	Fix handling of functions with internal linkage. llvm-svn: 129099	2011-04-07 19:51:44 +00:00
Tanya Lattner	266792a55a	Prevent ARM DAG Combiner from doing an AND or OR combine on an illegal vector type (vectors of size 3). Also included test cases. llvm-svn: 129074	2011-04-07 15:24:20 +00:00
Evan Cheng	a7c7b54dde	Change -arm-divmod-libcall to a target neutral option. llvm-svn: 129045	2011-04-07 00:58:44 +00:00
Owen Anderson	bdff1c997a	Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB. llvm-svn: 129038	2011-04-06 23:35:59 +00:00
Jakob Stoklund Olesen	1ec41e2bd9	These tests no longer require linear scan because reserved register coalescing is now universal. llvm-svn: 128936	2011-04-05 21:40:41 +00:00
Jakob Stoklund Olesen	6aa0fbf4c0	Run LiveDebugVariables in RegAllocBasic and RegAllocGreedy. llvm-svn: 128935	2011-04-05 21:40:37 +00:00
Jakob Stoklund Olesen	e20fec7732	Fix one more batch of X86 tests to be register allocation dependent. llvm-svn: 128919	2011-04-05 20:20:30 +00:00
Jakob Stoklund Olesen	18fd84c79a	When dead code elimination removes all but one use, try to fold the single def into the remaining use. Rematerialization can leave single-use loads behind that we might as well fold whenever possible. llvm-svn: 128918	2011-04-05 20:20:26 +00:00
Johnny Chen	293875ef55	Fix test-llvm failures. llvm-svn: 128906	2011-04-05 18:41:40 +00:00
Stuart Hastings	345094777f	ARM doesn't support byval yet. XFAIL this test until it does. llvm-svn: 128891	2011-04-05 17:16:21 +00:00
Jakob Stoklund Olesen	76ad3debab	Ensure all defs referring to a virtual register are marked dead by addRegisterDead(). There can be multiple defs for a single virtual register when they are defining sub-registers. The missing <dead> flag was stopping the inline spiller from eliminating dead code after rematerialization. llvm-svn: 128888	2011-04-05 16:53:50 +00:00
Rafael Espindola	7dd4d6e2e8	Print visibility info for external variables. llvm-svn: 128887	2011-04-05 15:51:32 +00:00
Eric Christopher	f392a69ff7	Fix up testcase for previous commit. llvm-svn: 128870	2011-04-05 00:56:01 +00:00
Jakob Stoklund Olesen	bd09d45489	Fix register-dependent X86 tests. llvm-svn: 128867	2011-04-05 00:32:44 +00:00
Jakob Stoklund Olesen	2e85396509	Allow coalescing with reserved physregs in certain cases: When a virtual register has a single value that is defined as a copy of a reserved register, permit that copy to be joined. These virtual registers are usually copies of the stack pointer: %vreg75<def> = COPY %ESP; GR32:%vreg75 MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill> MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0 MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0 CALLpcrel32 ... Coalescing these virtual registers early decreases register pressure. Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after register allocation was completed. The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail because it depends on linear scan spilling a particular register. I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of instructions emitted, and its revision history shows the 'correct' count being edited many times. llvm-svn: 128845	2011-04-04 21:00:03 +00:00
Jakob Stoklund Olesen	8296e30627	Disable the PowerPC/Atomics-64 test. The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is wrong, and I don't know how to fix it. It seems to be using the correct register classes for pointers, but it inserts all 32-bit instructions. llvm-svn: 128835	2011-04-04 17:57:26 +00:00
Jakob Stoklund Olesen	218661346a	Fix PowerPC tests to be register allocator independent. llvm-svn: 128827	2011-04-04 17:07:03 +00:00
Che-Liang Chiou	e34b271718	ptx: support setp's 4-operand format llvm-svn: 128767	2011-04-02 08:51:39 +00:00
Cameron Zwarich	6fe5c29430	Do some peephole optimizations to remove pointless VMOVs from Neon to integer registers that arise from argument shuffling with the soft float ABI. These instructions are particularly slow on Cortex A8. This fixes one half of <rdar://problem/8674845>. llvm-svn: 128759	2011-04-02 02:40:43 +00:00
Jim Grosbach	360c369967	LDRD/STRD instructions should print both Rt and Rt2 in the asm string. llvm-svn: 128736	2011-04-01 20:26:57 +00:00
Akira Hatanaka	93f898f643	Add code for analyzing FP branches. Clean up branch Analysis functions. llvm-svn: 128718	2011-04-01 17:39:08 +00:00
Evan Cheng	a6a992a662	Add test case. llvm-svn: 128707	2011-04-01 06:27:25 +00:00
Evan Cheng	0f86d6de50	FileCheck'ify test. llvm-svn: 128706	2011-04-01 03:36:33 +00:00
Jakob Stoklund Olesen	100f53fd25	Fix Thumb and Thumb2 tests to be register allocator independent. llvm-svn: 128690	2011-03-31 23:31:50 +00:00
Jakob Stoklund Olesen	0709342652	Provide a legal pointer register class when targeting thumb1. The LocalStackSlotAllocation pass was creating illegal registers. llvm-svn: 128687	2011-03-31 23:02:15 +00:00
Jakob Stoklund Olesen	903baeac27	Fix SystemZ tests llvm-svn: 128686	2011-03-31 23:02:12 +00:00
Jakob Stoklund Olesen	0888bcf542	Fix ARM tests to be register allocator independent. llvm-svn: 128680	2011-03-31 22:14:03 +00:00
Evan Cheng	38bf5adcea	Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier accumulator forwarding: vadd d3, d0, d1; vmul d3, d3, d2 => vmul d3, d0, d2; vmla d3, d1, d2 llvm-svn: 128665	2011-03-31 19:38:48 +00:00
Jakob Stoklund Olesen	f4c9754d5c	Fix Mips, Sparc, and XCore tests that were dependent on register allocation. Add an extra run with -regalloc=basic to keep them honest. llvm-svn: 128654	2011-03-31 18:42:43 +00:00
Akira Hatanaka	a535270d91	Added support for FP conditional move instructions and fixed bugs in handling of FP comparisons. llvm-svn: 128650	2011-03-31 18:26:17 +00:00
Jakob Stoklund Olesen	e6e6750670	Don't completely eliminate identity copies that also modify super register liveness. Turn them into noop KILL instructions instead. This lets the scavenger know when super-registers are killed and defined. llvm-svn: 128645	2011-03-31 17:55:25 +00:00
Jakob Stoklund Olesen	9a78835414	Mark all uses as <undef> when joining a copy. This way, shrinkToUses() will ignore the instruction that is about to be deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like. Fix a misunderstanding in MachineVerifier about <def,undef> operands. The <undef> flag is valid on def operands where it has the same meaning as <undef> on a use operand. It only applies to sub-register defines which also read the full register. llvm-svn: 128642	2011-03-31 17:23:25 +00:00
Richard Osborne	9a827b30ab	Add XCore intrinsics for initializing / starting / synchronizing threads. llvm-svn: 128633	2011-03-31 15:13:13 +00:00
Jakob Stoklund Olesen	ae044c06bf	Pick a conservative register class when creating a small live range for remat. The rematerialized instruction may require a more constrained register class than the register being spilled. In the test case, the spilled register has been inflated to the DPR register class, but we are rematerializing a load of the ssub_0 sub-register which only exists for DPR_VFP2 registers. The register class is reinflated after spilling, so the conservative choice is only temporary. llvm-svn: 128610	2011-03-31 03:54:44 +00:00
Evan Cheng	ee9d45dd55	Don't try to create zero-sized stack objects. llvm-svn: 128586	2011-03-30 23:44:13 +00:00
Cameron Zwarich	53dd03d537	Add an ARM-specific SD node for VBSL so that forms with a constant first operand can be recognized. This fixes <rdar://problem/9183078>. llvm-svn: 128584	2011-03-30 23:01:21 +00:00
Evan Cheng	18381b4257	Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends were lowering them to sext / uxt + mul instructions. Unfortunately the optimization passes may hoist the extensions out of the loop and separate them. When that happens, the long multiplication instructions can be broken into several scalar instructions, causing significant performance issues. Note the vmla and vmls intrinsics are not added back. Frontend will codegen them as intrinsics vmull* + add / sub. Also note the isel optimizations for catching mul + sext / zext are not changed either. First part of rdar://8832507, rdar://9203134 llvm-svn: 128502	2011-03-29 23:06:19 +00:00
Cameron Zwarich	143f9aea2b	Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes <rdar://problem/8875309> and <rdar://problem/9057191>. llvm-svn: 128492	2011-03-29 21:41:55 +00:00
Rafael Espindola	6b2fac21ca	Reduce test case. llvm-svn: 128445	2011-03-29 02:18:54 +00:00
Evan Cheng	e2086e740f	Optimize (zext A + zext B) * C to (VMULL A, C) + (VMULL B, C) during isel lowering to fold the zero-extends and take advantage of no-stall back-to-back vmul + vmla: vmull q0, d4, d6; vmlal q0, d5, d6 is faster than vaddl q0, d4, d5; vmovl q1, d6; vmul q0, q0, q1. This allows us to use vmull + vmlal for: f = vmull_u8(vget_high_u8(s), c); f = vmlal_u8(f, vget_low_u8(s), c); rdar://9197392 llvm-svn: 128444	2011-03-29 01:56:09 +00:00
Bill Wendling	96f962fdff	In some cases, the "fail BB dominator" may be null after the BB was split (and becomes reachable when before it wasn't). Check to make sure that it's not null before trying to use it. llvm-svn: 128434	2011-03-28 23:02:18 +00:00
Jakob Stoklund Olesen	9a624fa993	Collect and coalesce DBG_VALUE instructions before emitting the function. Correctly terminate the range of register DBG_VALUEs when the register is clobbered or when the basic block ends. The code is now ready to deal with variables that are sometimes in a register and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack slot'. llvm-svn: 128327	2011-03-26 02:19:36 +00:00
Eric Christopher	d553096688	Fix the bfi handling for or (and a mask) (and b mask). We need the two masks to match inversely for the code as is to work. For the example given we actually want: bfi r0, r2, #1, #1 not #0, however, given the way the pattern is written it's not possible at the moment. Fixes rdar://9177502 llvm-svn: 128320	2011-03-26 01:21:03 +00:00
Jakob Stoklund Olesen	1886a4c823	Emit fewer labels for debug info and stop emitting .loc directives for DBG_VALUEs. The .dot directives don't need labels; that is a leftover from when we created line number info manually. Instructions following a DBG_VALUE can share its label since the DBG_VALUE doesn't produce any code. llvm-svn: 128284	2011-03-25 17:20:59 +00:00
Devang Patel	71536de752	Move test into the x86-specific area. llvm-svn: 128245	2011-03-24 22:39:09 +00:00
Devang Patel	e01b75cb89	Keep track of directory name and fix regression caused by Rafael's patch r119613. A better approach would be to move source id handling inside MC. llvm-svn: 128233	2011-03-24 20:30:50 +00:00
NAKAMURA Takumi	521eb7c11e	Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets. FIXME: Some cleanups would be needed. llvm-svn: 128206	2011-03-24 07:07:00 +00:00
Cameron Zwarich	4649f17db1	Do early taildup of ret in CodeGenPrepare for potential tail calls that have a void return type. This fixes PR9487. llvm-svn: 128197	2011-03-24 04:52:10 +00:00
Devang Patel	abc77347a7	Enable GlobalMerge on darwin. llvm-svn: 128183	2011-03-23 23:34:19 +00:00
Andrew Trick	4ab9a16569	Revert r128175. I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix. llvm-svn: 128181	2011-03-23 23:11:02 +00:00
Evan Cheng	425489d397	Cmp peephole optimization isn't always safe for signed arithmetic. int tries = INT_MAX; while (tries > 0) { tries--; } The check should be: subs r4, #1; cmp r4, #0; bgt LBB0_1. The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop canonicalization apparently does in this case). cmp #0 would have cleared it while not changing the N and Z bits. Since BGT is dependent on the V bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0. rdar://9172742 llvm-svn: 128179	2011-03-23 22:52:04 +00:00
Eli Friedman	4c192305bf	PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND. Also cleaning up some duplicated code while I'm here. llvm-svn: 128176	2011-03-23 22:18:48 +00:00
Andrew Trick	4046a0de91	Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS. (target-specific branchless method for double-width relational comparisons on x86) llvm-svn: 128175	2011-03-23 22:16:02 +00:00
Jakob Stoklund Olesen	ec0ac3ca40	Reapply r128045 and r128051 with fixes. This will extend the ranges of debug info variables in registers until they are clobbered. Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on the stack with DBG_VALUE instructions referring to variables in the frame pointer. This fixes the gdb test-suite failure. Fix 2: Don't trace through copies to physical registers setting up call arguments. These registers are call clobbered, and the source register is more likely to be a callee-saved register that can be extended through the call instruction. llvm-svn: 128114	2011-03-22 22:33:08 +00:00
Andrew Trick	b0f98bb5e9	Revert r128045 and r128051, debug info enhancements. Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem. llvm-svn: 128097	2011-03-22 19:18:42 +00:00
Che-Liang Chiou	7413080cea	ptx: add analyze/insert/remove branch llvm-svn: 128084	2011-03-22 14:12:00 +00:00
Jakob Stoklund Olesen	9c057ee440	Don't emit 'DBG_VALUE %noreg, ...' to terminate user variable ranges. These ranges get completely jumbled by the post-ra scheduler, and it is not really reasonable to expect it to make sense of them. Instead, teach DwarfDebug to notice when user variables in registers are clobbered, and terminate the ranges there. llvm-svn: 128045	2011-03-22 00:21:41 +00:00
Dan Gohman	c1783b31a4	Fix fast-isel address mode folding to avoid folding instructions outside of the current basic block. This fixes PR9500, rdar://9156159. llvm-svn: 128041	2011-03-22 00:04:35 +00:00
Rafael Espindola	1557fd6d39	Write the section table and the section data in the same order that GNU as does. This makes it a lot easier to compare the output of both as the addresses are now a lot closer. llvm-svn: 127972	2011-03-20 18:44:20 +00:00
Daniel Dunbar	327cd36f74	Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR", it broke a lot of things. llvm-svn: 127954	2011-03-19 21:47:14 +00:00
Evan Cheng	824a711305	SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR to have single return block (at least getting there) for optimizations. This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this: int f1(void); int f2(void); int f3(void); int f4(void); int f5(void); int f6(void); int foo(int x) { switch(x) { case 1: return f1(); case 2: return f2(); case 3: return f3(); case 4: return f4(); case 5: return f5(); case 6: return f6(); } } => LBB0_2: ## %sw.bb callq _f1 popq %rbp ret LBB0_3: ## %sw.bb1 callq _f2 popq %rbp ret LBB0_4: ## %sw.bb3 callq _f3 popq %rbp ret This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch: sw.bb7: ; preds = %entry %call8 = tail call i32 @f5() nounwind br label %return sw.bb9: ; preds = %entry %call10 = tail call i32 @f6() nounwind br label %return return: %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ] ret i32 %retval.0 This allows codegen to generate better code like this: LBB0_2: ## %sw.bb jmp _f1 ## TAILCALL LBB0_3: ## %sw.bb1 jmp _f2 ## TAILCALL LBB0_4: ## %sw.bb3 jmp _f3 ## TAILCALL rdar://9147433 llvm-svn: 127953	2011-03-19 17:17:39 +00:00
Nadav Rotem	e7a101ccab	Add support for legalizing UINT_TO_FP of vectors on platforms which do not have native support for this operation (such as X86). The legalized code uses two vector INT_TO_FP operations and is faster than scalarizing. llvm-svn: 127951	2011-03-19 13:09:10 +00:00
Andrew Trick	e7537a0187	FileCheckize a test. (one-by-one until valgrind is happy) llvm-svn: 127925	2011-03-19 00:41:39 +00:00
Evan Cheng	dc1d626a3d	Match a few more obvious patterns to revsh. rdar://9147637. llvm-svn: 127913	2011-03-18 21:52:42 +00:00
Eli Friedman	59721e3238	Revert r127852; it's apparently causing an ICE on mingw. llvm-svn: 127909	2011-03-18 21:12:29 +00:00
Justin Holewinski	0984dcc077	PTX: Fix various codegen issues - Emit mad instead of mad.rn for shader model 1.0 - Emit explicit mov.u32 instructions for reading global variables - (most PTX instructions cannot take global variable immediates) llvm-svn: 127895	2011-03-18 19:24:28 +00:00
Che-Liang Chiou	b1df0fe1cc	ptx: fix parameter order that is reversed llvm-svn: 127874	2011-03-18 11:23:56 +00:00
Che-Liang Chiou	ff9d938e33	ptx: add unconditional and conditional branch llvm-svn: 127873	2011-03-18 11:08:52 +00:00
Eli Friedman	1a916a3c0c	Add a target-specific branchless method for double-width relational comparisons on x86. Essentially, the way this works is that SUB+SBB sets the relevant flags the same way a double-width CMP would. This is a substantial improvement over the generic lowering in LLVM. The output is also shorter than the gcc-generated output; I haven't done any detailed benchmarking, though. llvm-svn: 127852	2011-03-18 02:34:11 +00:00
Benjamin Kramer	cfcea12fe2	BuildUDIV: If the divisor is even we can simplify the fixup of the multiplied value by introducing an early shift. This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into shrl $2, %edi; imulq $613566757, %rdi, %rax; shrq $32, %rax; ret instead of movl %edi, %eax; imulq $613566757, %rax, %rcx; shrq $32, %rcx; subl %ecx, %eax; shrl %eax; addl %ecx, %eax; shrl $4, %eax on x86_64 llvm-svn: 127829	2011-03-17 20:39:14 +00:00
Richard Osborne	6120962d7d	Add XCore intrinsic for setpsc. llvm-svn: 127821	2011-03-17 18:42:05 +00:00
NAKAMURA Takumi	bf9ff6f63b	test/CodeGen/X86/h-registers-1.ll: Add explicit -mtriple=x86_64-linux. It does not need to be checked on x86_64-win32 (aka Win64). llvm-svn: 127800	2011-03-17 04:24:40 +00:00
NAKAMURA Takumi	5b6198dfb9	test/CodeGen/X86/constant-pool-remat-0.ll: FileCheck-ize and add explicit -mtriple=x86_64-linux. llvm-svn: 127775	2011-03-16 23:01:31 +00:00
Cameron Zwarich	ac106273d4	The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte rather than an int. Thankfully, this only causes LLVM to miss optimizations, not generate incorrect code. This just fixes the zext at the return. We still insert an i32 ZextAssert when reading a function's arguments, but it is followed by a truncate and another i8 ZextAssert so it is not optimized. llvm-svn: 127766	2011-03-16 22:20:18 +00:00
Cameron Zwarich	40a9200357	Rename a test to be more inclusive. llvm-svn: 127765	2011-03-16 22:20:12 +00:00
Daniel Dunbar	fd95b016fb	Revert r127757, "Patch to fix a dwarf relocation problem on ARM. One-line fix plus the test where it used to break.", which broke Clang self-host of a Debug+Asserts compiler, on OS X. llvm-svn: 127763	2011-03-16 22:16:39 +00:00
Richard Osborne	c871eff3f5	Add XCore intrinsics for setclk, setrdy. llvm-svn: 127761	2011-03-16 21:56:00 +00:00
Renato Golin	a3aeafeb35	Patch to fix a dwarf relocation problem on ARM. One-line fix plus the test where it used to break. llvm-svn: 127757	2011-03-16 21:05:52 +00:00
Cameron Zwarich	49e354bcb6	Add a test for i1 zeroext arguments on x86-64. We currently generate code that conforms to the ABI, but DAGCombine could in theory recognize the sequence of zext asserts and truncates and generate incorrect code. llvm-svn: 127754	2011-03-16 20:15:44 +00:00
Richard Osborne	d4346f2388	Add checkevent intrinsic to check if any resources owned by the current thread can event. llvm-svn: 127741	2011-03-16 18:34:00 +00:00
NAKAMURA Takumi	d60e4101e6	test/CodeGen/X86: FileCheck-ize and add actions for x86_64-linux and x86_64-win32. llvm-svn: 127734	2011-03-16 13:53:07 +00:00
NAKAMURA Takumi	0b9e2b0257	test/CodeGen/X86: Add a pattern for Win64. llvm-svn: 127733	2011-03-16 13:52:51 +00:00
NAKAMURA Takumi	c10801e8a5	test/CodeGen/X86: FileCheck-ize and add explicit -mtriple=x86_64-linux. They are not applicable to the Win64 target. llvm-svn: 127732	2011-03-16 13:52:38 +00:00
NAKAMURA Takumi	662892df27	test/CodeGen/X86/byval*.ll: Win64 does not support byval yet. llvm-svn: 127731	2011-03-16 13:52:20 +00:00
NAKAMURA Takumi	406f02c9ea	test/CodeGen/X86/dyn-stackalloc.ll: FileCheck-ize. llvm-svn: 127730	2011-03-16 13:52:08 +00:00
Bill Wendling	ebecb33307	Some minor cleanups based on feedback. llvm-svn: 127694	2011-03-15 20:47:26 +00:00
Evan Cheng	42401d6af2	Do not form thumb2 ldrd / strd if the offset is not a multiple of 4. rdar://9133587 llvm-svn: 127683	2011-03-15 18:41:52 +00:00
Richard Osborne	5f1a26ea39	On the XCore the scavenging slot should be closest to the SP. llvm-svn: 127680	2011-03-15 15:10:11 +00:00
Richard Osborne	3a68eb150b	Add XCore intrinsics for getps, setps, setsr and clrsr. llvm-svn: 127678	2011-03-15 13:45:47 +00:00
Justin Holewinski	94751fbf32	PTX: Set PTX 2.0 as the minimum supported version - Remove PTX 1.4 code generation - Change type of intrinsics to .v4.i32 instead of .v4.i16 - Add and/or/xor integer instructions llvm-svn: 127677	2011-03-15 13:24:15 +00:00
Evan Cheng	e4b8ac9fef	Add a peephole optimization to optimize pairs of bitcasts. e.g. v2 = bitcast v1 ... v3 = bitcast v2 ... = v3 => v2 = bitcast v1 ... = v1 if v1 and v3 are in the same register class. Bitcasts between i32 and fp (and others) are often not no-ops since they are in different register classes. These bitcast instructions are often left because they are in different basic blocks and cannot be eliminated by dag combine. rdar://9104514 llvm-svn: 127668	2011-03-15 05:13:13 +00:00
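The bitcast-pair rewrite described above can be sketched on a toy, block-free instruction list. This is not LLVM's MachineInstr peephole code. The `Inst` record, the `fold_bitcast_pairs` helper and the register-class names are hypothetical. The sketch redirects uses of v3 to v1 only when v1 and v3 share a register class. It leaves the now-dead bitcasts in place for dead-code elimination.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    dest: str          # virtual register defined by this instruction
    op: str            # opcode name, e.g. "bitcast" or "add"
    srcs: list[str]    # virtual registers read by this instruction

def fold_bitcast_pairs(insts: list[Inst], regclass: dict[str, str]) -> list[Inst]:
    """Rewrite uses of v3 = bitcast(v2), v2 = bitcast(v1) to use v1 directly,
    but only when v1 and v3 live in the same register class (hypothetical model)."""
    defs = {i.dest: i for i in insts}
    replace: dict[str, str] = {}
    for inst in insts:
        if inst.op != "bitcast":
            continue
        inner = defs.get(inst.srcs[0])
        if inner is not None and inner.op == "bitcast":
            v1 = inner.srcs[0]
            if regclass[v1] == regclass[inst.dest]:
                replace[inst.dest] = v1
    # Redirect every use; the bitcasts themselves become dead and are left for DCE.
    for inst in insts:
        inst.srcs = [replace.get(s, s) for s in inst.srcs]
    return insts
```

With v1 and v3 both in a hypothetical "GPR" class and v2 in "FPR", an `add` that read v3 ends up reading v1. If v3 were in "FPR" instead, nothing would change.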
Evan Cheng	c5c2cfa381	sext(undef) = 0, because the top bits will all be the same. zext(undef) = 0, because the top bits will be zero. llvm-svn: 127649	2011-03-15 02:22:10 +00:00
Bill Wendling	928de16793	Testcase for r127630. llvm-svn: 127648	2011-03-15 01:49:08 +00:00
Jim Grosbach	3af6fe66b9	Clean up ARM tail calls a bit. They're pseudo-instructions for normal branches. Also more cleanly separate the ARM vs. Thumb functionality. Previously, the encoding would be incorrect for some Thumb instructions (the indirect calls). llvm-svn: 127637	2011-03-15 00:30:40 +00:00
Bill Wendling	e1fd78f2bc	Generate a VTBL instruction instead of a series of loads and stores when we can. As Nate pointed out, VTBL isn't super performant, but it has to be better than this: _shuf: @ BB#0: @ %entry push {r4, r7, lr} add r7, sp, #4 sub sp, #12 mov r4, sp bic r4, r4, #7 mov sp, r4 mov r2, sp vmov d16, r0, r1 orr r0, r2, #6 orr r3, r2, #7 vst1.8 {d16[0]}, [r3] vst1.8 {d16[5]}, [r0] subs r4, r7, #4 orr r0, r2, #5 vst1.8 {d16[4]}, [r0] orr r0, r2, #4 vst1.8 {d16[4]}, [r0] orr r0, r2, #3 vst1.8 {d16[0]}, [r0] orr r0, r2, #2 vst1.8 {d16[2]}, [r0] orr r0, r2, #1 vst1.8 {d16[1]}, [r0] vst1.8 {d16[3]}, [r2] vldr.64 d16, [sp] vmov r0, r1, d16 mov sp, r4 pop {r4, r7, pc} The "illegal" testcase in vext.ll is no longer illegal. <rdar://problem/9078775> llvm-svn: 127630	2011-03-14 23:02:38 +00:00
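VTBL.8 with a one-register table looks up each byte of an index vector in an 8-byte table and returns 0 for any index of 8 or more. That makes it a natural fit for a single-source byte shuffle, which the commit above uses in place of the stack round-trip. The Python model below sketches this with hypothetical helper names. The lane stores in the listing (`vst1.8 {d16[k]}` to `sp | offset`) appear to implement the mask [3, 1, 2, 0, 4, 4, 5, 0], which is used in the example. That mask is an inference from the assembly, not something the commit states.

```python
def vtbl1(table: list[int], indices: list[int]) -> list[int]:
    """Model VTBL.8 with a single 8-byte table register:
    each index selects a table byte, and any index >= 8 yields 0."""
    assert len(table) == 8 and len(indices) == 8
    return [table[i] if i < len(table) else 0 for i in indices]

def shuffle_v8i8_via_vtbl(src: list[int], mask: list[int]) -> list[int]:
    """Lower a single-source <8 x i8> shuffle to one VTBL.
    Undefined lanes (mask value -1) get an out-of-range index, so they read as 0."""
    indices = [m if m >= 0 else 0xFF for m in mask]
    return vtbl1(src, indices)

src = [10, 11, 12, 13, 14, 15, 16, 17]
print(shuffle_v8i8_via_vtbl(src, [3, 1, 2, 0, 4, 4, 5, 0]))  # [13, 11, 12, 10, 14, 14, 15, 10]
```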
Eric Christopher	d3cc9fdd8e	Fix this test up a bit. llvm-svn: 127621	2011-03-14 21:05:21 +00:00
Evan Cheng	d2f3b01797	Minor optimization. sign-ext/anyext of undef is still undef. llvm-svn: 127598	2011-03-14 18:15:55 +00:00
Justin Holewinski	fbc8d301bf	PTX: Emit global arrays with proper sizes - Emit all arrays as type .b8 and proper sizes in bytes to conform to the output of nvcc llvm-svn: 127584	2011-03-14 15:40:11 +00:00
Justin Holewinski	8509380f83	PTX: Add support for sqrt/sin/cos intrinsics llvm-svn: 127578	2011-03-14 14:09:33 +00:00
Che-Liang Chiou	a19f075974	ptx: add set.p instruction and related changes to predicated execution llvm-svn: 127577	2011-03-14 11:26:01 +00:00
Eric Christopher	c313d94068	Saving files before committing is overrated. Add a RUN line to this test. llvm-svn: 127520	2011-03-12 01:36:23 +00:00


4414 Commits