llvm-project

Commit Graph

Author	SHA1	Message	Date
Hans Wennborg	5f916d3df4	[X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize 64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions. Differential Revision: http://reviews.llvm.org/D18374 llvm-svn: 264440	2016-03-25 18:11:31 +00:00
Hans Wennborg	4ae5119eeb	X86: Use push-pop for materializing 8-bit immediates for minsize (take 2) This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375	2016-03-25 01:10:56 +00:00
Quentin Colombet	cf9732b417	[X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer. cmpxchg[8\|16]b uses RBX as one of its arguments. In other words, using this instruction clobbers RBX as it is defined to hold one of the inputs. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. Reserved registers have special semantics that only the target understands and enforces; because of that, the register allocator doesn’t use them, but also doesn’t try to make sure they are used properly (remember it does not know how they are supposed to be used). Therefore, when RBX is used as a reserved register but defined by something that is not compatible with that use, the register allocator will not fix the surrounding code to make sure it gets saved and restored properly around the broken code. It is the responsibility of the target to do the right thing with its reserved register. To fix that, when the base pointer needs to be preserved, we use a different pseudo instruction for cmpxchg that saves rbx. That pseudo takes two more arguments than the regular instruction: - One is the value to be copied into RBX to set the proper value for the comparison. - The other is the virtual register holding the save of the value of RBX as the base pointer. This saving is done as part of isel (i.e., we emit a copy from rbx). cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp This gets expanded into: rbx = copy input_for_rbx_reg cmpxchg <regular cmpxchg args> rbx = save_of_rbx_as_bp Note: The actual modeling of the pseudo is a bit more complicated to make sure the interferences that appear after the pseudo gets expanded are properly modeled before that expansion. This fixes PR26883. llvm-svn: 263325	2016-03-12 02:25:27 +00:00
Ahmed Bougacha	bb5d7d7ed8	[X86] Move the ATOMIC_LOAD_OP ISel from DAGToDAG to ISelLowering. NFCI. This is long-standing dirtiness, as acknowledged by r77582: The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. Doing this before selection will let us combine away some constructs. Differential Revision: http://reviews.llvm.org/D17659 llvm-svn: 262244	2016-02-29 19:28:07 +00:00
Elena Demikhovsky	e5bbca6ae2	Optimized loading (zextload) of i1 value from memory. This patch is a partial revert of https://llvm.org/svn/llvm-project/llvm/trunk@237793. Extra "and" causes performance degradation. We assume that i1 is stored in zero-extended form. And store operation is responsible for zeroing upper bits. Differential Revision: http://reviews.llvm.org/D17541 llvm-svn: 261828	2016-02-25 07:05:12 +00:00
Davide Italiano	228978c0dc	[X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled. TLSADDR nodes are lowered into actual calls inside MC. In order to prevent shrink-wrapping from pushing prologue/epilogue past them (which results in TLS variables being accessed before the stack frame is set up), we put markers, so that the stack gets adjusted properly. Thanks to Quentin Colombet for guidance/help on how to fix this problem! llvm-svn: 261387	2016-02-20 00:44:47 +00:00
Craig Topper	e00bffbc13	[X86] Make MOV32ri64 a post-RA pseudo instead of a CodeGenOnly instruction. It was only needed for rematerialization. llvm-svn: 256818	2016-01-05 07:44:14 +00:00
Craig Topper	9583f51348	[X86] Add OpSize32 to OR32mrLocked instruction to match the normal OR32mr instruction. llvm-svn: 256817	2016-01-05 07:44:11 +00:00
David Majnemer	869be0a4a6	Revert "[X86] Use push-pop for materializing small constants under 'minsize'" The red zone consists of 128 bytes beyond the stack pointer so that the allocation of objects in leaf functions doesn't require decrementing rsp. In r255656, we introduced an optimization that would cheaply materialize certain constants via push/pop. Push decrements the stack pointer and stores its result at what is now the top of the stack. However, this means that using push/pop would encroach on the red zone. PR26023 gives an example where this corrupts an object in the red zone. llvm-svn: 256808	2016-01-05 02:32:06 +00:00
Hans Wennborg	a6a2e512cf	[X86] Use push-pop for materializing small constants under 'minsize' Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936	2015-12-17 23:18:39 +00:00
Hans Wennborg	08d5905bac	[X86] Smaller code for materializing 32-bit 1 and -1 constants "movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes. This commit makes LLVM use the latter when optimizing for size. Differential Revision: http://reviews.llvm.org/D14971 llvm-svn: 255656	2015-12-15 17:10:28 +00:00
Chih-Hung Hsieh	7993e18e80	[X86] Part 2 to fix x86-64 fp128 calling convention. Part 1 was submitted in http://reviews.llvm.org/D15134. Changes in this part: * X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class. * X86CallingConv.td: Pass f128 values in XMM registers or on stack. * X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td: Add instruction selection patterns for f128. * X86ISelLowering.cpp: When target has MMX registers, configure MVT::f128 in FR128RegClass, with TypeSoftenFloat action, and custom actions for some opcodes. Add missed cases of MVT::f128 in places that handle f32, f64, or vector types. Add TODO comment to support f128 type in inline assembly code. * SelectionDAGBuilder.cpp: Fix infinite loop when f128 type can have VT == TLI.getTypeToTransformTo(Ctx, VT). * Add unit tests for x86-64 fp128 type. Differential Revision: http://reviews.llvm.org/D11438 llvm-svn: 255558	2015-12-14 22:08:36 +00:00
Reid Kleckner	420f0542cc	[WinEH] Remove isBarrier from instructions that do not return Fixes machine verification failures with David's latest EH change. llvm-svn: 252541	2015-11-09 23:34:42 +00:00
David Majnemer	2652b75700	[WinEH] Don't emit CATCHRET from visitCatchPad Instead, emit a CATCHPAD node which will get selected to a target specific sequence. llvm-svn: 252528	2015-11-09 23:07:48 +00:00
Reid Kleckner	51460c139e	[WinEH] Split EH_RESTORE out of CATCHRET for 32-bit EH This adds the EH_RESTORE x86 pseudo instr, which is responsible for restoring the stack pointers: EBP and ESP, and ESI if stack realignment is involved. We only need this on 32-bit x86, because on x64 the runtime restores CSRs for us. Previously we had to keep the CATCHRET instruction around during SEH so that we could convince X86FrameLowering to restore our frame pointers. Now we can split these instructions earlier. This was confusing, because we had a return instruction which wasn't really a return and was ultimately going to be removed by X86FrameLowering. This change also simplifies X86FrameLowering, which really shouldn't be building new MBBs. No observable functional change currently, but with the new register mask stuff in D14407, CATCHRET will become a register allocator barrier, and our existing tests rely on us having reasonable register allocation around SEH. llvm-svn: 252266	2015-11-06 01:49:05 +00:00
JF Bastien	2cdd5e4710	x86: preserve flags when folding atomic operations D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. I fixed some of this issue in D13680 but had missed INC/DEC. This patch adds the missing EFLAGS definition. llvm-svn: 250438	2015-10-15 18:24:52 +00:00
Sanjay Patel	85030aa1bd	function names should start with a lower case letter; NFC llvm-svn: 250174	2015-10-13 16:23:00 +00:00
JF Bastien	986ed68eed	x86: preserve flags when folding atomic operations Summary: D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. This patch adds the missing EFLAGS definition. Floating point operations don't set flags, the subsequent fadd optimization is therefore correct. The same applies for surrounding load/store optimizations. Reviewers: rsmith, rtrieu Subscribers: llvm-commits, reames, morisset Differential Revision: http://reviews.llvm.org/D13680 llvm-svn: 250135	2015-10-13 00:28:47 +00:00
Craig Topper	d69d495333	[X86] Remove unnecessary AddComplexity directive. The instruction is already wrapped in the equivalent earlier. NFC llvm-svn: 249369	2015-10-06 02:50:21 +00:00
David Majnemer	f828a0ccc7	[WinEH] Make FuncletLayout more robust against catchret Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052	2015-10-01 18:44:59 +00:00
Reid Kleckner	5b8a46e771	[WinEH] Make funclet return instrs pseudo instrs This makes catchret look more like a branch, and less like a weird use of BlockAddress. It also lets us get away from llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label arithmetic. llvm-svn: 247936	2015-09-17 20:43:47 +00:00
Reid Kleckner	7878391208	[WinEH] Add codegen support for cleanuppad and cleanupret All of the complexity is in cleanupret, and it mostly follows the same codepaths as catchret, except it doesn't take a return value in RAX. This small example now compiles and executes successfully on win32: extern "C" int printf(const char *, ...) noexcept; struct Dtor { ~Dtor() { printf("~Dtor\n"); } }; void has_cleanup() { Dtor o; throw 42; } int main() { try { has_cleanup(); } catch (int) { printf("caught it\n"); } } Don't try to put the cleanup in the same function as the catch, or Bad Things will happen. llvm-svn: 247219	2015-09-10 00:25:23 +00:00
Reid Kleckner	df1295173f	[WinEH] Emit prologues and epilogues for funclets Summary: 32-bit funclets have short prologues that allocate enough stack for the largest call in the whole function. The runtime saves CSRs for the funclet. It doesn't restore CSRs after we finally transfer control back to the parent function via a CATCHRET, but that's a separate issue. 32-bit funclets also have to adjust the incoming EBP value, which is what llvm.x86.seh.recoverframe does in the old model. 64-bit funclets need to spill CSRs as normal. For simplicity, this just spills the same set of CSRs as the parent function, rather than trying to compute different CSR sets for the parent function and each funclet. 64-bit funclets also allocate enough stack space for the largest outgoing call frame, like 32-bit. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12546 llvm-svn: 247092	2015-09-08 22:44:41 +00:00
Reid Kleckner	0e2882345d	[WinEH] Add some support for code generating catchpad We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues. llvm-svn: 246235	2015-08-27 23:27:47 +00:00
Michael Kuperstein	6e3fee07f7	[X86] Remove references to _ftol2 As of r245924, _ftol2 is no longer used for fptoui on MS platforms. Remove the dead code associated with it. llvm-svn: 245925	2015-08-25 07:58:33 +00:00
JF Bastien	0f8a99b62f	x86: NFC remove needless InstrCompiler cast Summary: The casts from String to PatFrag weren't needed if we instead provided an SDNode. This fix was suggested by @pete in D11382. Subscribers: pete, llvm-commits Differential Revision: http://reviews.llvm.org/D11788 llvm-svn: 244167	2015-08-05 23:15:37 +00:00
JF Bastien	8662083770	x86 atomic: optimize a.store(reg op a.load(acquire), release) Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations. Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11382 llvm-svn: 244128	2015-08-05 21:04:59 +00:00
Rafael Espindola	494a381ede	Use small encodings for constants when possible. llvm-svn: 242493	2015-07-17 00:57:52 +00:00
Rafael Espindola	36b718fc74	Avoid a Symbol -> Name -> Symbol conversion. Before this we were producing a TargetExternalSymbol from a MCSymbol. That meant extracting the symbol name and fetching the symbol again down the pipeline. This patch adds a DAG.getMCSymbol that lets the MCSymbol pass unchanged on the DAG. Doing so removes the need for MO_NOPREFIX and fixes the root cause of pr23900, allowing r240130 to be committed again. llvm-svn: 240300	2015-06-22 17:46:53 +00:00
Elena Demikhovsky	f61727d880	AVX-512: fixed algorithm of building vectors of i1 elements fixed extract-insert i1 element, load i1, zextload i1 should be with "and $1, %reg" to prevent loading garbage. added a bunch of new tests. llvm-svn: 237793	2015-05-20 14:32:03 +00:00
Elena Demikhovsky	c1ac5d7bd5	AVX-512: select operation for i1 vectors like: select i1 %cond, <16 x i1> %a, <16 x i1> %b. I added pseudo-CMOV patterns to resolve the "select". Added tests for KNL and SKX. llvm-svn: 237106	2015-05-12 09:36:52 +00:00
Sergey Dmitrouk	842a51bad8	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989	2015-04-28 14:05:47 +00:00
Daniel Jasper	48e93f7181	Revert "[DebugInfo] Add debug locations to constant SD nodes" This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987	2015-04-28 13:38:35 +00:00
Sergey Dmitrouk	adb4c69d5c	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977	2015-04-28 11:56:37 +00:00
Craig Topper	7ea899a15f	[X86] Apply AddedComplexity consistently for similar patterns. This keeps them together in the DAGISel tables and reduces table size slightly. llvm-svn: 234086	2015-04-04 04:22:12 +00:00
Craig Topper	3d44178733	[X86] Add a comment about the change in r234075. llvm-svn: 234079	2015-04-04 02:31:43 +00:00
Craig Topper	9012028738	[X86] Don't use GR64 register 'and with immediate' instructions if the immediate is zero in the upper 33-bits or upper 57-bits. Use GR32 instructions instead. Previously the patterns didn't have high enough priority and we would only use the GR32 form if only the upper 32 or 56 bits were zero. Fixes PR23100. llvm-svn: 234075	2015-04-04 02:08:20 +00:00
Ahmed Bougacha	8f2b4f0be8	[X86] Factor out the CMOV pseudo definitions. NFCI. llvm-svn: 229206	2015-02-14 01:36:53 +00:00
Benjamin Kramer	5f6a907288	MathExtras: Bring Count(Trailing\|Leading)Ones and CountPopulation in line with countTrailingZeros Update all callers. llvm-svn: 228930	2015-02-12 15:35:40 +00:00
Michael Kuperstein	13fbd45263	[X86] Convert esp-relative movs of function arguments to pushes, step 2 This moves the transformation introduced in r223757 into a separate MI pass. This allows it to cover many more cases (not only cases where there must be a reserved call frame), and perform rudimentary call folding. It still doesn't have a heuristic, so it is enabled only for optsize/minsize, with stack alignment <= 8, where it ought to be a fairly clear win. (Re-commit of r227728) Differential Revision: http://reviews.llvm.org/D6789 llvm-svn: 227752	2015-02-01 16:56:04 +00:00
Michael Kuperstein	e86aa9a8a4	Revert r227728 due to bad line endings. llvm-svn: 227746	2015-02-01 16:15:07 +00:00
Michael Kuperstein	bd57186c76	[X86] Convert esp-relative movs of function arguments to pushes, step 2 This moves the transformation introduced in r223757 into a separate MI pass. This allows it to cover many more cases (not only cases where there must be a reserved call frame), and perform rudimentary call folding. It still doesn't have a heuristic, so it is enabled only for optsize/minsize, with stack alignment <= 8, where it ought to be a fairly clear win. Differential Revision: http://reviews.llvm.org/D6789 llvm-svn: 227728	2015-02-01 11:44:44 +00:00
Michael Kuperstein	90e08320c9	[x32] Change the condition from bitness to LP64 for TCRETURNdi64. TCRETURNmi64, which was mistakenly changed in r227307 will wait for another day. llvm-svn: 227317	2015-01-28 16:11:35 +00:00
Michael Kuperstein	f387611ac2	[x32] Enable sibcall optimization on x32. This includes two things: 1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness). 2) Allow LEA64_32 in MatchingStackOffset. llvm-svn: 227307	2015-01-28 13:38:48 +00:00
Reid Kleckner	e9b8931873	Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics These intrinsics allow multiple functions to share a single stack allocation from one function's call frame. The function with the allocation may only perform one allocation, and it must be in the entry block. Functions accessing the allocation call llvm.recoverframeallocation with the function whose frame they are accessing and a frame pointer from an active call frame of that function. These intrinsics are very difficult to inline correctly, so the intention is that they be introduced rarely, or at least very late during EH preparation. Reviewers: echristo, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D6493 llvm-svn: 225746	2015-01-13 00:48:10 +00:00
Craig Topper	ddbf51f904	[X86] Make isel select the 2-byte register form of INC/DEC even in non-64-bit mode. Convert to the 1-byte form in non-64-bit mode as part of MCInst lowering. Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness. llvm-svn: 225252	2015-01-06 07:35:50 +00:00
Craig Topper	017b830564	[X86] Use 32-bit sign extended immediate for 64-bit LOCK_ArithBinOp with sign extended immediate. llvm-svn: 225098	2015-01-03 00:00:14 +00:00
Craig Topper	c50d64b07b	Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files. llvm-svn: 222801	2014-11-26 00:46:26 +00:00
Michael Kuperstein	3fe15e498f	[X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Recommitting - This time, with a hopefully working test. Differential Revision: http://reviews.llvm.org/D6128 llvm-svn: 221672	2014-11-11 07:07:40 +00:00
Michael Kuperstein	217e1eec0d	Reverting r221626 due to a too-strict test. llvm-svn: 221629	2014-11-10 21:07:41 +00:00
Michael Kuperstein	3218b942f4	[X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Differential Revision: http://reviews.llvm.org/D6128 llvm-svn: 221626	2014-11-10 20:40:21 +00:00
Robin Morisset	f9e8721564	[X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load()) Summary: I had forgotten to check for NotSlowIncDec in the patterns that can generate inc/dec for the above pattern (added in D4796). This currently applies to Atom Silvermont, KNL and SKX. Test Plan: New checks on atomic_mi.ll Reviewers: jfb, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5677 llvm-svn: 219336	2014-10-08 19:38:18 +00:00
Pavel Chupin	be9f12102f	[x32] Fix segmented stacks support Summary: Update segmented-stacks*.ll tests with x32 target case and make corresponding changes to make them pass. Test Plan: tests updated with x32 target Reviewers: nadav, rafael, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5245 llvm-svn: 218247	2014-09-22 13:11:35 +00:00
Robin Morisset	df20586a7a	[X86] Allow atomic operations using immediates to avoid using a register The only valid lowering of atomic stores in the X86 backend was mov from register to memory. As a result, storing an immediate required a useless copy of the immediate in a register. Now these can be compiled as a simple mov. Similarly, adding/and-ing/or-ing/xor-ing an immediate to an atomic location (but through an atomic_store/atomic_load, not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)' instead of using a register. And the same applies to inc/dec. This second point matches the first issue identified in http://llvm.org/bugs/show_bug.cgi?id=17281 llvm-svn: 216980	2014-09-02 22:16:29 +00:00
Reid Kleckner	e704010450	Fix failure to invoke exception handler on Win64 When the last instruction prior to a function epilogue is a call, we need to emit a nop so that the return address is not in the epilogue IP range. This is consistent with MSVC's behavior, and may be a workaround for a bug in the Win64 unwinder. Differential Revision: http://reviews.llvm.org/D4751 Patch by Vadim Chugunov! llvm-svn: 214775	2014-08-04 21:05:27 +00:00
Akira Hatanaka	3516669a50	[X86] Simplify X87 stackifier pass. Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions: 1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5> 2. INLINEASM <es:fldcw $0> 3. %FP0<def> = COPY %ST0 <rdar://problem/16952634> llvm-svn: 214580	2014-08-01 22:19:41 +00:00
Cameron McInally	44f3e30cf2	Revert r213070. It's breaking the build in MCELFStreamer::EmitInstToData(...). llvm-svn: 213073	2014-07-15 16:24:24 +00:00
Cameron McInally	53bc7a3330	Add x86 patterns to match a specific add-with-carry. llvm-svn: 213070	2014-07-15 15:03:32 +00:00
Tim Northover	277066ab43	X86: expand atomics in IR instead of as MachineInstrs. The logic for expanding atomics that aren't natively supported in terms of cmpxchg loops is much simpler to express at the IR level. It also allows the normal optimisations and CodeGen improvements to help out with atomics, instead of using a limited set of possible instructions. rdar://problem/13496295 llvm-svn: 212119	2014-07-01 18:53:31 +00:00
NAKAMURA Takumi	1db5995d14	Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter. -- This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211691	2014-06-25 12:41:52 +00:00
NAKAMURA Takumi	c403be1991	Reformat. llvm-svn: 211689	2014-06-25 12:40:56 +00:00
NAKAMURA Takumi	d77cefe633	Revert r211399, "Generate native unwind info on Win64" It broke Legacy JIT Tests on x86_64-{mingw32\|msvc}, aka Windows x64. llvm-svn: 211480	2014-06-22 22:00:56 +00:00
Reid Kleckner	4a01230db4	Generate native unwind info on Win64 This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211399	2014-06-20 20:35:47 +00:00
Alexey Volkov	5260dba323	[X86] Use ADD/SUB instead of INC/DEC for Silvermont According to Intel Software Optimization Manual on Silvermont INC or DEC instructions require an additional uop to merge the flags. As a result, a branch instruction depending on an INC or a DEC instruction incurs a 1 cycle penalty. Differential Revision: http://reviews.llvm.org/D3990 llvm-svn: 210466	2014-06-09 11:40:41 +00:00
Jay Foad	a0653a3e6c	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Adam Nemet	d4e56073c7	[X86] Add peephole for masked rotate amount Extend what's currently done for shift because the HW performs this masking implicitly: (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y) I use the newly factored out multiclass that was only supporting shifts so far. For testing I extended my testcase for the new rotation idiom. <rdar://problem/15295856> llvm-svn: 203718	2014-03-12 21:20:55 +00:00
Adam Nemet	b667c3fc26	[X86] Refactor peepholes for masked shift amount into a multiclass The peephole (shift x, (and y, 31)) -> (shift x, y) is repeated for each integer type and each shift variant. To improve this a new multiclass is added that covers all integer types. The shift patterns are now instantiated from this. I am planning to add new instances for rotates as well. No functional change intended: * test/CodeGen/X86/shift-and.ll provides coverage * Compared the expanded tablegen output and matched up the defs for these Pat<>s before and after llvm-svn: 203685	2014-03-12 18:02:33 +00:00
Jim Grosbach	c94d993adf	X86: Enable ISel of 16-bit MOVBE instructions. When the MOVBE instructions are available, use them for 16-bit endian swapping as well as for 32 and 64 bit. The patterns were already present on the instructions, but weren't being matched because the operation was unconditionally marked to 'Expand.' Change that to be conditional on whether the MOVBE instructions are available. Use 'rolw' to implement the in-register version (32 and 64 bit have the dedicated 'bswap' instruction for that). Patch by Louis Gerbarg <lgg@apple.com>. rdar://15479984 llvm-svn: 203524	2014-03-11 00:44:14 +00:00
Craig Topper	fa6298a162	Merge x86 HasOpSizePrefix/HasOpSize16Prefix into a 2-bit OpSize field with 0 meaning no 0x66 prefix in any mode. Rename Opsize16->OpSize32 and OpSize->OpSize16. The classes now refer to their operand size rather than the mode in which they need a 0x66 prefix. Hopefully can merge REX_W into this as OpSize64. llvm-svn: 200626	2014-02-02 09:25:09 +00:00
David Woodhouse	df1e1960ac	[x86] Remove OpSize16 flag from MOV32r0 It's not a real instruction any more and doesn't need encoding information. llvm-svn: 198778	2014-01-08 18:38:26 +00:00
David Woodhouse	956965ca69	[x86] Add OpSize16 to instructions that need it This fixes the bulk of 16-bit output, and the corresponding test case x86-16.s now looks mostly like the x86-32.s test case that it was originally based on. A few irrelevant instructions have been dropped, and there are still some corner cases to be fixed in subsequent patches. llvm-svn: 198752	2014-01-08 12:57:40 +00:00
Craig Topper	792587cc7b	Remove opcode from MOV32r0 that I accidentally left when I converted it to Pseudo. Remove FIXME as well. llvm-svn: 198564	2014-01-05 19:25:13 +00:00
Craig Topper	854f644781	Handle MOV32r0 in expandPostRAPseudo instead of MCInst lowering. No functional change intended. llvm-svn: 198254	2013-12-31 03:05:38 +00:00
Eric Christopher	c0a5aaeab0	[x86] Rename In32BitMode predicate to Not64BitMode That's what it actually means, and with 16-bit support it's going to be a little more relevant since in a few corner cases we may actually want to distinguish between 16-bit and 32-bit mode (for example the bare 'push' aliases to pushw/pushl etc.) Patch by David Woodhouse llvm-svn: 197768	2013-12-20 02:04:49 +00:00
Duncan P. N. Exon Smith	512601d77f	Revert "Revert "Mark vastart_save_xmm_regs as changing EFLAGS"" This reverts commit r197481, recommiting r197469 with an extra fix. The vastart_save_xmm_regs pseudo-instruction expands to a test and a branch, so it modifies EFLAGS. Mark it so, or else the scheduler might place it in the middle of another test+branch. This fixes a bug exposed by r192750, which changed the initial scheduler to source-order as part of enabling the MI Scheduler for X86. This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to try to save %flags, and adds a test that catches the bad behavior of r197469. <rdar://problem/15627766> llvm-svn: 197503	2013-12-17 15:54:45 +00:00
Duncan P. N. Exon Smith	b2d4274d3f	Revert "Mark vastart_save_xmm_regs as changing EFLAGS" This reverts commit r197469. The sanitizer and dragonegg buildbots are failing, I think because of this change. Reverting until I figure out why. llvm-svn: 197481	2013-12-17 07:13:58 +00:00
Duncan P. N. Exon Smith	a4acde39e9	Mark vastart_save_xmm_regs as changing EFLAGS The vastart_save_xmm_regs pseudo-instruction expands to a test and a branch, so it modifies EFLAGS. Mark it so, or else the scheduler might place it in the middle of another test+branch. This fixes a bug exposed by r192750, which turned on the MI Scheduler for X86. <rdar://problem/15627766> llvm-svn: 197469	2013-12-17 06:12:05 +00:00
Elena Demikhovsky	496656900e	AVX-512: Implemented CMOV for 512-bit vectors llvm-svn: 193747	2013-10-31 13:15:32 +00:00
Eric Christopher	740025745b	Revert part of a fix from 2010, changes since then: a) x86-64 TLS has been documented b) the code path should use movq for the correct relocation to be generated. I've also added a fixme for the test case that we should improve the code generated, it should look something like is documented in the tls abi document. llvm-svn: 192631	2013-10-14 21:52:26 +00:00
Eric Christopher	584d71c6cb	Remove some extraneous whitespace. llvm-svn: 192629	2013-10-14 21:52:18 +00:00
Craig Topper	8956fe0dbc	Mark that the _ftol2 function used by windows on x86 to handle fptoui modifies ECX. llvm-svn: 186787	2013-07-21 07:28:13 +00:00
Tim Northover	3a1fd4c0ac	X86: change MOV64ri64i32 into MOV32ri64 The MOV64ri64i32 instruction required hacky MCInst lowering because it was allocated as setting a GR64, but the eventual instruction ("movl") only set a GR32. This converts it into a so-called "MOV32ri64" which still accepts a (appropriate) 64-bit immediate but defines a GR32. This is then converted to the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy. This fixes a typo in the opcode field of the original patch, which should make the legacy JIT work again (& adds test for that problem). llvm-svn: 183068	2013-06-01 09:55:14 +00:00
Eric Christopher	e1e57e5ebd	Temporarily Revert "X86: change MOV64ri64i32 into MOV32ri64" as it seems to have caused PR16192 and other JIT related failures. llvm-svn: 183059	2013-05-31 23:30:45 +00:00
Tim Northover	d4736d67f4	X86: change MOV64ri64i32 into MOV32ri64 The MOV64ri64i32 instruction required hacky MCInst lowering because it was allocated as setting a GR64, but the eventual instruction ("movl") only set a GR32. This converts it into a so-called "MOV32ri64" which still accepts a (appropriate) 64-bit immediate but defines a GR32. This is then converted to the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy. llvm-svn: 182991	2013-05-31 09:57:13 +00:00
Tim Northover	64ec0ff433	X86: use sub-register sequences for MOV*r0 operations Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions, it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg") and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is smaller and partial register updates can sometimes be avoided. Until recently, this sequence was a barrier to rematerialization though. That should now be fixed so it's an appropriate time to make the change. llvm-svn: 182928	2013-05-30 13:19:42 +00:00
Tim Northover	04eb4234fc	X86: change zext moves to use sub-register infrastructure. 32-bit writes on amd64 zero out the high bits of the corresponding 64-bit register. LLVM makes use of this for zero-extension, but until now relied on custom MCLowering and other code to fixup instructions. Now we have proper handling of sub-registers, this can be done by creating SUBREG_TO_REG instructions at selection-time. Should be no change in functionality. llvm-svn: 182921	2013-05-30 10:43:18 +00:00
Jakob Stoklund Olesen	5889ad6cd6	Annotate X86InstrCompiler.td with SchedRW lists. llvm-svn: 177936	2013-03-25 23:07:35 +00:00
Jakob Stoklund Olesen	9bd6b8bd96	Annotate X86InstrCompiler.td with SchedRW lists. Add a new WriteZero SchedWrite type for the common dependency-breaking instructions that clear a register. llvm-svn: 177442	2013-03-19 21:16:56 +00:00
Ulrich Weigand	80d9ad398d	Remove an invalid and unnecessary Pat pattern from the X86 backend: def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))), (MOV64rm tglobaltlsaddr :$dst)>; This pattern is invalid because the MOV64rm instruction expects a source operand of type "i64mem", which is a subclass of X86MemOperand and thus actually consists of five MI operands, but the Pat provides only a single MI operand ("tglobaltlsaddr" matches an SDnode of type ISD::TargetGlobalTLSAddress and provides a single output). Thus, if the pattern were ever matched, subsequent uses of the MOV64rm instruction pattern would access uninitialized memory. In addition, with the TableGen patch I'm about to check in, this would actually be reported as a build-time error. Fortunately, the pattern does in fact never match, for at least two independent reasons. First, the code generator actually never generates a pattern of the form (load (X86Wrapper (tglobaltlsaddr))). For most combinations of TLS and code models, (tglobaltlsaddr) represents just an offset that needs to be added to some base register, so it is never directly dereferenced. The only exception is the initial-exec model, where (tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot, which is in fact directly dereferenced: but in that case, the X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match. Second, even if some patterns along those lines were ever generated, we should not need an extra Pat pattern to match it. Instead, the original MOV64rm instruction pattern ought to match directly, since it uses an "addr" operand, which is implemented via the SelectAddr C++ routine; this routine is supposed to accept the full range of input DAGs that may be implemented by a single mov instruction, including those cases involving ISD::TargetGlobalTLSAddress (and actually does so e.g. in the initial-exec case as above). 
To avoid build breaks (due to the above-mentioned error) after the TableGen patch is checked in, I'm removing this Pat here. llvm-svn: 177426	2013-03-19 19:49:52 +00:00
Benjamin Kramer	ee23dcb461	X86: Disable cmov-memory patterns on subtargets without cmov. Fixes PR15115. llvm-svn: 175962	2013-02-23 10:40:58 +00:00
Michael Liao	3dffc5e2b7	Fix an issue of pseudo atomic instruction DAG schedule - Add list of physical registers clobbered in pseudo atomic insts Physical registers are clobbered when pseudo atomic instructions are expanded. Add them in clobber list to prevent DAG scheduler to mis-schedule them after these insns are declared side-effect free. - Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 173200	2013-01-22 21:47:38 +00:00
Craig Topper	25cdf92b34	Remove # from the beginning and end of def names. llvm-svn: 171696	2013-01-07 05:26:58 +00:00
Craig Topper	d47a70de9f	Add hasSideEffects=0 to some atomic instructions. llvm-svn: 171122	2012-12-26 23:08:12 +00:00
Michael Liao	97bf363a9e	Add __builtin_setjmp/_longjmp support in X86 backend - Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement for setjmp/longjmp, which are used to implement continuations, user-level threading, etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling as zero-cost DWARF exception handling is used by default in X86. llvm-svn: 165989	2012-10-15 22:39:43 +00:00
Benjamin Kramer	302178bf13	X86: fcmov doesn't handle all possible EFLAGS, fall back to a branch for the others. Otherwise it will try to use SSE patterns and fail horribly if sse is disabled. Fixes PR14035. llvm-svn: 165377	2012-10-07 15:34:27 +00:00
Craig Topper	0cb6acb7ce	Remove some encoding bits I forgot to remove from SETB_C16r and SETB_C64r in r165302. llvm-svn: 165303	2012-10-05 06:11:52 +00:00
Craig Topper	9384902ef1	Move expansion of SETB_C(8/16/32/64)r from MCInstLower to ExpandPostRAPseudos and mark them as pseudos in the td file. llvm-svn: 165302	2012-10-05 06:05:15 +00:00
Michael Liao	425c0dbc81	Add 'lock' prefix output support in assembly printer - Instead of embedding 'lock' into each mnemonic of atomic instructions except 'xchg', we teach X86 assembly printer to output 'lock' prefix similar to or consistent with code emitter. llvm-svn: 164659	2012-09-26 05:13:44 +00:00
Michael Liao	2718b20030	Fix 16-bit atomic inst encoding and keep pseudo-inst starting with '#' llvm-svn: 164453	2012-09-22 05:41:15 +00:00
Michael Liao	2456b3ae8c	Fix typo in r164357 llvm-svn: 164452	2012-09-22 03:39:42 +00:00
Michael Liao	7325a9d08e	Fix a typo in r164357 llvm-svn: 164372	2012-09-21 16:03:03 +00:00
Michael Liao	c33bebff52	Revise td of X86 atomic instructions - Rewrite most atomic instructions in templates for both better maintenance and future extensions, such as HLE in TSX. llvm-svn: 164357	2012-09-21 03:00:17 +00:00
Michael Liao	3237662b65	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop. Previously, the spin-loop looked like: thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB The 'ld' at the beginning of newMBB should be lifted out of the loop, as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions have all-register form only; there is no immediate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281	2012-09-20 03:06:15 +00:00
Jakob Stoklund Olesen	3cf3ffce24	Fix the TCRETURNmi64 bug differently. Add a PatFrag to match X86tcret using 6 fixed registers or less. This avoids folding loads into TCRETURNmi64 using 7 or more volatile registers. <rdar://problem/12282281> llvm-svn: 163819	2012-09-13 18:31:27 +00:00
Jakob Stoklund Olesen	78b9f8fc67	Revert r163761 "Don't fold indexed loads into TCRETURNmi64." The patch caused "Wrong topological sorting" assertions. llvm-svn: 163810	2012-09-13 16:52:17 +00:00
Jakob Stoklund Olesen	bfacef45eb	Don't fold indexed loads into TCRETURNmi64. We don't have enough GR64_TC registers when calling a varargs function with 6 arguments. Since %al holds the number of vector registers used, only %r11 is available as a scratch register. This means that addressing modes using both base and index registers can't be folded into TCRETURNmi64. <rdar://problem/12282281> llvm-svn: 163761	2012-09-13 00:25:00 +00:00
Hans Wennborg	789acfb63d	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818	2012-06-01 16:27:21 +00:00
Jakob Stoklund Olesen	7e21d617ef	Use ptr_rc_tailcall instead of GR32_TC. The getPointerRegClass() hook will return GR32_TC, or whatever is appropriate for the current function. Patch by Yiannis Tsiouris! llvm-svn: 156459	2012-05-09 01:50:09 +00:00
Manman Ren	ef4e0479ec	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax,%eax In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td: def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>; rdar: 10961709 llvm-svn: 156312	2012-05-07 18:06:23 +00:00
Rafael Espindola	ba0a6cabb8	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011	2012-04-04 12:51:34 +00:00
Lang Hames	5569ce7d56	Make x86 REP_MOV* and REP_STO instructions use the correct operand sizes in 64-bit mode. llvm-svn: 153680	2012-03-29 19:54:28 +00:00
Preston Gurd	48ccc4df0b	This patch adds X86 instruction itineraries for non-pseudo opcodes in X86InstrCompiler.td. It also adds -mcpu=generic to the legalize-shift-64.ll test so the test will pass if run on an Intel Atom CPU, which would otherwise produce an instruction schedule which differs from that which the test expects. llvm-svn: 153033	2012-03-19 14:10:12 +00:00
Michael J. Spencer	248d65e78b	Add WIN_FTOL_* pseudo-instructions to model the unique calling convention used by the Win32 _ftol2 runtime function. Patch by Joe Groff! llvm-svn: 151382	2012-02-24 19:01:22 +00:00
Jakob Stoklund Olesen	97e3115dc2	Use the same CALL instructions for Windows as for everything else. The different calling conventions and call-preserved registers are represented with regmask operands that are added dynamically. llvm-svn: 150708	2012-02-16 17:56:02 +00:00
Eli Friedman	206ca569aa	Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768. llvm-svn: 148240	2012-01-16 16:42:21 +00:00
Eli Friedman	75e3db4c7a	Get rid of unused codegen-only instruction. llvm-svn: 148239	2012-01-16 16:29:35 +00:00
Benjamin Kramer	5b3aa60b44	X86: Generalize the x << (y & const) optimization to also catch masks with more bits set than 31 or 63. llvm-svn: 148024	2012-01-12 12:41:34 +00:00
Chandler Carruth	7e9453e916	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)*8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a *lot* to be improved on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Chandler Carruth	24680c24d8	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Rafael Espindola	b3285224cd	Fixes an issue reported by -verify-machineinstrs. Patch by Sanjoy Das. llvm-svn: 143064	2011-10-26 21:16:41 +00:00
Rafael Espindola	66393c127d	This commit introduces two fake instructions MORESTACK_RET and MORESTACK_RET_RESTORE_R10; which are lowered to a RET and a RET followed by a MOV respectively. Having a fake instruction prevents the verifier from seeing a MachineBasicBlock end with a non-terminator (MOV). It also prevents the rather eccentric case of a MachineBasicBlock ending with RET but having successors nevertheless. Patch by Sanjoy Das. llvm-svn: 143062	2011-10-26 21:12:27 +00:00
Eli Friedman	d68a727bd0	Fix the assembler strings for a couple of atomic instructions. Doesn't really matter much in practice, but it's a bit cleaner. llvm-svn: 139563	2011-09-13 00:27:04 +00:00
Eli Friedman	02f2f89a98	Fix atomic load and store on x86 to pass -verify-machineinstrs (and possibly fix some subtle bugs involving passes which check mayStore()). This isn't exactly ideal, but it is good enough for the moment. llvm-svn: 139245	2011-09-07 18:48:32 +00:00
Jakob Stoklund Olesen	1f72dd40c7	Pseudo CMOV instructions don't clobber EFLAGS. The explanation about a 0 argument being materialized as xor is no longer valid. Rematerialization will check if EFLAGS is live before clobbering it. The code produced by X86TargetLowering::EmitLoweredSelect does not clobber EFLAGS. This causes one less testb instruction to be generated in the cmov.ll test case. llvm-svn: 139057	2011-09-02 23:52:55 +00:00
Rafael Espindola	3353017668	Adds a SelectionDAG node X86SegAlloca which will be custom lowered from DYNAMIC_STACKALLOC. Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which will match X86SegAlloca (based on word size) are also added. They will be custom emitted to inject the actual stack handling code. Patch by Sanjoy Das. llvm-svn: 138814	2011-08-30 19:43:21 +00:00
Eli Friedman	5e5704277f	Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction. llvm-svn: 138660	2011-08-26 21:21:21 +00:00
Eli Friedman	342e8df0e0	Basic x86 code generation for atomic load and store instructions. llvm-svn: 138478	2011-08-24 20:50:09 +00:00
Bruno Cardoso Lopes	72323966c8	Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556 llvm-svn: 137179	2011-08-09 23:27:13 +00:00
Eli Friedman	4ef2426b87	Fix a couple ridiculous copy-paste errors. rdar://9914773 . llvm-svn: 137160	2011-08-09 22:17:39 +00:00
Eli Friedman	e6d1853e74	X86ISD::MEMBARRIER does not require SSE2; it doesn't actually generate any code, and all x86 processors will honor the required semantics. llvm-svn: 136249	2011-07-27 19:43:50 +00:00
Dan Gohman	8eb36ef497	Add a comment describing why transforming (shl x, 1) to (add x, x) is to be considered safe enough in this context. llvm-svn: 133159	2011-06-16 15:55:48 +00:00
Benjamin Kramer	e30b70073a	X86: smulo -> add is now done target-independently in DAGCombiner, remove the patterns. llvm-svn: 131801	2011-05-21 18:32:01 +00:00
Stuart Hastings	91f1d24736	Re-commit 131641 with fixes; de-pseudoize MOVSX16rr8 and friends. rdar://problem/8614450 llvm-svn: 131746	2011-05-20 19:04:40 +00:00
Stuart Hastings	c72240bbd9	Reverting 131641 to investigate 'bot complaint. llvm-svn: 131654	2011-05-19 17:54:42 +00:00
Stuart Hastings	b476b0cc9f	Revise MOVSX16rr8/MOVZX16rr8 (and rm variants) to no longer be pseudos. rdar://problem/8614450 llvm-svn: 131641	2011-05-19 16:59:50 +00:00
Eric Christopher	a1d9e29552	Support XOR and AND optimization with no return value. Finishes off rdar://8470697 llvm-svn: 131458	2011-05-17 08:10:18 +00:00
Eric Christopher	4a34e61e53	Optimize atomic lock or that doesn't use the result value. Next up: xor and and. Part of rdar://8470697 llvm-svn: 131171	2011-05-10 23:57:45 +00:00
Eric Christopher	e33464663f	Refactor lock versions of binary operators to be a little less cut and paste. llvm-svn: 131139	2011-05-10 18:36:16 +00:00
Benjamin Kramer	d724a590e5	X86: Add a bunch of peeps for add and sub of SETB. "b + ((a < b) ? 1 : 0)" compiles into cmpl %esi, %edi adcl $0, %esi instead of cmpl %esi, %edi sbbl %eax, %eax andl $1, %eax addl %esi, %eax This saves a register, a false dependency on %eax (Intel's CPUs still don't ignore it) and it's shorter. llvm-svn: 131070	2011-05-08 18:36:07 +00:00
Dan Gohman	f0f8e14370	The labyrinthine X86 backend no longer appears to require these patterns. llvm-svn: 125759	2011-02-17 18:50:19 +00:00
NAKAMURA Takumi	0cfdac078e	Target/X86: Tweak win64's tailcall. llvm-svn: 124272	2011-01-26 02:04:09 +00:00
NAKAMURA Takumi	9d29eff198	Fix whitespace. llvm-svn: 124270	2011-01-26 02:03:37 +00:00
Eric Christopher	542f8a5221	The stub routine that we're calling uses test and so clobbers the flags. llvm-svn: 123712	2011-01-18 01:37:20 +00:00
Chris Lattner	46b9efcad7	We lower setb to sbb with the hope that the and will go away, when it doesn't, match it back to setb. On a 64-bit version of the testcase before we'd get: movq %rdi, %rax addq %rsi, %rax sbbb %dl, %dl andb $1, %dl ret now we get: movq %rdi, %rax addq %rsi, %rax setb %dl ret llvm-svn: 122217	2010-12-20 01:16:03 +00:00
Chris Lattner	9edf3f50bf	improve the setcc -> setcc_carry optimization to happen more consistently by moving it out of lowering into dag combine. Add some missing patterns for matching away extended versions of setcc_c. llvm-svn: 122201	2010-12-19 22:08:31 +00:00
Evan Cheng	be69d8e2f3	Only rr forms of ADD*_DB are commutable. llvm-svn: 121908	2010-12-15 22:57:36 +00:00
Eric Christopher	c2dc95ae00	Add rsp to the uses for the same reason as 32-bit. llvm-svn: 121328	2010-12-09 00:26:41 +00:00
Rafael Espindola	c4774795ce	Move lowering of TLS_addr32 and TLS_addr64 to X86MCInstLower. llvm-svn: 120263	2010-11-28 21:16:39 +00:00
Rafael Espindola	5d882894d8	Lower TLS_addr32 and TLS_addr64. llvm-svn: 120225	2010-11-27 20:43:02 +00:00
Chris Lattner	941c19b7ba	reject instructions that contain a \n in their asmstring. Mark various X86 and ARM instructions that are bitten by this as isCodeGenOnly, as they are. llvm-svn: 117884	2010-11-01 00:46:16 +00:00

276 Commits