Commit Graph

1332 Commits

Author SHA1 Message Date
Bill Wendling bae6b2cca3 Reapply r79127. It was fixed by d0k.
llvm-svn: 79136
2009-08-15 21:21:19 +00:00
Bill Wendling d3fade656f Revert r79127. It was causing compilation errors.
llvm-svn: 79135
2009-08-15 21:14:01 +00:00
Evan Cheng 52d4e64711 Change allowsUnalignedMemoryAccesses to take type argument since some targets
support unaligned mem access only for certain types. (Should it be size
instead?)

ARM v7 supports unaligned access for i16 and i32, some v6 variants support it
as well.

llvm-svn: 79127
2009-08-15 19:23:44 +00:00
Dan Gohman 0700a56860 On x86-64, for a varargs function, don't store the xmm registers to
the register save area if %al is 0. This avoids touching xmm
regsiters when they aren't actually used.

llvm-svn: 79061
2009-08-15 01:38:56 +00:00
Anton Korobeynikov 75821f7c69 Properly handle indirect win64 args when they're passed in memory
llvm-svn: 79009
2009-08-14 18:19:10 +00:00
Owen Anderson 55f1c09e31 Push LLVMContexts through the IntegerType APIs.
llvm-svn: 78948
2009-08-13 21:58:54 +00:00
Owen Anderson c6daf8f17c Fix warnings.
llvm-svn: 78725
2009-08-11 21:59:30 +00:00
Owen Anderson 9f94459d24 Split EVT into MVT and EVT, the former representing _just_ a primitive type, while
the latter is capable of representing either a primitive or an extended type.

llvm-svn: 78713
2009-08-11 20:47:22 +00:00
Owen Anderson 53aa7a960c Rename MVT to EVT, in preparation for splitting SimpleValueType out into its own struct type.
llvm-svn: 78610
2009-08-10 22:56:29 +00:00
Owen Anderson 3e77df2bcd SimpleValueType-ify a few more methods on TargetLowering.
llvm-svn: 78595
2009-08-10 20:46:15 +00:00
Owen Anderson 246617857f Continue the SimpleValueType-ification.
llvm-svn: 78593
2009-08-10 20:18:46 +00:00
Owen Anderson c30530d105 Start moving TargetLowering away from using full MVTs and towards SimpleValueType, which will simplify the privatization of IntegerType in the future.
llvm-svn: 78584
2009-08-10 18:56:59 +00:00
Anton Korobeynikov 741ea0d7fd Better handle kernel code model. Also, generalize the things and fix one
subtle bug with small code model.

llvm-svn: 78255
2009-08-05 23:01:26 +00:00
Dan Gohman f9bbcd1afd Major calling convention code refactoring.
Instead of awkwardly encoding calling-convention information with ISD::CALL,
ISD::FORMAL_ARGUMENTS, ISD::RET, and ISD::ARG_FLAGS nodes, TargetLowering
provides three virtual functions for targets to override:
LowerFormalArguments, LowerCall, and LowerRet, which replace the custom
lowering done on the special nodes. They provide the same information, but
in a more immediately usable format.

This also reworks much of the target-independent tail call logic. The
decision of whether or not to perform a tail call is now cleanly split
between target-independent portions, and the target dependent portion
in IsEligibleForTailCallOptimization.

This also synchronizes all in-tree targets, to help enable future
refactoring and feature work.

llvm-svn: 78142
2009-08-05 01:29:28 +00:00
Anton Korobeynikov 7d80ab1593 Perform bitconvert to proper type
llvm-svn: 77965
2009-08-03 08:14:14 +00:00
Anton Korobeynikov 442beabbf7 Add 'Indirect' LocInfo class and use to pass __m128 on win64. Also minore fixes here and there (mostly __m64).
llvm-svn: 77964
2009-08-03 08:13:56 +00:00
Anton Korobeynikov 72bc3846bc Cleanup Darwin MMX calling conv stuff - make the stuff more generic. This also fixes a subtle bug, when 6th v1i64 argument passed wrongly.
llvm-svn: 77963
2009-08-03 08:13:24 +00:00
Anton Korobeynikov 71386e08fe Unbreak Win64 CC. Step one: honour register save area, fix some alignment and provide a different set of call-clobberred registers.
llvm-svn: 77962
2009-08-03 08:12:53 +00:00
Rafael Espindola 854d34a9fb Remove a bitcast that was a no-op.
Thanks to Eli Friedman for noticing it.

llvm-svn: 77942
2009-08-03 03:00:05 +00:00
Rafael Espindola 18ba271a79 Use movq to move 64 bits in and out of mmx registers.
Fixes PR4669

llvm-svn: 77940
2009-08-03 02:45:34 +00:00
Dan Gohman c278621694 Minor code cleanups.
llvm-svn: 77795
2009-08-01 19:14:37 +00:00
Chris Lattner 51d5b43cda refactor section construction in TLOF to be through an explicit
initialize method, which can be called when an MCContext is available.

llvm-svn: 77687
2009-07-31 17:42:42 +00:00
Dan Gohman 013f007762 Rename GRAD to GR32_AD, to follow the naming convention of other
classes. And define its SubRegClassList.

llvm-svn: 77601
2009-07-30 17:02:08 +00:00
Evan Cheng e62288fdd4 Optimize some common usage patterns of atomic built-ins __sync_add_and_fetch() and __sync_sub_and_fetch.
When the return value is not used (i.e. only care about the value in the memory), x86 does not have to use add to implement these. Instead, it can use add, sub, inc, dec instructions with the "lock" prefix.

This is currently implemented using a bit of instruction selection trick. The issue is the target independent pattern produces one output and a chain and we want to map it into one that just output a chain. The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. DAG combiner can then transform the node before it gets to target node selection.

Problem #2 is we are adding a whole bunch of x86 atomic instructions when in fact these instructions are identical to the non-lock versions. We need a way to add target specific information to target nodes and have this information carried over to machine instructions. Asm printer (or JIT) can use this information to add the "lock" prefix.

llvm-svn: 77582
2009-07-30 08:33:02 +00:00
Eric Christopher 77268a56ff Add llvm_unreachable for ... unreachable code!
llvm-svn: 77480
2009-07-29 18:14:04 +00:00
Chris Lattner 6b6dbb3bd8 whitespace cleanup.
llvm-svn: 77438
2009-07-29 05:48:09 +00:00
Eric Christopher 99f5534296 Fix comment.
llvm-svn: 77415
2009-07-29 01:01:19 +00:00
Eric Christopher f7802a33ce Add support for gcc __builtin_ia32_ptest{z,c,nzc} intrinsics. Lower
to ptest instruction plus setcc. Revamp ptest instruction. Add test.

llvm-svn: 77407
2009-07-29 00:28:05 +00:00
Owen Anderson 4aa3295a65 Return ConstantVector to 2.5 API.
llvm-svn: 77366
2009-07-28 21:19:26 +00:00
Chris Lattner a3242e93b7 the apple "ld_classic" linker doesn't support .literal16 in 32-bit
mode, and "ld64" (the default linker) falls back to it in -static
mode.

llvm-svn: 77334
2009-07-28 17:50:28 +00:00
Chris Lattner 5e693ed07b Rip all of the global variable lowering logic out of TargetAsmInfo. Since
it is highly specific to the object file that will be generated in the end,
this introduces a new TargetLoweringObjectFile interface that is implemented
for each of ELF/MachO/COFF/Alpha/PIC16 and XCore.

Though still is still a brutal and ugly refactoring, this is a major step
towards goodness.

This patch also:
1. fixes a bunch of dangling pointer problems in the PIC16 backend.
2. disables the TargetLowering copy ctor which PIC16 was accidentally using.
3. gets us closer to xcore having its own crazy target section flags and
   pic16 not having to shadow sections with its own objects.
4. fixes wierdness where ELF targets would set CStringSection but not
   CStringSection_.  Factor the code better.
5. fixes some bugs in string lowering on ELF targets.

llvm-svn: 77294
2009-07-28 03:13:23 +00:00
Owen Anderson 69c464dec4 Move ConstantFP construction back to the 2.5-ish API.
llvm-svn: 77247
2009-07-27 20:59:43 +00:00
Owen Anderson edb4a70325 Revert the ConstantInt constructors back to their 2.5 forms where possible, thanks to contexts-on-types. More to come.
llvm-svn: 77011
2009-07-24 23:12:02 +00:00
Eric Christopher f37ea3ad75 Update insertps handling based on feedback. Move to a v4f32 style
to support vector arguments and scalar arguments correctly. Update
lowering and fix comment to refer to pinsr* instead of insertps.

llvm-svn: 76921
2009-07-24 00:33:09 +00:00
Eli Friedman caccc0081a Add support for MMX VSETCC.
llvm-svn: 76713
2009-07-22 01:06:52 +00:00
Owen Anderson 47db941fd3 Get rid of the Pass+Context magic.
llvm-svn: 76702
2009-07-22 00:24:57 +00:00
Eli Friedman da9eda8ef6 Remove shift amount flavor. It isn't actually complete enough to
be useful, and it's currently unused.  (Some issues: it isn't actually 
rich enough to capture the semantics on many architectures, and
semantics can vary depending on the type being shifted.)

llvm-svn: 76633
2009-07-21 20:12:16 +00:00
Dale Johannesen 68cb8d0310 revert 76503 while I figure out what's going on
llvm-svn: 76517
2009-07-21 00:12:29 +00:00
Dale Johannesen a2d2adfa4e Make sure a global matching asm 'i' constraint gets its
flags set properly.  (hasMemory is clearly irrelevant
when matching 'i', I don't understand what this was
supposed to be doing.)
gcc.apple/asm-block-25.c (test passed before by
accident, but generated code was wrong)

llvm-svn: 76503
2009-07-20 23:39:13 +00:00
Chris Lattner 5849d22bd1 Copy ExpandInlineAsm to TargetLowering from TargetAsmInfo.
llvm-svn: 76441
2009-07-20 17:51:36 +00:00
Evan Cheng 18fe458103 Fix x86 inline ams 'q' constraint support. In 32-bit mode, it's just like 'Q', i.e. EAX, EDX, ECX, EBX. In 64-bit mode, it just means all the i64r registers. Yeah, that makes sense.
llvm-svn: 76248
2009-07-17 22:13:25 +00:00
Owen Anderson f945a9ed07 Move a few more convenience factory functions from Constant to LLVMContext.
llvm-svn: 75840
2009-07-15 21:51:10 +00:00
Torok Edwin fbcc663cbf llvm_unreachable->llvm_unreachable(0), LLVM_UNREACHABLE->llvm_unreachable.
This adds location info for all llvm_unreachable calls (which is a macro now) in
!NDEBUG builds.
In NDEBUG builds location info and the message is off (it only prints
"UREACHABLE executed").

llvm-svn: 75640
2009-07-14 16:55:14 +00:00
Chris Lattner e91900097e Fix PR4533, which is about buggy codegen in x86-64 -static mode.
Basically, using:
  lea symbol(%rip), %rax

is not valid in -static mode, because the current RIP may not be
within 32-bits of "symbol" when an app is built partially pic and
partially static.  The fix for this is to compile it to:

  lea symbol, %rax

It would be better to codegen this as:

  movq $symbol, %rax

but that will come next.


The hard part of fixing this bug was fixing abi-isel, which was actively
testing for the wrong behavior.  Also, the RUN lines are completely impossible
to understand what they are testing.  To help with this, convert the -static 
x86-64 codegen tests to use filecheck.  This is much more stable and makes it
more clear what the codegen is expected to be.

llvm-svn: 75382
2009-07-11 20:29:19 +00:00
Torok Edwin 56d0659726 assert(0) -> LLVM_UNREACHABLE.
Make llvm_unreachable take an optional string, thus moving the cerr<< out of
line.
LLVM_UNREACHABLE is now a simple wrapper that makes the message go away for
NDEBUG builds.

llvm-svn: 75379
2009-07-11 20:10:48 +00:00
Chris Lattner 21c2940553 remove the now-dead TM argument to these methods.
llvm-svn: 75276
2009-07-10 21:00:45 +00:00
Chris Lattner e2f524f176 add a couple of predicates to test for "stub style pic in PIC mode" and "stub style pic in dynamic-no-pic" mode.
llvm-svn: 75273
2009-07-10 20:47:30 +00:00
Chris Lattner 93f0f79178 eliminate GVRequiresRegister, replacing it with predicates we
need for other purposes.

llvm-svn: 75243
2009-07-10 07:38:24 +00:00
Chris Lattner 2ff35f528c change a bunch of logic in LowerGlobalAddress to leverage the work
done in ClassifyGlobalReference instead of reconstructing the info
awkwardly.

llvm-svn: 75240
2009-07-10 07:34:39 +00:00
Chris Lattner dc842c06c2 move some classification logic around. Now GVRequiresExtraLoad
is just a trivial wrapper around "ClassifyGlobalReference", which
stole a ton of logic from LowerGlobalAddress.

llvm-svn: 75237
2009-07-10 07:20:05 +00:00
Chris Lattner ca9d784bf1 change isGlobalStubReference to take target flags instead of a MachineOperand.
llvm-svn: 75236
2009-07-10 06:29:59 +00:00
Chris Lattner b9af63a4d2 GVRequiresExtraLoad is now never used for calls, simplify it based on this.
llvm-svn: 75232
2009-07-10 05:52:02 +00:00
Chris Lattner ace6ec26d9 actually, just eliminate PCRelGVRequiresExtraLoad. It makes the code
more complex and slow than just directly testing what we care about.

llvm-svn: 75231
2009-07-10 05:48:03 +00:00
Chris Lattner 7277a807f0 There is only one case where GVRequiresExtraLoad returns true for calls:
split its handling out to PCRelGVRequiresExtraLoad, and simplify code
based on this.

llvm-svn: 75230
2009-07-10 05:45:15 +00:00
Chris Lattner 1cc7ae7c3b the "isDirectCall" operand of GVRequiresRegister is always false, eliminate it.
llvm-svn: 75229
2009-07-10 05:37:11 +00:00
Owen Anderson 0504e0a222 Thread LLVMContext through MVT and related parts of SDISel.
llvm-svn: 75153
2009-07-09 17:57:24 +00:00
Chris Lattner 212c0506cd simplify this logic a bit more.
llvm-svn: 75118
2009-07-09 07:02:30 +00:00
Chris Lattner 72e3deca47 move reasoning about darwin $non_lazy_ptr stubs from asmprinter into
isel.

llvm-svn: 75117
2009-07-09 06:59:17 +00:00
Chris Lattner e0dae0e716 make isel use MO_PIC_BASE_OFFSET when lowering globalvalues on darwin in pic
mode, instead of having asmprinter just "know" to print them.

llvm-svn: 75109
2009-07-09 05:47:33 +00:00
Chris Lattner d047d06358 make isel decide whether to emit $stub's on darwin instead of asmprinter.
llvm-svn: 75107
2009-07-09 05:27:35 +00:00
Chris Lattner d5d80ab6b6 Make isel determine where to emit PLT-relative calls instead of having
asmprinter do it.

llvm-svn: 75104
2009-07-09 05:02:21 +00:00
Chris Lattner fef11d6e77 simplify some code based on the fact that picstyles != none are only valid
in pic or dynamic-no-pic mode. Also, x86-64 never used picstylegot.

llvm-svn: 75101
2009-07-09 04:39:06 +00:00
Chris Lattner 62f568c037 all this logic always returns true because GOT mode is never active in x86-64 mode.
Simplify it away, someone should evaluate this.

llvm-svn: 75100
2009-07-09 04:27:47 +00:00
Chris Lattner 3945dd0c44 isPICStyleRIPRel() and friends are never true in -static mode.
Simplify code based on this.

llvm-svn: 75099
2009-07-09 04:24:46 +00:00
Chris Lattner 1c5bf9d26d When in -static mode, force the PIC style to none. Doing this requires fixing
code which conflated RIPRel PIC with x86-64.  Fix these to just check for X86-64
directly.

llvm-svn: 75092
2009-07-09 03:15:51 +00:00
Chris Lattner 384a738b4a merge two identical functions and simplify things that are GOT specific
llvm-svn: 75091
2009-07-09 02:55:47 +00:00
Chris Lattner 4d1e9bf717 hoist check for IsTailCall to callers. Eliminate redundant check for
x86-64: GOT-style PIC is never used on x86-64.

llvm-svn: 75090
2009-07-09 02:46:53 +00:00
Chris Lattner 88765d4831 change a few methods to be static functions.
llvm-svn: 75089
2009-07-09 02:44:11 +00:00
Chris Lattner 47f64ea174 move handling of dllimport linkage in isel, not in asmprinter.
llvm-svn: 75086
2009-07-09 00:58:53 +00:00
Torok Edwin fb8d6d5b58 Implement changes from Chris's feedback.
Finish converting lib/Target.

llvm-svn: 75043
2009-07-08 20:53:28 +00:00
Torok Edwin fa04002254 Convert more abort() calls to llvm_report_error().
Also remove trailing semicolon.

llvm-svn: 75027
2009-07-08 19:04:27 +00:00
Torok Edwin 6dd2730024 Start converting to new error handling API.
cerr+abort -> llvm_report_error
assert(0)+abort -> LLVM_UNREACHABLE (assert(0)+llvm_unreachable-> abort() included)

llvm-svn: 75018
2009-07-08 18:01:40 +00:00
Dale Johannesen 56a53d02e1 Don't accept globals as matching 'i' constraint
in PIC modes (in accordance with existing comment).
gcc.apple/asm-block-25.c

llvm-svn: 74886
2009-07-07 00:18:49 +00:00
Tilmann Scheller aea6059ed4 Add NumFixedArgs attribute to CallSDNode which indicates the number of fixed arguments in a vararg call.
With the SVR4 ABI on PowerPC, vector arguments for vararg calls are passed differently depending on whether they are a fixed or a variable argument. Variable vector arguments always go into memory, fixed vector arguments are put 
into vector registers. If there are no free vector registers available, fixed vector arguments are put on the stack.

The NumFixedArgs attribute allows to decide for an argument in a vararg call whether it belongs to the fixed or variable portion of the parameter list.

llvm-svn: 74764
2009-07-03 06:44:53 +00:00
Bill Wendling 512ff7353e Update comments to make it clear that the function alignment is the Log2 of the
bytes and not bytes.

llvm-svn: 74624
2009-07-01 18:50:55 +00:00
Bill Wendling 31ceb1bcba Add an "alignment" field to the MachineFunction object. It makes more sense to
have the alignment be calculated up front, and have the back-ends obey whatever
alignment is decided upon.

This allows for future work that would allow for precise no-op placement and the
like.

llvm-svn: 74564
2009-06-30 22:38:32 +00:00
David Greene 8adf1fdc80 Add a 256-bit register class and YMM registers.
llvm-svn: 74469
2009-06-29 22:50:51 +00:00
Owen Anderson 45c299ef65 Add a target-specific DAG combine on X86 to fold the common pattern of
fence-atomic-fence down to just the atomic op.  This is possible thanks to
X86's relatively strong memory model, which guarantees that locked instructions
(which are used to implement atomics) are implicit fences.

llvm-svn: 74435
2009-06-29 18:04:45 +00:00
David Greene f92ba97cda Add more vector ValueTypes for AVX and other extended vector instruction
sets.

llvm-svn: 74427
2009-06-29 16:47:10 +00:00
Chris Lattner ae0acfcef0 pull @GOT, @GOTOFF, @GOTPCREL handling into isel from the asmprinter.
llvm-svn: 74378
2009-06-27 05:39:56 +00:00
Chris Lattner fea81da433 Reimplement rip-relative addressing in the X86-64 backend. The new
implementation primarily differs from the former in that the asmprinter
doesn't make a zillion decisions about whether or not something will be
RIP relative or not.  Instead, those decisions are made by isel lowering
and propagated through to the asm printer.  To achieve this, we:

1. Represent RIP relative addresses by setting the base of the X86 addr
   mode to X86::RIP.
2. When ISel Lowering decides that it is safe to use RIP, it lowers to
   X86ISD::WrapperRIP.  When it is unsafe to use RIP, it lowers to
   X86ISD::Wrapper as before.
3. This removes isRIPRel from X86ISelAddressMode, representing it with
   a basereg of RIP instead.
4. The addressing mode matching logic in isel is greatly simplified.
5. The asmprinter is greatly simplified, notably the "NotRIPRel" predicate
   passed through various printoperand routines is gone now.
6. The various symbol printing routines in asmprinter now no longer infer
   when to emit (%rip), they just print the symbol.

I think this is a big improvement over the previous situation.  It does have
two small caveats though: 1. I implemented a horrible "no-rip" modifier for
the inline asm "P" constraint modifier.  This is a short term hack, there is
a much better, but more involved, solution.  2. I had to xfail an 
-aggressive-remat testcase because it isn't handling the use of RIP in the
constant-pool reading instruction.  This specific test is easy to fix without
-aggressive-remat, which I intend to do next.

llvm-svn: 74372
2009-06-27 04:16:01 +00:00
Chris Lattner 49ed726e46 Move all the TLS processing logic into isel, don't do it in asmprinter at all.
llvm-svn: 74327
2009-06-26 21:20:29 +00:00
Chris Lattner 2ed6a9d7bd move magic for PIC constantpool references from asmprinter to isel.
llvm-svn: 74313
2009-06-26 19:22:52 +00:00
Chris Lattner 2aaad91bbe start adding logic in isel to determine asm printer semantics, step N of M.
llvm-svn: 74246
2009-06-26 00:43:52 +00:00
Chris Lattner a3da048c82 indentation fix
llvm-svn: 73840
2009-06-21 02:22:34 +00:00
Eli Friedman 48021d15d6 Misc accumulated tweaks to legalization logic for various targets.
llvm-svn: 73476
2009-06-16 06:40:59 +00:00
Chris Lattner c68a564cdd I got J and K backward, many thanks to Eli for spotting this!
llvm-svn: 73372
2009-06-15 04:39:05 +00:00
Chris Lattner ea3621a6b1 implement support for the 'K' asm constraint, PR4347
llvm-svn: 73366
2009-06-15 04:01:39 +00:00
Arnold Schwaighofer e3a018d707 Fix Bug 4278: X86-64 with -tailcallopt calling convention
out of sync with regular cc.

The only difference between the tail call cc and the normal
cc was that one parameter register - R9 - was reserved for
calling functions through a function pointer. After time the
tail call cc has gotten out of sync with the regular cc. 

We can use R11 which is also caller saved but not used as
parameter register for potential function pointers and
remove the special tail call cc on x86-64.

llvm-svn: 73233
2009-06-12 16:26:57 +00:00
Anton Korobeynikov 06039d1190 Silence a warning
llvm-svn: 73152
2009-06-09 23:00:39 +00:00
Eli Friedman 0d4234416f Get rid of some unnecessary code.
llvm-svn: 73017
2009-06-07 07:28:45 +00:00
Eli Friedman 3234587213 Slightly generalize the code that handles shuffles of consecutive loads
on x86 to handle more cases.  Fix a bug in said code that would cause it 
to read past the end of an object.  Rewrite the code in 
SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general. 
Remove PerformBuildVectorCombine, which is no longer necessary with 
these changes.  In addition to simplifying the code, with this change, 
we can now catch a few more cases of consecutive loads.

llvm-svn: 73012
2009-06-07 06:52:44 +00:00
Eli Friedman 75c496f920 Avoid crashing on a variable-index insertelement with element type i16.
llvm-svn: 72991
2009-06-06 06:32:50 +00:00
Eli Friedman 1b1844ad1f Get rid of some bogus patterns for X86vzmovl. Don't create VZEXT_MOVL
nodes for vectors with an i16 element type.  Add an optimization for 
building a vector which is all zeros/undef except for the bottom 
element, where the bottom element is an i8 or i16.

llvm-svn: 72988
2009-06-06 06:05:10 +00:00
Eli Friedman b45e8ce69a PR2598: make sure to expand illegal forms of integer/floating-point
conversions for x86, like <2 x i32> -> <2 x float> and <4 x i16> -> 
<4 x float>.

llvm-svn: 72983
2009-06-06 03:57:58 +00:00
Devang Patel d1c7d34924 Add new function attribute - noimplicitfloat
Update code generator to use this attribute and remove NoImplicitFloat target option.
Update llc to set this attribute when -no-implicit-float command line option is used.

llvm-svn: 72959
2009-06-05 21:57:13 +00:00
Nate Begeman 624690c6b2 Adapt the x86 build_vector dagcombine to the current state of the legalizer.
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector 
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.

Teach the build_vector dag combine in x86 back end to recognize consecutive 
loads producing the low part of the vector.

Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.

Add a testcase for the transform.

Old:
	subl	$28, %esp
	movl	32(%esp), %eax
	movl	4(%eax), %ecx
	movl	%ecx, 4(%esp)
	movl	(%eax), %eax
	movl	%eax, (%esp)
	movaps	(%esp), %xmm0
	pmovzxwd	%xmm0, %xmm0
	movl	36(%esp), %eax
	movaps	%xmm0, (%eax)
	addl	$28, %esp
	ret

New:
	movl	4(%esp), %eax
	pmovzxwd	(%eax), %xmm0
	movl	8(%esp), %eax
	movaps	%xmm0, (%eax)
	ret

llvm-svn: 72957
2009-06-05 21:37:30 +00:00
Devang Patel 54707b420a Evan thinks NoImplicitFloat check is not required here.
llvm-svn: 72954
2009-06-05 18:48:29 +00:00
Dan Gohman 11231d0c75 Remove unnecessary #includes.
llvm-svn: 72782
2009-06-03 16:47:12 +00:00
Dale Johannesen 5234d3795f Revert 72707 and 72709, for the moment.
llvm-svn: 72712
2009-06-02 03:12:52 +00:00
Dale Johannesen 0b8ca79253 Make the implicit inputs and outputs of target-independent
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag.  Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.

Most targets will still produce a Flag-setting target-dependent
version when selection is done.  X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc.  This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted.  All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly.  The
same can be done on other targets.

The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.

llvm-svn: 72707
2009-06-01 23:27:20 +00:00
Bill Wendling 09f17a8479 Untabification.
llvm-svn: 72604
2009-05-30 01:09:53 +00:00
Evan Cheng a9cda8abf2 Added optimization that narrow load / op / store and the 'op' is a bit twiddling instruction and its second operand is an immediate. If bits that are touched by 'op' can be done with a narrower instruction, reduce the width of the load and store as well. This happens a lot with bitfield manipulation code.
e.g.
orl     $65536, 8(%rax)
=>
orb     $1, 10(%rax)

Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.

llvm-svn: 72507
2009-05-28 00:35:15 +00:00
Eli Friedman a56159b7e9 Ger rid of some dead code.
llvm-svn: 72494
2009-05-27 20:39:00 +00:00
Eli Friedman acb851a8c0 Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and
FP_TO_XINT.  Necessary for some cleanups I'm working on.  Updated 
from the previous version (r72431) to fix a bug and make some things a 
bit clearer.

llvm-svn: 72445
2009-05-27 00:47:34 +00:00
Daniel Dunbar d96b117872 Back out r72431, it is causing a number of compilation crashes with clang.
llvm-svn: 72436
2009-05-26 21:27:02 +00:00
Eli Friedman 8c7bff96ed Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and
FP_TO_XINT.  Necessary for some cleanups I'm working on. 

llvm-svn: 72431
2009-05-26 19:18:56 +00:00
Eli Friedman 2199ed399f Make the X86 backend mark EXTRACT_SUBVECTOR as Expand, at least for the
moment.

llvm-svn: 72350
2009-05-23 22:44:52 +00:00
Eli Friedman dfe4f25355 Make the x86 backend custom-lower UINT_TO_FP and FP_TO_UINT on 32-bit
systems instead of attempting to promote them to a 64-bit SINT_TO_FP or 
FP_TO_SINT.  This is in preparation for removing the type legalization 
code from LegalizeDAG: once type legalization is gone from LegalizeDAG, 
it won't be able to handle the i64 operand/result correctly.

This isn't quite ideal, but I don't think any other operation for any 
target ends up in this situation, so treating this case specially seems 
reasonable.

llvm-svn: 72324
2009-05-23 09:59:16 +00:00
Evan Cheng ab0d23396a Run code placement optimization for targets that want it (arm and x86 for now).
llvm-svn: 71726
2009-05-13 21:42:09 +00:00
Chris Lattner f1d9b91434 Fix PR4152: asm constraint validation happens before dag combine, so we
need to work a bit to combine things like (x+c1+c2) into x+c3.

llvm-svn: 71232
2009-05-08 18:23:14 +00:00
Nate Begeman 7e6e352735 Fix infinite recursion in the C++ code which handles movddup by making it unnecessary.
llvm-svn: 70425
2009-04-29 22:47:44 +00:00
Nate Begeman 5f829d896d Implement review feedback for vector shuffle work.
llvm-svn: 70372
2009-04-29 05:20:52 +00:00
Nate Begeman 8d6d4b9289 2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan.
PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

llvm-svn: 70225
2009-04-27 18:41:29 +00:00
Rafael Espindola c1396a2313 Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not
very elegant, but neither is the tls specification :-(

llvm-svn: 69968
2009-04-24 12:59:40 +00:00
Rafael Espindola b93db668b3 Revert 69952. Causes testsuite failures on linux x86-64.
llvm-svn: 69967
2009-04-24 12:40:33 +00:00
Nate Begeman bb881d66f4 PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952
2009-04-24 03:42:54 +00:00
Duncan Sands 7ce5cc6bd1 Get rid of what looks like a copy-and-pasted typo.
Spotted by gcc-4.5.

llvm-svn: 69673
2009-04-21 09:44:39 +00:00
Bob Wilson f8b85477ae Move duplicated AddLiveIn function from X86 and ARM backends to be a method
in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock.  Thanks for Anton for suggesting this.

llvm-svn: 69615
2009-04-20 18:36:57 +00:00
Rafael Espindola 355fe12c82 For general dynamic TLS access we must use
leaq	foo@TLSGD(%rip), %rdi

as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.

llvm-svn: 69350
2009-04-17 14:35:58 +00:00
Rafael Espindola 6d6c6043ea X86-64 TLS support for local exec and initial exec.
llvm-svn: 68947
2009-04-13 13:02:49 +00:00
Dan Gohman de912e2475 Remove the obsolete SelectionDAG::getNodeValueTypes and simplify
code that uses it by using SelectionDAG::getVTList instead.

llvm-svn: 68744
2009-04-09 23:54:40 +00:00
Dan Gohman f15454866c Fix grammaros in comments.
llvm-svn: 68666
2009-04-09 02:06:09 +00:00
Rafael Espindola 3b2df10c9e Re-apply 68552.
Tested by bootstrapping llvm-gcc and using that to build llvm.

llvm-svn: 68645
2009-04-08 21:14:34 +00:00
Rafael Espindola d173f4237d Avoid a hard coded constant.
llvm-svn: 68603
2009-04-08 08:09:33 +00:00
Dan Gohman ad3e549a53 Implement support for using modeling implicit-zero-extension on x86-64
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.

This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.

Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.

Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.

Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.

Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.

llvm-svn: 68576
2009-04-08 00:15:30 +00:00
Bill Wendling 4aa25b79f9 Temporarily revert r68552. This was causing a failure in the self-hosting LLVM
builds.

--- Reverse-merging (from foreign repository) r68552 into '.':
U    test/CodeGen/X86/tls8.ll
U    test/CodeGen/X86/tls10.ll
U    test/CodeGen/X86/tls2.ll
U    test/CodeGen/X86/tls6.ll
U    lib/Target/X86/X86Instr64bit.td
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86RegisterInfo.cpp
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86CodeEmitter.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86InstrInfo.h
U    lib/Target/X86/X86ISelDAGToDAG.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86ISelLowering.h
U    lib/Target/X86/X86InstrInfo.cpp
U    lib/Target/X86/X86InstrBuilder.h
U    lib/Target/X86/X86RegisterInfo.td

llvm-svn: 68560
2009-04-07 22:35:25 +00:00
Rafael Espindola 1edda06792 Reduce code duplication on the TLS implementation.
This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.

Will work on it and on X86-64 support.

llvm-svn: 68552
2009-04-07 21:37:46 +00:00
Mon P Wang 9c186c5d27 Added a x86 dag combine to increase the chances to use a
movq for v2i64 on x86-32.

llvm-svn: 68368
2009-04-03 02:43:30 +00:00
Chris Lattner d2eb0a63a1 silence warning in release-asserts build.
llvm-svn: 68253
2009-04-01 22:14:45 +00:00
Evan Cheng d9d6e427d6 i128 shift libcalls are not available on x86.
llvm-svn: 68133
2009-03-31 19:38:51 +00:00
Evan Cheng a84a318873 When optimzing a mul by immediate into two, the resulting mul's should get a x86 specific node to avoid dag combiner from hacking on them further.
llvm-svn: 68066
2009-03-30 21:36:47 +00:00
Rafael Espindola 6ff3dabbb4 Have only one definition of X86AddrNumOperands.
llvm-svn: 67949
2009-03-28 18:55:31 +00:00
Evan Cheng fd81c73cde Optimize some 64-bit multiplication by constants into two lea's or one lea + shl since imulq is slow (latency 5). e.g.
x * 40
=>
shlq    $3, %rdi
leaq    (%rdi,%rdi,4), %rax

This has the added benefit of allowing more multiply to be folded into addressing mode. e.g.
a * 24 + b
=>
leaq    (%rdi,%rdi,2), %rax
leaq    (%rsi,%rax,8), %rax

llvm-svn: 67917
2009-03-28 05:57:29 +00:00
Rafael Espindola e728019392 I am trying to add a segment to the X86 addresses matching to
improve TLS support (see http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090309/075220.html), but that code is VERY brittle.

This patch just makes it a bit more resistant.

llvm-svn: 67843
2009-03-27 15:26:30 +00:00
Evan Cheng d88ebc352c -no-implicit-float means explicit fp operations are legal.
llvm-svn: 67784
2009-03-26 23:06:32 +00:00
Bill Wendling aa28be652c Pull transform from target-dependent code into target-independent code.
llvm-svn: 67742
2009-03-26 06:14:09 +00:00
Bill Wendling 94f299f2c5 Match this pattern so that we can generate simpler code:
%a = ...
  %b = and i32 %a, 2
  %c = srl i32 %b, 1
  %d = br i32 %c, 

into

  %a = ...
  %b = and %a, 2
  %c = X86ISD::CMP %b, 0
  %d = X86ISD::BRCOND %c ...

This applies only when the AND constant value has one bit set and the SRL
constant is equal to the log2 of the AND constant. The back-end is smart enough
to convert the result into a TEST/JMP sequence.

llvm-svn: 67728
2009-03-26 01:47:50 +00:00
Bill Wendling 798fd56d0f These instructions have special lowering that may lower them to SSE
instructions. Prevent that if we don't want implicit uses of SSE.

llvm-svn: 66877
2009-03-13 08:41:47 +00:00
Evan Cheng 1fb8aedd1e Fix some significant problems with constant pools that resulted in unnecessary paddings between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues.
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.


Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.

llvm-svn: 66875
2009-03-13 07:51:59 +00:00
Chris Lattner 99cc133710 generalize the previous code to use the full generality of LEA
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this).  On the example, we now
generate:

_test:
	movl	4(%esp), %eax
	cmpl	$42, (%eax)
	setl	%al
	movzbl	%al, %eax
	leal	4(%eax,%eax,8), %eax
	ret

instead of:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4, %ecx
	movl	$13, %eax
	cmovg	%ecx, %eax
	ret

llvm-svn: 66869
2009-03-13 05:53:31 +00:00
Chris Lattner 4be6df5d86 optimize the case of cond ? 42 : 41 and friends. This compiles the
example to:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	setg	%al
	movzbl	%al, %eax
	orl	$4294967294, %eax
	ret

instead of:

        movl    4(%esp), %eax
        cmpl    $41, (%eax)
	movl	$4294967294, %ecx
	movl	$4294967295, %eax
	cmova	%ecx, %eax
	ret

which is smaller in code size and faster. rdar://6668608

llvm-svn: 66868
2009-03-13 05:22:11 +00:00
Chris Lattner 4147f08e44 Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))"
related transformations out of target-specific dag combine into the
ARM backend.  These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).

Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
with the recently added cp constant select optimization, but is a
very general xform.  For example, we now compile the second example
in const-select.ll to:

_test:
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        seta    %al
        movzbl  %al, %eax
        movl    4(%esp), %ecx
        movsbl  (%ecx,%eax,4), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        leal    4(%eax), %ecx
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        cmovbe  %eax, %ecx
        movsbl  (%ecx), %eax
        ret

This passes multisource and dejagnu.

llvm-svn: 66779
2009-03-12 06:52:53 +00:00
Evan Cheng ef0b7cc2d5 On x86, if the only use of a i64 load is a i64 store, generate a pair of double load and store instead.
llvm-svn: 66776
2009-03-12 05:59:15 +00:00
Bill Wendling 42adc73a2b Add a -no-implicit-float flag. This acts like -soft-float, but may generate
floating point instructions that are explicitly specified by the user.

llvm-svn: 66719
2009-03-11 22:30:01 +00:00
Mon P Wang 25c6a46a81 For yonah, fix a vector shuffle case for v16i8 where we didn't properly clear some bits.
llvm-svn: 66684
2009-03-11 18:47:57 +00:00
Mon P Wang ce6a26cb1a Fixed a v8i16 shuffle case that should generate a pshufb instead of a pshuflw/hw.
llvm-svn: 66645
2009-03-11 06:35:11 +00:00
Chris Lattner 248ad00afd formatting change, reduce indentation. No functionality change.
llvm-svn: 66642
2009-03-11 05:48:52 +00:00
Dan Gohman ff659b5b86 Arithmetic instructions don't set EFLAGS bits OF and CF bits
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.

llvm-svn: 66318
2009-03-07 01:58:32 +00:00
Dan Gohman e014b193c9 When creating X86ISD::INC and X86ISD::DEC nodes, only add one operand.
The extra operand didn't appear to cause any trouble, but it was
erroneous regardless.

llvm-svn: 66206
2009-03-05 21:29:28 +00:00
Dan Gohman 2c2f192c74 Fix the "test" optimization to recognize "dec" as an add of
negative one, as subtracts of immediates are canonicalized
to adds.

llvm-svn: 66180
2009-03-05 19:32:48 +00:00
Dan Gohman 55d7b2ac4f Re-apply 66008, now that the unfoldMemoryOperand bug is fixed.
llvm-svn: 66058
2009-03-04 19:44:21 +00:00
Dan Gohman 6728f892be Revert r66004 for now; it's causing a variety of test failures.
llvm-svn: 66008
2009-03-04 03:54:19 +00:00
Dan Gohman fe8d71f42a Teach the x86 backend to eliminate "test" instructions by using the EFLAGS
result from add, sub, inc, and dec instructions in simple cases.

llvm-svn: 66004
2009-03-04 02:33:24 +00:00
Rafael Espindola 000421eade Refactor TLS code and add some tests. The tests and expected results are:
pic |  declaration | linkage  | visibility |

!pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
 pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
!pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
 pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic

!pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
 pic |  declaration | external | hidden     | X                       | local dynamic
!pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
 pic | !declaration | external | hidden     | X                       | local dynamic

!pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
 pic |  declaration | internal | default    | X                       | local dynamic

The ones marked with an X have not been implemented since local dynamic is not implemented.

llvm-svn: 65632
2009-02-27 13:37:18 +00:00
Evan Cheng a49de9de2e Revert BuildVectorSDNode related patches: 65426, 65427, and 65296.
llvm-svn: 65482
2009-02-25 22:49:59 +00:00
Evan Cheng 9f8fddeed8 Only v1i16 (i.e. _m64) is returned via RAX / RDX.
llvm-svn: 65313
2009-02-23 09:03:22 +00:00
Nate Begeman e684da3e5d Generate better code for v8i16 shuffles on SSE2
Generate better code for v16i8 shuffles on SSE2 (avoids stack)
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
Document the shuffle matching logic and add some FIXMEs for later further
  cleanups.
New tests that test the above.

Examples:

New:
_shuf2:
	pextrw	$7, %xmm0, %eax
	punpcklqdq	%xmm1, %xmm0
	pshuflw	$128, %xmm0, %xmm0
	pinsrw	$2, %eax, %xmm0

Old:
_shuf2:
	pextrw	$2, %xmm0, %eax
	pextrw	$7, %xmm0, %ecx
	pinsrw	$2, %ecx, %xmm0
	pinsrw	$3, %eax, %xmm0
	movd	%xmm1, %eax
	pinsrw	$4, %eax, %xmm0
	ret

=========

New:
_shuf4:
	punpcklqdq	%xmm1, %xmm0
	pshufb	LCPI1_0, %xmm0

Old:
_shuf4:
	pextrw	$3, %xmm0, %eax
	movsd	%xmm1, %xmm0
	pextrw	$3, %xmm1, %ecx
	pinsrw	$4, %ecx, %xmm0
	pinsrw	$5, %eax, %xmm0

========

New:
_shuf1:
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	pextrw	$1, %xmm0, %eax
	rolw	$8, %ax
	movd	%xmm0, %ecx
	rolw	$8, %cx
	pextrw	$5, %xmm0, %edx
	pextrw	$4, %xmm0, %esi
	pextrw	$3, %xmm0, %edi
	pextrw	$2, %xmm0, %ebx
	movaps	%xmm0, %xmm1
	pinsrw	$0, %ecx, %xmm1
	pinsrw	$1, %eax, %xmm1
	rolw	$8, %bx
	pinsrw	$2, %ebx, %xmm1
	rolw	$8, %di
	pinsrw	$3, %edi, %xmm1
	rolw	$8, %si
	pinsrw	$4, %esi, %xmm1
	rolw	$8, %dx
	pinsrw	$5, %edx, %xmm1
	pextrw	$7, %xmm0, %eax
	rolw	$8, %ax
	movaps	%xmm1, %xmm0
	pinsrw	$7, %eax, %xmm0
	popl	%esi
	popl	%edi
	popl	%ebx
	ret

Old:
_shuf1:
	subl	$252, %esp
	movaps	%xmm0, (%esp)
	movaps	%xmm0, 16(%esp)
	movaps	%xmm0, 32(%esp)
	movaps	%xmm0, 48(%esp)
	movaps	%xmm0, 64(%esp)
	movaps	%xmm0, 80(%esp)
	movaps	%xmm0, 96(%esp)
	movaps	%xmm0, 224(%esp)
	movaps	%xmm0, 208(%esp)
	movaps	%xmm0, 192(%esp)
	movaps	%xmm0, 176(%esp)
	movaps	%xmm0, 160(%esp)
	movaps	%xmm0, 144(%esp)
	movaps	%xmm0, 128(%esp)
	movaps	%xmm0, 112(%esp)
	movzbl	14(%esp), %eax
	movd	%eax, %xmm1
	movzbl	22(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	42(%esp), %eax
	movd	%eax, %xmm1
	movzbl	50(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm1, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	77(%esp), %eax
	movd	%eax, %xmm1
	movzbl	84(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	104(%esp), %eax
	movd	%eax, %xmm1
	punpcklbw	%xmm1, %xmm0
	punpcklbw	%xmm2, %xmm0
	movaps	%xmm0, %xmm1
	punpcklbw	%xmm3, %xmm1
	movzbl	127(%esp), %eax
	movd	%eax, %xmm0
	movzbl	135(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	155(%esp), %eax
	movd	%eax, %xmm0
	movzbl	163(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm0, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	188(%esp), %eax
	movd	%eax, %xmm0
	movzbl	197(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	217(%esp), %eax
	movd	%eax, %xmm4
	movzbl	225(%esp), %eax
	movd	%eax, %xmm0
	punpcklbw	%xmm4, %xmm0
	punpcklbw	%xmm2, %xmm0
	punpcklbw	%xmm3, %xmm0
	punpcklbw	%xmm1, %xmm0
	addl	$252, %esp
	ret

llvm-svn: 65311
2009-02-23 08:49:38 +00:00
Scott Michel 9d31aca679 Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.

llvm-svn: 65296
2009-02-22 23:36:09 +00:00
Evan Cheng e4ffc030e2 Be bug compatible with gcc by returning MMX values in RAX.
llvm-svn: 65274
2009-02-22 08:05:12 +00:00
Evan Cheng 2a9bad5ac1 Support return of MMX values in 64-bit mode.
llvm-svn: 65152
2009-02-20 20:43:02 +00:00
Scott Michel cf0da6c597 Remove trailing whitespace to reduce later commit patch noise.
(Note: Eventually, commits like this will be handled via a pre-commit hook that
 does this automagically, as well as expand tabs to spaces and look for 80-col
 violations.)

llvm-svn: 64827
2009-02-17 22:15:04 +00:00
Evan Cheng c2fde91703 Teach x86 target -soft-float.
llvm-svn: 64496
2009-02-13 22:36:38 +00:00
Dale Johannesen 655775293f Arrange to print constants that match "n" and "i" constraints
in inline asm as signed (what gcc does).  Add partial support
for x86-specific "e" and "Z" constraints, with appropriate
signedness for printing.

llvm-svn: 64400
2009-02-12 20:58:09 +00:00
Dale Johannesen 9c310711bb Use getDebugLoc forwarder instead of getNode()->getDebugLoc.
No functional change.

llvm-svn: 64026
2009-02-07 19:59:05 +00:00
Dan Gohman 747e55bc9a Constify TargetInstrInfo::EmitInstrWithCustomInserter, allowing
ScheduleDAG's TLI member to use const.

llvm-svn: 64018
2009-02-07 16:15:20 +00:00
Dale Johannesen 62fd95d6ec Get rid of the last non-DebugLoc versions of getNode!
Many targets build placeholder nodes for special operands, e.g.
GlobalBaseReg on X86 and PPC for the PIC base.  There's no
sensible way to associate debug info with these.  I've left
them built with getNode calls with explicit DebugLoc::getUnknownLoc operands. 
I'm not too happy about this but don't see a good improvement;
I considered adding a getPseudoOperand or something, but it
seems to me that'll just make it harder to read.

llvm-svn: 63992
2009-02-07 00:55:49 +00:00
Dale Johannesen 84935759d5 Remove more non-DebugLoc getNode variants. Use
getCALLSEQ_{END,START} to permit passing no DebugLoc
there.  UNDEF doesn't logically have DebugLoc; add
getUNDEF to encapsulate this.

llvm-svn: 63978
2009-02-06 23:05:02 +00:00
Dale Johannesen 400dc2e2e4 Remove more non-DebugLoc versions of getNode.
llvm-svn: 63969
2009-02-06 21:50:26 +00:00
Dale Johannesen 9f3f72f144 Get rid of one more non-DebugLoc getNode and
its corresponding getTargetNode.  Lots of
caller changes.

llvm-svn: 63904
2009-02-06 01:31:28 +00:00
Dale Johannesen 021052a705 Remove non-DebugLoc versions of getLoad and getStore.
Adjust the many callers of those versions.

llvm-svn: 63767
2009-02-04 20:06:27 +00:00
Dan Gohman 556d14d483 Minor code cleanups; no functionality change.
llvm-svn: 63740
2009-02-04 17:28:58 +00:00
Mon P Wang 4379a795fe Fixes a case where we generate an incorrect mask for pshfhw in the presence
of undefs and incorrectly determining if we have punpckldq.

llvm-svn: 63702
2009-02-04 01:16:59 +00:00
Dale Johannesen bbf13f54e0 Patch up omissions in DebugLoc propagation.
llvm-svn: 63693
2009-02-04 00:33:20 +00:00
Dale Johannesen abf66b8343 Add some DL propagation to places that didn't
have it yet.  More coming.

llvm-svn: 63673
2009-02-03 22:26:09 +00:00
Dale Johannesen 1eb1ef2cfd DebugLoc propagation. done with file.
llvm-svn: 63656
2009-02-03 20:21:25 +00:00
Dale Johannesen 66e03e6f7b DebugLoc propagation. 2/3 through file.
llvm-svn: 63650
2009-02-03 19:33:06 +00:00
Evan Cheng dc636c4080 ADD / SUB / SMUL / UMUL with overflow second result top bits must be zero.
llvm-svn: 63509
2009-02-02 09:15:04 +00:00
Evan Cheng 4988c597b3 Add comment.
llvm-svn: 63506
2009-02-02 08:19:07 +00:00
Evan Cheng 50e15bdf81 Teach LowerBRCOND to recognize (xor (setcc x), 1). The xor inverts the condition. It's normally transformed by the dag combiner, unless the condition is set by a arithmetic op with overflow.
llvm-svn: 63505
2009-02-02 08:07:36 +00:00
Torok Edwin a2d1f35e9a Implement -mno-sse: if SSE is disabled on x86-64, don't store XMM on stack for
var-args, and don't allow FP return values

llvm-svn: 63495
2009-02-01 18:15:56 +00:00
Duncan Sands 3ed768868d Fix PR3453 and probably a bunch of other potential
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.

llvm-svn: 63494
2009-02-01 18:06:53 +00:00
Dale Johannesen 555a375bb6 Make LowerCallTo and LowerArguments take a DebugLoc
argument.  Adjust all callers and overloaded versions.

llvm-svn: 63444
2009-01-30 23:10:59 +00:00
Bill Wendling 8fb81f1b3d Get rid of the non-DebugLoc-ified getNOT() method.
llvm-svn: 63442
2009-01-30 23:03:19 +00:00
Mon P Wang cbb20a6ee1 When PerformBuildVectorCombine, avoid creating a X86ISD::VZEXT_LOAD of
an illegal type.

llvm-svn: 63380
2009-01-30 07:07:40 +00:00
Dan Gohman e58ab79f33 Make x86's BT instruction matching more thorough, and add some
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.

llvm-svn: 63266
2009-01-29 01:59:02 +00:00
Mon P Wang 9150f735fa Fixed lowering of v816 shuffles.
llvm-svn: 63252
2009-01-28 23:11:14 +00:00
Mon P Wang 5a685a52c1 Add shuffle splat pattern for x86 sse shifts.
llvm-svn: 63193
2009-01-28 08:12:05 +00:00
Dan Gohman 8e4ac9b71a Take the next steps in making SDUse more consistent with LLVM Use, and
tidy up SDUse and related code.
 - Replace the operator= member functions with a set method, like
   LLVM Use has, and variants setInitial and setNode, which take
   care up updating use lists, like LLVM Use's does. This simplifies
   code that calls these functions.
 - getSDValue() is renamed to get(), as in LLVM Use, though most
   places can either use the implicit conversion to SDValue or the
   convenience functions instead.
 - Fix some more node vs. value terminology issues.

Also, eliminate the one remaining use of SDOperandPtr, and
SDOperandPtr itself.

llvm-svn: 62995
2009-01-26 04:35:06 +00:00
Nate Begeman a2550a8e96 De-identifying per sabre review
llvm-svn: 62988
2009-01-26 03:15:31 +00:00
Nate Begeman 8a51d8c8f7 Support pattern matching various x86 sse shifts.
llvm-svn: 62979
2009-01-26 00:52:55 +00:00
Bob Wilson c58900504b Add SelectionDAG::getNOT method to construct bitwise NOT operations,
corresponding to the "not" and "vnot" PatFrags.  Use the new method
in some places where it seems appropriate.

llvm-svn: 62768
2009-01-22 17:39:32 +00:00
Evan Cheng 8f367e53c7 Minor tweak to LowerUINT_TO_FP_i32. Bias (after scalar_to_vector) has two uses so we should make it the second source operand of ISD::OR so 2-address pass won't have to be smart about commuting.
%reg1024<def> = MOVSDrm %reg0, 1, %reg0, <cp#0>, Mem:LD(8,8) [ConstantPool + 0]
%reg1025<def> = MOVSD2PDrr %reg1024
%reg1026<def> = MOVDI2PDIrm <fi#-1>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack-1 + 0]
%reg1027<def> = ORPSrr %reg1025<kill>, %reg1026<kill>
%reg1028<def> = MOVPD2SDrr %reg1027<kill>
%reg1029<def> = SUBSDrr %reg1028<kill>, %reg1024<kill>
%reg1030<def> = CVTSD2SSrr %reg1029<kill>
MOVSSmr <fi#0>, 1, %reg0, 0, %reg1030<kill>, Mem:ST(4,4) [FixedStack0 + 0]
%reg1031<def> = LD_Fp32m80 <fi#0>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack0 + 0]
RET %reg1031<kill>, %ST0<imp-use,kill>

The reason 2-addr pass isn't smart enough to commute the ORPSrr is because it can't look pass the MOVSD2PDrr instruction.

llvm-svn: 62505
2009-01-19 08:19:57 +00:00
Evan Cheng 7e9ef4d776 Now not UINT_TO_FP is legal (it's marked custom), dag combiner won't
optimize it to a SINT_TO_FP when the sign bit is known zero. X86 isel should perform the optimization itself.

llvm-svn: 62504
2009-01-19 08:08:22 +00:00
Bill Wendling f9291cf43c Extend thi
llvm-svn: 62415
2009-01-17 07:40:19 +00:00
Bill Wendling dd40f26877 Temporarily revert my last change. It is causing a bootstrap failure.
llvm-svn: 62405
2009-01-17 04:23:51 +00:00
Bill Wendling 4d5275905e Implement a special algorithm for converting uint_to_fp for i32 values on
X86. This code:

void f() {
  uint32_t x;
  float y = (float)x;
}

used to be:

     movl     %eax, -8(%ebp)
     movl     [2^52 double], -4(%ebp)
     movsd    -8(%ebp), %xmm0
     subsd    [2^52 double], %xmm0
     cvtsd2ss %xmm0, %xmm0

Is now:

   movsd        [2^52 double], %xmm0
   movsd        %xmm0, %xmm1
   movd         %ecx, %xmm2
   orps         %xmm2, %xmm1
   subsd        %xmm0, %xmm1
   cvtsd2ss     %xmm1, %xmm0

This is faster on X86. Note that there's an extra load of %xmm0 into %xmm1. That
will be fixed in a later coalescer fix.

llvm-svn: 62404
2009-01-17 03:56:04 +00:00
Bill Wendling e04334730e Add support for non-zero __builtin_return_address values on X86.
llvm-svn: 62338
2009-01-16 19:25:27 +00:00
Mon P Wang ebfafee903 Expand insert/extract of a <4 x i32> with a variable index.
llvm-svn: 62281
2009-01-15 21:10:20 +00:00
Dan Gohman 0ad43ca6e5 Make getWidenVectorType const.
llvm-svn: 62265
2009-01-15 17:34:08 +00:00
Dan Gohman a63bede3c6 BT appears to be available on all >= i386 chips.
llvm-svn: 62196
2009-01-13 23:27:15 +00:00
Dan Gohman d3942af5cb Don't use a BT instruction if the AND has multiple uses.
llvm-svn: 62195
2009-01-13 23:25:30 +00:00
Devang Patel 5c6e1e3b7d Use DebugInfo interface to lower dbg_* intrinsics.
llvm-svn: 62127
2009-01-13 00:35:13 +00:00
Dan Gohman 33e6fcd56f X86_COND_C and X86_COND_NC are alternate mnemonics for
X86_COND_B and X86_COND_AE, respectively.

llvm-svn: 61835
2009-01-07 00:15:08 +00:00
Devang Patel 56a8bb670f squash warnings.
llvm-svn: 61707
2009-01-05 17:31:22 +00:00
Evan Cheng 1671a309fd Use movaps / movd to extract vector element 0 even with sse4.1. It's still cheaper than pextrw especially if the value is in memory.
llvm-svn: 61555
2009-01-02 05:29:08 +00:00
Duncan Sands 8feb694e8f Fix PR3274: when promoting the condition of a BRCOND node,
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType.  In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).

llvm-svn: 61542
2009-01-01 15:52:00 +00:00
Chris Lattner 2a7c988627 Add a simple pattern for matching 'bt'.
llvm-svn: 61426
2008-12-25 05:34:37 +00:00
Chris Lattner 8175f27d3f translateX86CC can never fail. Simplify it based on this.
llvm-svn: 61423
2008-12-24 23:53:05 +00:00
Chris Lattner 4b46b74ece indentation
llvm-svn: 61407
2008-12-24 00:11:37 +00:00
Chris Lattner e9988b661d simplify some control flow and reduce indentation, no functionality change.
llvm-svn: 61404
2008-12-23 23:42:27 +00:00
Dan Gohman 25a767d7f4 Add instruction patterns and encodings for the x86 bt instructions.
llvm-svn: 61400
2008-12-23 22:45:23 +00:00
Dan Gohman 12f2490489 Clean up the atomic opcodes in SelectionDAG.
This removes all the _8, _16, _32, and _64 opcodes and replaces each
group with an unsuffixed opcode. The MemoryVT field of the AtomicSDNode
is now used to carry the size information. In tablegen, the size-specific
opcodes are replaced by size-independent opcodes that utilize the
ability to compose them with predicates.

This shrinks the per-opcode tables and makes the code that handles
atomics much more concise.

llvm-svn: 61389
2008-12-23 21:37:04 +00:00
Mon P Wang ec95070ca3 Fixed code generation for v8i16 and v16i8 splats on X86.
Fixed lowering of v8i16 shuffles for v8i16 when we fall back to extract/insert.

llvm-svn: 61365
2008-12-23 04:03:27 +00:00
Mon P Wang 998fd29ce1 Fixed x86 code generation of multiple for v2i64. It was incorrect for SSE4.1.
llvm-svn: 61211
2008-12-18 21:42:19 +00:00
Bill Wendling c4499feb1a - Use patterns instead of creating completely new instruction matching patterns,
which are identical to the original patterns.

- Change the multiply with overflow so that we distinguish between signed and
  unsigned multiplication. Currently, unsigned multiplication with overflow
  isn't working!

llvm-svn: 60963
2008-12-12 21:15:41 +00:00
Mon P Wang 9c2d26d208 Added support for SELECT v8i8 v4i16 for X86 (MMX)
Added support for TRUNC v8i16 to v8i8 for X86 (MMX)

llvm-svn: 60916
2008-12-12 01:25:51 +00:00
Bill Wendling 1a317678bc Redo the arithmetic with overflow architecture. I was changing the semantics of
ISD::ADD to emit an implicit EFLAGS. This was horribly broken. Instead, replace
the intrinsic with an ISD::SADDO node. Then custom lower that into an
X86ISD::ADD node with a associated SETCC that checks the correct condition code
(overflow or carry). Then that gets lowered into the correct X86::ADDOvf
instruction.

Similar for SUB and MUL instructions.

llvm-svn: 60915
2008-12-12 00:56:36 +00:00
Bill Wendling f482f379ef Whitespace changes.
llvm-svn: 60826
2008-12-10 02:01:32 +00:00
Bill Wendling db8ec2d75a Add sub/mul overflow intrinsics. This currently doesn't have a
target-independent way of determining overflow on multiplication. It's very
tricky. Patch by Zoltan Varga!

llvm-svn: 60800
2008-12-09 22:08:41 +00:00
Dale Johannesen 9efd2ce55b Make LoopStrengthReduce smarter about hoisting things out of
loops when they can be subsumed into addressing modes.

Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)

llvm-svn: 60608
2008-12-05 21:47:27 +00:00
Evan Cheng 501089f6f4 Refactor code. No functionality change.
llvm-svn: 60478
2008-12-03 08:38:43 +00:00
Bill Wendling f8d1ef9842 CC should only be a ConstantSDNode at this point. Just use 'cast' instead of 'dyn_cast'.
llvm-svn: 60477
2008-12-03 08:32:02 +00:00
Bill Wendling 30e9dc81c8 Second stab at target-dependent lowering of everyone's favorite nodes: [SU]ADDO
- LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The
  EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C,
  depending on the type of ADDO requested.

- LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or
  COND_C set.

llvm-svn: 60388
2008-12-02 01:06:39 +00:00
Duncan Sands 3d960941b1 There are no longer any places that require a
MERGE_VALUES node with only one operand, so get
rid of special code that only existed to handle
that possibility.

llvm-svn: 60349
2008-12-01 11:41:29 +00:00
Duncan Sands 6ed40141f7 Change the interface to the type legalization method
ReplaceNodeResults: rather than returning a node which
must have the same number of results as the original
node (which means mucking around with MERGE_VALUES,
and which is also easy to get wrong since SelectionDAG
folding may mean you don't get the node you expect),
return the results in a vector.

llvm-svn: 60348
2008-12-01 11:39:25 +00:00
Bill Wendling 128f032cc8 Comment out code that isn't entirely correct.
llvm-svn: 60156
2008-11-27 07:18:35 +00:00
Bill Wendling 751a694ad3 Generate something sensible for an [SU]ADDO op when the overflow/carry flag is
the conditional for the BRCOND statement. For instance, it will generate:

    addl %eax, %ecx
    jo LOF

instead of

    addl %eax, %ecx
    ; About 10 instructions to compare the signs of LHS, RHS, and sum.
    jl LOF

llvm-svn: 60123
2008-11-26 22:37:40 +00:00
Bill Wendling 66835479d7 - Make lowering of "add with overflow" customizable by back-ends.
- Mark "add with overflow" as having a custom lowering for X86. Give it a null
  lowering representation for now.

llvm-svn: 59971
2008-11-24 19:21:46 +00:00
Mon P Wang 35a70ec131 Added missing description for -disable-mmx option.
llvm-svn: 59929
2008-11-24 02:10:43 +00:00
Duncan Sands 8d6e2e13d5 Rename SetCCResultContents to BooleanContents. In
practice these booleans are mostly produced by SetCC,
however the concept is more general.

llvm-svn: 59911
2008-11-23 15:47:28 +00:00
Mon P Wang 0aa8f0a549 Added -disable-mmx using a patch from Preston Gurd.
llvm-svn: 59901
2008-11-23 04:37:22 +00:00
Dale Johannesen bee1ad9707 Extend InlineAsm::C_Register to allow multiple specific registers
(actually, code already all worked, only the comment
changed).  Use this to implement 'A' constraint on x86.
Fixes PR 1779.

llvm-svn: 59266
2008-11-13 21:52:36 +00:00
Mon P Wang 9a8d60a7c0 Widening cleanup
llvm-svn: 58796
2008-11-06 05:31:54 +00:00
Evan Cheng 3cd5e8c97b Indentation.
llvm-svn: 58750
2008-11-05 06:03:38 +00:00
Dan Gohman 99cdf8893e Use MOVSSmr instead of EXTRACTPSmr in the case of extracting
vector element 0 for a store, as it's smaller and faster.

llvm-svn: 58483
2008-10-31 00:57:24 +00:00
Mon P Wang 58c3794c27 Add initial support for vector widening. Logic is set to widen for X86.
One will only see an effect if legalizetype is not active.  Will move
support to LegalizeType soon.

llvm-svn: 58426
2008-10-30 08:01:45 +00:00
Chris Lattner 38461f6b2f Fix a nasty miscompilation of 176.gcc on linux/x86 where we synthesized
a memset using 16-byte XMM stores, but where the stack realignment code
didn't work.  Until it does (PR2962) disable use of xmm regs in memcpy
and memset formation for linux and other targets with insufficiently
aligned stacks.

This is part of PR2888

llvm-svn: 58317
2008-10-28 05:49:35 +00:00
Duncan Sands 014f5bbaad Fix translateX86CC: if SetCCOpcode is SETULE and
LHS is a foldable load, then LHS and RHS are swapped
and SetCCOpcode is changed to SETUGT.  But the later
code is expecting operands to be the wrong way round
for SETUGT, but they are not in this case, resulting
in an inverted compare.  The solution is to move the
load normalization before the correction for SETUGT.
This bug was tickled by LegalizeTypes which happened
to legalize the testcase slightly differently to
LegalizeDAG.

llvm-svn: 58092
2008-10-24 13:03:10 +00:00
Dale Johannesen f6655a9e79 Remove allocation of unused stack slot.
llvm-svn: 57987
2008-10-22 17:26:06 +00:00
Duncan Sands 5ee1dde8fa Get this working with LegalizeTypes: (1) don't
assume that i64 has been turned into a BUILD_PAIR
node (when called from LegalizeTypes this hasn't
happened yet) and don't use a vector shuffle mask
with an illegal element type.

llvm-svn: 57972
2008-10-22 11:24:12 +00:00
Dale Johannesen cf4607fcce Adjust comments for pedantic satisfaction.
llvm-svn: 57940
2008-10-22 00:02:32 +00:00
Dale Johannesen 3d7ece1acb Add comments to explain uint64->f64 algorithm,
well, sort of.  (Algorithm by Ian Ollmann.)

llvm-svn: 57932
2008-10-21 23:07:49 +00:00
Dale Johannesen 28929589e7 Add an SSE2 algorithm for uint64->f64 conversion.
The same one Apple gcc uses, faster.  Also gets the
extreme case in gcc.c-torture/execute/ieee/rbug.c
correct which we weren't before; this is not
sufficient to get the test to pass though, there
is another bug.

llvm-svn: 57926
2008-10-21 20:50:01 +00:00
Dan Gohman 269246b034 Don't create TargetGlobalAddress nodes with offsets that don't fit
in the 32-bit signed offset field of addresses. Even though this
may be intended, some linkers refuse to relocate code where the
relocated address computation overflows.

Also, fix the sign-extension of constant offsets to use the
actual pointer size, rather than the size of the GlobalAddress
node, which may be different, for example on x86-64 where MVT::i32
is used when the address is being fit into the 32-bit displacement
field.

llvm-svn: 57885
2008-10-21 03:38:42 +00:00
Dan Gohman 97d95d6d85 Optimized FCMP_OEQ and FCMP_UNE for x86.
Where previously LLVM might emit code like this:

        ucomisd %xmm1, %xmm0
        setne   %al
        setp    %cl
        orb     %al, %cl
        jne     .LBB4_2

it now emits this:

        ucomisd %xmm1, %xmm0
        jne     .LBB4_2
        jp      .LBB4_2

It has fewer instructions and uses fewer registers, but it does
have more branches. And in the case that this code is followed by
a non-fallthrough edge, it may be followed by a jmp instruction,
resulting in three branch instructions in sequence. Some effort
is made to avoid this situation.

To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and
FCMP_UNE in lowered form, and replace them with code that emits
two branches, except in the case where it would require converting
a fall-through edge to an explicit branch.

Also, X86InstrInfo.cpp's branch analysis and transform code now
knows now to handle blocks with multiple conditional branches. It
uses loops instead of having fixed checks for up to two
instructions. It can now analyze and transform code generated
from FCMP_OEQ and FCMP_UNE.

llvm-svn: 57873
2008-10-21 03:29:32 +00:00
Duncan Sands 1d20ab5784 Have X86 custom lowering for LegalizeTypes use
LowerOperation if it doesn't know what else to do.
This methods should probably be factorized some,
but this is good enough for the moment.  Have
LowerATOMIC_BINARY_64 use EXTRACT_ELEMENT rather
than assuming the operand is a BUILD_PAIR (if it
is then getNode will automagically simplify the
EXTRACT_ELEMENT).  This way LowerATOMIC_BINARY_64
usable from LegalizeTypes.

llvm-svn: 57831
2008-10-20 15:56:33 +00:00
Dan Gohman 2fe6bee5b6 Teach DAGCombine to fold constant offsets into GlobalAddress nodes,
and add a TargetLowering hook for it to use to determine when this
is legal (i.e. not in PIC mode, etc.)

This allows instruction selection to emit folded constant offsets
in more cases, such as the included testcase, eliminating the need
for explicit arithmetic instructions.

This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
that attempted to achieve the same effect, but wasn't as effective.

Also, fix handling of offsets in GlobalAddressSDNodes in several
places, including changing GlobalAddressSDNode's offset from
int to int64_t.

The Mips, Alpha, Sparc, and CellSPU targets appear to be
unaware of GlobalAddress offsets currently, so set the hook to
false on those targets.

llvm-svn: 57748
2008-10-18 02:06:02 +00:00
Chris Lattner 8e2ef196ae add support for 128 bit inputs on both x86-64 and x86-32.
llvm-svn: 57709
2008-10-17 18:15:05 +00:00
Chris Lattner c7e65f4377 Fix a bug where the x86 backend would reject 64-bit r constraints when
in 32-bit mode instead of assigning a register pair.  This has nothing to
do with PR2356, but I happened to notice it while working on it.

llvm-svn: 57704
2008-10-17 17:59:52 +00:00
Dan Gohman 4a87660127 Remove an unused variable.
llvm-svn: 57621
2008-10-16 01:47:47 +00:00
Evan Cheng 3b0f5e4d61 - Add target lowering hooks that specify which setcc conditions are illegal,
i.e. conditions that cannot be checked with a single instruction. For example,
SETONE and SETUEQ on x86.
- Teach legalizer to implement *illegal* setcc as a and / or of a number of
legal setcc nodes. For now, only implement FP conditions. e.g. SETONE is
implemented as SETO & SETNE, SETUEQ is SETUO | SETEQ.
- Move x86 target over.

llvm-svn: 57542
2008-10-15 02:05:31 +00:00
Dan Gohman e7ced74558 FastISel support for exception-handling constructs.
- Move the EH landing-pad code and adjust it so that it works
   with FastISel as well as with SDISel.
 - Add FastISel support for @llvm.eh.exception and
   @llvm.eh.selector.

llvm-svn: 57539
2008-10-14 23:54:11 +00:00
Evan Cheng 07d53b1d33 Rename LoadX to LoadExt.
llvm-svn: 57526
2008-10-14 21:26:46 +00:00
Chris Lattner 2753955fc0 Change CALLSEQ_BEGIN and CALLSEQ_END to take TargetConstant's as
parameters instead of raw Constants.  This prevents the constants from
being selected by the isel pass, fixing PR2735.

llvm-svn: 57385
2008-10-11 22:08:30 +00:00
Dale Johannesen 4f0bd68cfe Add a "loses information" return value to APFloat::convert
and APFloat::convertToInteger.  Restore return value to
IEEE754.  Adjust all users accordingly.

llvm-svn: 57329
2008-10-09 23:00:39 +00:00
Evan Cheng 94d14f2d45 Fix PR2850 and PR2863. Only generate movddup for 128-bit SSE vector shuffles.
llvm-svn: 57210
2008-10-06 21:13:08 +00:00
Dale Johannesen 8c36a1c09c Make atomic Swap work, 64-bit on x86-32.
Make it all work in non-pic mode.

llvm-svn: 57034
2008-10-03 22:25:52 +00:00
Dale Johannesen 5d60c1ebb1 Pass MemOperand through for 64-bit atomics on 32-bit,
incidentally making the case where the memop is a
pointer deref work.  Fix cmp-and-swap regression.

llvm-svn: 57027
2008-10-03 19:41:08 +00:00
Dan Gohman 0d1e9a8e04 Switch the MachineOperand accessors back to the short names like
isReg, etc., from isRegister, etc.

llvm-svn: 57006
2008-10-03 15:45:36 +00:00
Dale Johannesen 867d549fce Handle some 64-bit atomics on x86-32, some of the time.
llvm-svn: 56963
2008-10-02 18:53:47 +00:00
Bill Wendling 68f12ee567 Implement the -fno-builtin option in the front-end, not in the back-end.
llvm-svn: 56900
2008-10-01 00:59:58 +00:00
Bill Wendling 1782584f56 Just don't transform this memset into "bzero" if no-builtin is specified.
llvm-svn: 56888
2008-09-30 22:05:33 +00:00
Bill Wendling bd09262e97 Add the new `-no-builtin' flag. This flag is meant to mimic the GCC
`-fno-builtin' flag. Currently, it's used to replace "memset" with "_bzero"
instead of "__bzero" on Darwin10+. This arguably violates the meaning of this
flag, but is currently sufficient. The meaning of this flag should become more
specific over time.

llvm-svn: 56885
2008-09-30 21:22:07 +00:00
Dale Johannesen f61a84ec43 Remove misuse of ReplaceNodeResults for atomics with
valid types.  No functional change.

llvm-svn: 56808
2008-09-29 22:25:26 +00:00
Evan Cheng 3774b2f292 Re-apply 56683 with fixes.
llvm-svn: 56748
2008-09-27 01:56:22 +00:00
Bill Wendling c966a737c5 Temporarily reverting r56683. This is causing a failure during the build of llvm-gcc:
/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/bin/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/lib/ -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/include -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/sys-include -mmacosx-version-min=10.4 -O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc -I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include -I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include  -I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber -I/Volumes/Gir/devel/llvm/clean/llvm.obj/include -I/Volumes/Gir/devel/llvm/clean/llvm.src/include -fexceptions -fvisibility=hidden -DHIDE_EXPORTS -c ../../llvm-gcc.src/gcc/unwind-dw2-fde-darwin.c -o libgcc/./unwind-dw2-fde-darwin.o
Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) && TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical register live information"), function runOnMachineFunction, file /Volumes/Gir/devel/llvm/clean/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp, line 311.
../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
{standard input}:3521:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb"
{standard input}:3521:symbol: "_dwarf_reg_size_table" can't be undefined in a subtraction expression
{standard input}:3520:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb"
...

llvm-svn: 56703
2008-09-26 22:10:44 +00:00
Dan Gohman 6e0548336a Rename ConstantSDNode's getSignExtended to getSExtValue, for
consistancy with ConstantInt, and re-implement it in terms
of ConstantInt's getSExtValue.

llvm-svn: 56700
2008-09-26 21:54:37 +00:00
Evan Cheng d77cbe8947 Fix @llvm.frameaddress codegen. FP elimination optimization should be disabled when frame address is desired. Also add support for depth > 0.
llvm-svn: 56683
2008-09-26 19:48:35 +00:00
Dale Johannesen 0e32a2c935 Add "inreg" field to CallSDNode (doesn't increase
its size).  Adjust various lowering functions to
pass this info through from CallInst.  Use it to
implement sseregparm returns on X86.  Remove
X86_ssecall calling convention.

llvm-svn: 56677
2008-09-26 19:31:26 +00:00
Evan Cheng 9dbe45c000 Prefer movlhps over punpcklqdq, etc. in more cases.
llvm-svn: 56627
2008-09-25 23:35:16 +00:00
Devang Patel 4c758ea3e0 Large mechanical patch.
s/ParamAttr/Attribute/g
s/PAList/AttrList/g
s/FnAttributeWithIndex/AttributeWithIndex/g
s/FnAttr/Attribute/g

This sets the stage 
- to implement function notes as function attributes and 
- to distinguish between function attributes and return value attributes.

This requires corresponding changes in llvm-gcc and clang.

llvm-svn: 56622
2008-09-25 21:00:45 +00:00
Evan Cheng 74c9ed91b0 With sse3 and when the source is a load or has multiple uses, favors movddup over shuffp*, pshufd, etc. Without sse3 or when the source is from a register, make use of movlhps
llvm-svn: 56620
2008-09-25 20:50:48 +00:00
Evan Cheng 4751549f9b X86ISD::VZEXT_LOAD should produce and fold a chain.
llvm-svn: 56593
2008-09-24 23:26:36 +00:00
Evan Cheng e0add20c1b Properly handle 'm' inline asm constraints. If a GV is being selected for the addressing mode, it requires the same logic for PIC relative addressing, etc.
llvm-svn: 56526
2008-09-24 00:05:32 +00:00
Dan Gohman 918fe08a56 Arrange for FastISel code to have access to the MachineModuleInfo
object. This will be needed to support debug info.

llvm-svn: 56508
2008-09-23 21:53:34 +00:00
Evan Cheng 9e9426cb82 Support x86 specific inline asm modifier 'J'.
llvm-svn: 56483
2008-09-22 23:57:37 +00:00
Dale Johannesen 7a74e71489 Make log, log2, log10, exp, exp2 use Expand by
default.

llvm-svn: 56471
2008-09-22 21:57:32 +00:00
Arnold Schwaighofer 796a271c5f Change the calling convention used when tail call optimization is enabled from CC_X86_32_TailCall to CC_X86_32_FastCC.
llvm-svn: 56436
2008-09-22 14:50:07 +00:00
Bill Wendling 24c79f28b1 Reverting r56249. On further investigation, this functionality isn't needed.
Apologies for the thrashing.

llvm-svn: 56251
2008-09-16 21:48:12 +00:00
Bill Wendling 8bc392fb1d - Change "ExternalSymbolSDNode" to "SymbolSDNode".
- Add linkage to SymbolSDNode (default to external).
- Change ISD::ExternalSymbol to ISD::Symbol.
- Change ISD::TargetExternalSymbol to ISD::TargetSymbol

These changes pave the way to allowing SymbolSDNodes with non-external linkage.

llvm-svn: 56249
2008-09-16 21:12:30 +00:00
Dan Gohman 38453eebdc Remove isImm(), isReg(), and friends, in favor of
isImmediate(), isRegister(), and friends, to avoid confusion
about having two different names with the same meaning. I'm
not attached to the longer names, and would be ok with
changing to the shorter names if others prefer it.

llvm-svn: 56189
2008-09-13 17:58:21 +00:00
Dan Gohman d3fe174c53 Define CallSDNode, an SDNode subclass for use with ISD::CALL.
Currently it just holds the calling convention and flags
for isVarArgs and isTailCall.

And it has several utility methods, which eliminate magic
5+2*i and similar index computations in several places.

CallSDNodes are not CSE'd. Teach UpdateNodeOperands to handle
nodes that are not CSE'd gracefully.

llvm-svn: 56183
2008-09-13 01:54:27 +00:00
Dan Gohman effb894453 Rename ConstantSDNode::getValue to getZExtValue, for consistency
with ConstantInt. This led to fixing a bug in TargetLowering.cpp
using getValue instead of getAPIntValue.

llvm-svn: 56159
2008-09-12 16:56:44 +00:00
Arnold Schwaighofer dd45bc25ac When tailcallopt is enabled all fastcc calls must have an aligned argument stack size. Add a test case.
llvm-svn: 56119
2008-09-11 20:28:43 +00:00
Dale Johannesen 58d084c05b The version of AtomicSDNode::AtomicSDNode used (only) for
cmp-and-swap reversed the Cmp and Swap arguments; comments
make it clear this is unintentional.  Unfortunately, the
x86 BE had a compensating reversal, which is removed here.
PPC is OK.

From inspection of the Alpha code I think it is OK, but
if somebody has that platform please check it out.  I
cannot test on that platform.

llvm-svn: 56091
2008-09-11 03:12:59 +00:00
Dan Gohman 39d82f902a Add X86FastISel support for static allocas, and refences
to static allocas. As part of this change, refactor the
address mode code for laods and stores.

llvm-svn: 56066
2008-09-10 20:11:02 +00:00
Evan Cheng 710c3cf36a Fix a fastcc + sret bug. If fastcc and sret, callee doesn't need to pop the hidden struct ptr; Re-enable fastcc.
llvm-svn: 56061
2008-09-10 18:25:29 +00:00
Dale Johannesen 4cc893bab6 Handle new intrinsics with vector arguments.
Patch by Paul Redmond.

llvm-svn: 56059
2008-09-10 17:31:40 +00:00
Duncan Sands 6d6a65310b Fix name.
llvm-svn: 56055
2008-09-10 13:22:10 +00:00
Duncan Sands 83e45acc25 Add trampoline support for the new FastCC calling
convention (not related to recent Ada testsuite
failures).

llvm-svn: 56054
2008-09-10 13:11:09 +00:00
Duncan Sands 536c399579 Turn off the new FastCC for the moment. It causes
a slew of Ada testsuite failures on x86-32 linux.
Seems to be related to the use of float.

llvm-svn: 56053
2008-09-10 13:09:24 +00:00
Anton Korobeynikov 6acb2219b6 Replace explicit pointer-size constants to TargetData query.
No functionality change.

llvm-svn: 55996
2008-09-09 18:22:57 +00:00
Anton Korobeynikov 2fd24e7713 Reapply 55899: First draft of EH support on x86/64-linux
Now with fix, which prevents subtle codegen bug to trigger on darwin.
No fix for bug though, it's still there.

llvm-svn: 55955
2008-09-08 21:12:47 +00:00
Anton Korobeynikov 4112634ca6 Reapply blindly reverted 55898: Implement FRAME_TO_ARGS_OFFSET for x86-64
llvm-svn: 55954
2008-09-08 21:12:11 +00:00
Bill Wendling 3871441861 Reverting r55898 as well. This wasn't reverted in the original revert...
llvm-svn: 55938
2008-09-08 19:42:32 +00:00
Bill Wendling 99b83712f3 Reverting r55898 to r55909. One of these patches was causing an ICE during the full bootstrap on Darwin:
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include
-O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition
-isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2
-D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc
-I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include
-I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include
-I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include
-DSHARED -m64 -DL_negdi2 -c ../../llvm-gcc.src/gcc/libgcc2.c -o
libgcc/x86_64/_negdi2_s.o
Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) &&
TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical
register live information"), function runOnMachineFunction, file
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp,
line 311.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include
-O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition
-isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2
-D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc
-I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include
-I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include
-I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include
-DSHARED -m64 -DL_lshrdi3 -c ../../llvm-gcc.src/gcc/libgcc2.c -o
libgcc/x86_64/_lshrdi3_s.o
../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
{standard input}:unknown:Undefined local symbol LBB21_11
{standard input}:unknown:Undefined local symbol LBB21_12
{standard input}:unknown:Undefined local symbol LBB21_13
{standard input}:unknown:Undefined local symbol LBB21_8

llvm-svn: 55928
2008-09-08 17:59:12 +00:00
Anton Korobeynikov 82b9540046 First draft of EH support on x86/64-linux
llvm-svn: 55899
2008-09-08 14:21:53 +00:00
Anton Korobeynikov cb0655d626 Implement FRAME_TO_ARGS_OFFSET for x86-64
llvm-svn: 55898
2008-09-08 14:21:10 +00:00
Evan Cheng 6f343bd543 Some code clean up.
llvm-svn: 55881
2008-09-07 09:07:23 +00:00
Evan Cheng 6c94b99c62 For whatever the reason, x86 CallingConv::Fast (i.e. fastcc) was not passing scalar arguments in registers. This patch defines a new fastcc CC which is slightly different from the FastCall CC. In addition to passing integer arguments in ECX and EDX, it also specify doubles are passed in 8-byte slots which are 8-byte aligned (instead of 4-byte aligned). This avoids a potential performance hazard where doubles span cacheline boundaries.
llvm-svn: 55807
2008-09-04 22:59:58 +00:00
Evan Cheng 3152edf474 Remove code that pad number of bytes to pop for X86_FastCall CC. The code doesn't do the "aligning" for Cygwin, Mingw, and Windows. But aligning it on Darwin and Linux breaks gcc compatibility. That ruled out all the platforms we support!
llvm-svn: 55756
2008-09-04 01:04:15 +00:00
Dale Johannesen da2d80688b Add intrinsics for log, log2, log10, exp, exp2.
No functional change (and no FE change to generate them).

llvm-svn: 55753
2008-09-04 00:47:13 +00:00
Dan Gohman 7bda51f5a4 Create HandlePHINodesInSuccessorBlocksFast, a version of
HandlePHINodesInSuccessorBlocks that works FastISel-style. This
allows PHI nodes to be updated correctly while using FastISel.

This also involves some code reorganization; ValueMap and
MBBMap are now members of the FastISel class, so they needn't
be passed around explicitly anymore. Also, SelectInstructions
is changed to SelectInstruction, and only does one instruction
at a time.

llvm-svn: 55746
2008-09-03 23:12:08 +00:00
Evan Cheng 24422d4928 Let tblgen only generate fastisel routines, not the class definition. This makes it easier for targets to define its own fastisel class.
llvm-svn: 55679
2008-09-03 00:03:49 +00:00
Evan Cheng 3fddc7e906 Swap fp comparison operands and change predicate to allow load folding (safely this time).
llvm-svn: 55553
2008-08-29 23:22:12 +00:00
Evan Cheng b3ed09703c Backing out 55521. Not safe.
llvm-svn: 55548
2008-08-29 22:13:21 +00:00
Evan Cheng 960b17a3c2 Swap fp comparison operands and change predicate to allow load folding.
llvm-svn: 55521
2008-08-28 23:48:31 +00:00
Gabor Greif 95d77f5466 remove tabs, fix > 80 cols
llvm-svn: 55511
2008-08-28 23:19:51 +00:00
Gabor Greif f304a7aa4d erect abstraction boundaries for accessing SDValue members, rename Val -> Node to reflect semantics
llvm-svn: 55504
2008-08-28 21:40:38 +00:00
Rafael Espindola 26d54b3ef3 Use resize instead of reserve. Reserve doesn't change size().
llvm-svn: 55486
2008-08-28 18:32:53 +00:00
Dale Johannesen 41be0d4445 Split the ATOMIC NodeType's to include the size, e.g.
ATOMIC_LOAD_ADD_{8,16,32,64} instead of ATOMIC_LOAD_ADD.
Increased the Hardcoded Constant OpActionsCapacity to match.
Large but boring; no functional change.

This is to support partial-word atomics on ppc; i8 is
not a valid type there, so by the time we get to lowering, the
ATOMIC_LOAD nodes looks the same whether the type was i8 or i32.
The information can be added to the AtomicSDNode, but that is the
largest SDNode; I don't fully understand the SDNode allocation,
but it is sensitive to the largest node size, so increasing
that must be bad.  This is the alternative.

llvm-svn: 55457
2008-08-28 02:44:49 +00:00
Gabor Greif abfdf928d8 disallow direct access to SDValue::ResNo, provide a getter instead
llvm-svn: 55394
2008-08-26 22:36:50 +00:00
Chris Lattner 09f8cef571 If an xmm register is referenced explicitly in an inline asm, make sure to
assign it to a version of the xmm register with the regclass that matches its
type.  This fixes PR2715, a bug handling some crazy xpcom case in mozilla.

llvm-svn: 55358
2008-08-26 06:19:02 +00:00
Evan Cheng f00f1e50b5 Try approach to moving call address load inside of callseq_start. Now it's done during the preprocess of x86 isel. callseq_start's chain is changed to load's chain node; while load's chain is the last of callseq_start or the loads or copytoreg nodes inserted to move arguments to the right spot.
llvm-svn: 55338
2008-08-25 21:27:18 +00:00
Bill Wendling 5b836c5f77 Temporarily reverting r55292. It's causing a bootstraping failure:
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc ... src/libiberty/make-temp-file.c -o make-temp-file.o
Assertion failed: (Node2Index[SU->NodeNum] > Node2Index[I->Dep->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp, line 508.
../../../../llvm-gcc.src/libiberty/hashtab.c:955: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
make[4]: *** [hashtab.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [multi-do] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-target-libiberty] Error 2
make: *** [all] Error 2

llvm-svn: 55295
2008-08-24 21:45:30 +00:00
Evan Cheng 8fa17424f7 Move callseq_start above the call address load to allow load to be folded into the call node.
llvm-svn: 55292
2008-08-24 19:19:55 +00:00
Bill Wendling 2fd7dbaf1d If part of the mask is "undef", then ignore it as we don't care what goes into it.
llvm-svn: 55147
2008-08-21 22:36:36 +00:00
Bill Wendling 765d3e0013 Fix whitespace. No functionality change.
llvm-svn: 55146
2008-08-21 22:35:37 +00:00
Evan Cheng 9534ea03e8 Fix a number of byval / memcpy / memset related codegen issues.
1. x86-64 byval alignment should be max of 8 and alignment of type. Previously the code was not doing what the commit message was saying.
2. Do not use byte repeat move and store operations. These are slow.

llvm-svn: 55139
2008-08-21 21:00:15 +00:00
Mon P Wang 5c2ac4a5e0 Treat floating point ST1 the same as ST0 when lowering for a call result
llvm-svn: 55135
2008-08-21 19:54:16 +00:00
Dan Gohman 02c84b8910 Simplify FastISel's constructor argument list, make the FastISel
class hold a MachineRegisterInfo member, and make the
MachineBasicBlock be passed in to SelectInstructions rather
than the FastISel constructor.

llvm-svn: 55076
2008-08-20 21:05:57 +00:00
Dale Johannesen 6f765f392c Add remaining 64-bit atomic patterns for x86-64.
llvm-svn: 55029
2008-08-20 00:48:50 +00:00
Bill Wendling f00f3055d8 Revert r55018 and apply the correct "fix" for the 64-bit sub_and_fetch atomic.
Just expand it like the other X-bit sub_and_fetches.

llvm-svn: 55023
2008-08-20 00:28:16 +00:00
Dan Gohman daef7f43af Instantiate FastISel for X86.
llvm-svn: 55011
2008-08-19 21:45:35 +00:00
Dan Gohman 4619e93bd3 The X86 target will soon have an implementation of createFastISel.
llvm-svn: 55010
2008-08-19 21:32:53 +00:00
Dale Johannesen 5afbf510aa Add support for 8 and 16 bit forms of __sync
builtins on X86.

Change "lock" instructions to be on a separate line.
This is needed to work around a bug in the Darwin
assembler.

llvm-svn: 54999
2008-08-19 18:47:28 +00:00
Evan Cheng ab35bfdf18 Fix a (u)comiss intrinsic lowering bug. It was using anyext which can return junk in higher bits. Patch by Nate Begeman.
llvm-svn: 54903
2008-08-17 19:22:34 +00:00
Anton Korobeynikov 93584cd5a0 Use correct name for TLS address resolution routine on x86-64
llvm-svn: 54845
2008-08-16 12:58:29 +00:00
Dan Gohman 7c2bf62b14 Also avoid pinsrw and pinsrb with a variable insertelement index.
llvm-svn: 54803
2008-08-14 22:53:18 +00:00
Dan Gohman 65d83ccf26 Don't try to use the insertps instruction for vector
element inserts with non-constant indices. This fixes
CodeGen/X86/vector-variable-idx.ll on machines that
have SSE4.1.

llvm-svn: 54801
2008-08-14 22:43:26 +00:00
Evan Cheng 7823a411d5 Fix PR2620: Fix X86cmppd selection code so it expects operands to be v2f64.
llvm-svn: 54376
2008-08-05 22:19:15 +00:00
Dan Gohman 8ef79ebd5f Add an assert to catch invalid VECTOR_SHUFFLE mask indices.
llvm-svn: 54329
2008-08-04 23:09:15 +00:00
Andrew Lenharth 77e3e86e70 Add atomic sub for other sizes
llvm-svn: 54314
2008-08-03 20:17:34 +00:00
Dan Gohman 2ce6f2ad5e Rename SDOperand to SDValue.
llvm-svn: 54128
2008-07-27 21:46:04 +00:00
Dan Gohman 91e5dcb680 Tidy SDNode::use_iterator, and complete the transition to have it
parallel its analogue, Value::value_use_iterator. The operator* method
now returns the user, rather than the use.

llvm-svn: 54127
2008-07-27 20:43:25 +00:00
Nate Begeman 283b2da27a Disable mov{L, LP, HP, HLP, *DUP} shuffles for mmx
mmx needs its own fancy shuffle logic based on unpack; for now we get correct but awful code.

Also commit Mon Ping's VSETCC patch

llvm-svn: 54039
2008-07-25 19:05:58 +00:00
Evan Cheng a2b4b4ad99 Fix PR2485: do all 4-element SSE shuffles in max. of 2 shuffle instructions.
Based on patch by Nicolas Capens.

llvm-svn: 53939
2008-07-23 00:22:17 +00:00
Evan Cheng 0c23ed6364 Factor out SSE 4 wide shuffle lowering code into its own function. No functionality changes.
llvm-svn: 53933
2008-07-22 21:13:36 +00:00
Evan Cheng 0384670141 Fix PR2574: implement v2f32 scalar_to_vector.
llvm-svn: 53927
2008-07-22 18:39:19 +00:00
Duncan Sands b0e3938651 Add VerifyNode, a place to put sanity checks on
generic SDNode's (nodes with their own constructors
should do sanity checking in the constructor).  Add
sanity checks for BUILD_VECTOR and fix all the places
that were producing bogus BUILD_VECTORs, as found by
"make check".  My favorite is the BUILD_VECTOR with
only two operands that was being used to build a
vector with four elements!

llvm-svn: 53850
2008-07-21 10:20:31 +00:00
Bill Wendling 75840e6435 Fix for first part of PR2562. Generate the "pinsrw" instruction for inserts
into v4i16 vectors.

llvm-svn: 53807
2008-07-20 02:32:23 +00:00
Nate Begeman 55b7becb29 SSE codegen for vsetcc nodes
llvm-svn: 53719
2008-07-17 16:51:19 +00:00
Mon P Wang 1e2c6bfa41 When lowering certain atomics, we need to copy the memoperand from the old
atomic operation to the new one.

llvm-svn: 53714
2008-07-17 04:54:06 +00:00
Evan Cheng cf06fe476f x86-64 PIC JIT fixes: do not generate the extra load for external GV's.
llvm-svn: 53661
2008-07-16 01:34:02 +00:00
Dan Gohman 02c7c6cb33 Include a frame index in the "fixed stack" pseudo source value
instead of using the frame index for the SVOffset, which was
inconsistent.

llvm-svn: 53486
2008-07-11 22:44:52 +00:00
Bill Wendling 5774466a33 The frame address on an x86-64 box needs to be offset by -8, not -4.
llvm-svn: 53450
2008-07-11 07:18:52 +00:00
Dan Gohman 3b46030375 Pool-allocation for MachineInstrs, MachineBasicBlocks, and
MachineMemOperands. The pools are owned by MachineFunctions.

This drastically reduces the number of calls to malloc/free made
during the "Emit" phase of scheduling, as well as later phases
in CodeGen. Combined with other changes, this speeds up the
"instruction selection" phase of CodeGen by 10% in some cases.

llvm-svn: 53212
2008-07-07 23:14:23 +00:00
Duncan Sands 93e180342a Rather than having a different custom legalization
hook for each way in which a result type can be
legalized (promotion, expansion, softening etc),
just use one: ReplaceNodeResults, which returns
a node with exactly the same result types as the
node passed to it, but presumably with a bunch of
custom code behind the scenes.  No change if the
new LegalizeTypes infrastructure is not turned on.

llvm-svn: 53137
2008-07-04 11:47:58 +00:00
Duncan Sands 739a0548c4 Add a new getMergeValues method that does not need
to be passed the list of value types, and use this
where appropriate.  Inappropriate places are where
the value type list is already known and may be
long, in which case the existing method is more
efficient.

llvm-svn: 53035
2008-07-02 17:40:58 +00:00