4223 Commits

Author SHA1 Message Date
Evan Cheng 2fa281106a Optimize code placement in loop to eliminate unconditional branches or move unconditional branch to the outside of the loop. e.g.
///       A:
///       ...
///       <fallthrough to B>
///
///       B:  --> loop header
///       ...
///       jcc <cond> C, [exit]
///
///       C:
///       ...
///       jmp B
///
/// ==>
///
///       A:
///       ...
///       jmp B
///
///       C:  --> new loop header
///       ...
///       <fallthrough to B>
///
///       B:
///       ...
///       jcc <cond> C, [exit]

llvm-svn: 71209
2009-05-08 06:34:09 +00:00
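
The effect of the layout change in r71209 can be illustrated with a small model. This is only a sketch and not the LLVM pass: the helper names `unconditional_jumps` and `loop_trace` are hypothetical, and the model assumes that the edge from B to C is always covered by B's `jcc`, and that any other edge costs one `jmp` unless the destination block immediately follows the source block in the layout.

```python
def unconditional_jumps(layout, trace, conditional_edges):
    """Count unconditional jumps executed for a block trace under a layout.

    An edge is free if it is taken by a conditional branch (jcc) or if the
    destination is laid out directly after the source (fallthrough).
    """
    position = {block: i for i, block in enumerate(layout)}
    jumps = 0
    for src, dst in zip(trace, trace[1:]):
        if (src, dst) in conditional_edges:
            continue  # handled by the jcc at the end of src
        if position[dst] != position[src] + 1:
            jumps += 1  # not a fallthrough, so an explicit jmp is executed
    return jumps


def loop_trace(iterations):
    """Blocks executed for A, then `iterations` trips of B -> C, then the final B."""
    trace = ["A"]
    for _ in range(iterations):
        trace += ["B", "C"]
    trace.append("B")  # last header check before leaving the loop
    return trace


conditional = {("B", "C")}  # B ends in: jcc <cond> C, [exit]
trace = loop_trace(100)

before = unconditional_jumps(["A", "B", "C"], trace, conditional)
after = unconditional_jumps(["A", "C", "B"], trace, conditional)
print(before, after)
```

Under these assumptions the original layout executes one `jmp` per iteration, while the rotated layout executes a single `jmp` on loop entry.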
Dale Johannesen 72b6582c0f Use X86AddrNumOperands instead of a magic constant in one
more place.  This fixes a bunch of x86-64 JIT regressions.
(Introduced when the value of the magic constant changed
in 68645.  At the time apparently nobody noticed; failures
were hidden in 70343-70439 by an unrelated bug, so showed
up again as "new" failures in 70440.)

llvm-svn: 71106
2009-05-06 19:04:30 +00:00
Chris Lattner be9fa506ad Add basic support for code generation of
addrspace(257) -> FS relative on x86.  Patch by Zoltan Varga!

llvm-svn: 70992
2009-05-05 18:52:19 +00:00
Evan Cheng a35aed567a Revert part of 70929 that has to do with determining whether a SIB byte is needed. It causes a lot of x86_64 JIT failures.
llvm-svn: 70986
2009-05-05 18:18:57 +00:00
Evan Cheng c298ccb998 - Avoid the longer SIB encoding on x86_64 when it's not needed.
- Synchronize instruction length computation code in X86InstrInfo with code in X86CodeEmitter.cpp
Patch by Zoltan Varga.

llvm-svn: 70929
2009-05-04 22:49:16 +00:00
Dan Gohman bb525f7e02 X86FastISel doesn't support the -tailcallopt ABI.
llvm-svn: 70902
2009-05-04 19:50:33 +00:00
Argyrios Kyrtzidis 31af617924 Fix compilation for some targets other than x86.
llvm-svn: 70522
2009-04-30 23:50:26 +00:00
Argyrios Kyrtzidis a5037484a4 Make DebugLoc independent of DwarfWriter.
-Replace DebugLocTuple's Source ID with CompileUnit's GlobalVariable*
-Remove DwarfWriter::getOrCreateSourceID
-Make necessary changes for the above (fix callsites, etc.)

llvm-svn: 70520
2009-04-30 23:22:31 +00:00
Dan Gohman db3a57ec5c Set mayLoad on MOVZX32_NOREXrm8 too.
llvm-svn: 70466
2009-04-30 03:11:48 +00:00
Evan Cheng 99578674fd Mark MOV8mr_NOREX and MOV8rm_NOREX as mayStore / mayLoad respectively.
llvm-svn: 70461
2009-04-30 00:58:57 +00:00
Bill Wendling 026e5d7667 Instead of passing in an unsigned value for the optimization level, use an enum,
which better identifies what the optimization is doing and is more flexible for
future uses.

llvm-svn: 70440
2009-04-29 23:29:43 +00:00
Nate Begeman 7e6e352735 Fix infinite recursion in the C++ code which handles movddup by making it unnecessary.
llvm-svn: 70425
2009-04-29 22:47:44 +00:00
Nate Begeman 5f829d896d Implement review feedback for vector shuffle work.
llvm-svn: 70372
2009-04-29 05:20:52 +00:00
Bill Wendling 084669a1c9 Second attempt:
Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.

Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.

llvm-svn: 70343
2009-04-29 00:15:41 +00:00
Anton Korobeynikov dac88bae4f Properly print 'P' modifier on inline asm memory operands.
This should fix PR3379 and PR4064.
Patch inspired by Edwin Török!

llvm-svn: 70328
2009-04-28 21:49:33 +00:00
Bill Wendling 56f2987a87 r70270 isn't ready yet. Back this out. Sorry for the noise.
llvm-svn: 70275
2009-04-28 01:04:53 +00:00
Bill Wendling d0ae15946c Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.

Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...

llvm-svn: 70270
2009-04-28 00:21:31 +00:00
Nate Begeman 8d6d4b9289 2nd attempt, fixing SSE4.1 issues and implementing feedback from Duncan.
PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

llvm-svn: 70225
2009-04-27 18:41:29 +00:00
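
The mask representation described in r70225 (a plain array of integers, with -1 meaning UNDEF) can be sketched as follows. This is an illustration, not the SelectionDAG code: `apply_shuffle` and `commute_if_rhs_only` are hypothetical names, undefined lanes are modelled as `None`, and the operand swap shown is just one example of a canonicalization that is easy to express on an integer mask. It is not claimed to be the exact rule the commit implements.

```python
UNDEF = -1  # mask value that marks an undefined result lane


def apply_shuffle(lhs, rhs, mask):
    """Pick elements from the concatenation lhs + rhs by mask index."""
    combined = lhs + rhs
    return [None if m == UNDEF else combined[m] for m in mask]


def commute_if_rhs_only(lhs, rhs, mask):
    """If every defined index refers to rhs, swap operands and rebase the mask."""
    n = len(lhs)
    defined = [m for m in mask if m != UNDEF]
    if defined and all(m >= n for m in defined):
        return rhs, lhs, [m if m == UNDEF else m - n for m in mask]
    return lhs, rhs, mask


a = [10, 11, 12, 13]
b = [20, 21, 22, 23]

print(apply_shuffle(a, b, [0, 4, UNDEF, 7]))

# A mask that only reads the second operand is rewritten to read the first.
new_lhs, new_rhs, new_mask = commute_if_rhs_only(a, b, [4, UNDEF, 6, 5])
print(new_mask, apply_shuffle(new_lhs, new_rhs, new_mask))
```

Because the mask is an ordinary list of integers, rewrites like this need no auxiliary constant nodes, which is the benefit the commit message points to.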
Dan Gohman 2986972118 Rename GR8_ABCD to GR8_ABCD_L and create GR8_ABCD_H, and use these
to precisely describe the h-register subreg register classes.
Thanks to Jakob Stoklund Olesen for spotting this and for the
initial patch!

Also, make getStoreRegOpcode and getLoadRegOpcode aware of the
needs of h registers.

llvm-svn: 70211
2009-04-27 16:41:36 +00:00
Dan Gohman ec542ca65e Rename GR8_, GR16_, GR32_, and GR64_ to GR8_ABCD, GR16_ABCD,
GR32_ABCD, and GR64_ABCD, respectively, to help describe them.

llvm-svn: 70210
2009-04-27 16:33:14 +00:00
Dan Gohman ba99bddf1f Break up long multi-mnemonic strings into separate lines for readability.
llvm-svn: 70209
2009-04-27 15:13:28 +00:00
Mon P Wang e15bf109be Revised 68749 to allow matching of load/stores for address spaces < 256.
llvm-svn: 70197
2009-04-27 07:22:10 +00:00
Chris Lattner 3ad60b18cb add support for detecting process features on win64, patch by
Nicolas Capens!

llvm-svn: 70057
2009-04-25 18:27:23 +00:00
Rafael Espindola c1396a2313 Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not
very elegant, but neither is the tls specification :-(

llvm-svn: 69968
2009-04-24 12:59:40 +00:00
Rafael Espindola b93db668b3 Revert 69952. Causes testsuite failures on linux x86-64.
llvm-svn: 69967
2009-04-24 12:40:33 +00:00
Nate Begeman bb881d66f4 PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952
2009-04-24 03:42:54 +00:00
Dan Gohman 14efb90fcf Add support for printing MO_ExternalSymbol operands in
memory operand tuples. This doesn't ever come up in normal
code however.

llvm-svn: 69848
2009-04-23 00:57:37 +00:00
Duncan Sands 7ce5cc6bd1 Get rid of what looks like a copy-and-pasted typo.
Spotted by gcc-4.5.

llvm-svn: 69673
2009-04-21 09:44:39 +00:00
Rafael Espindola 47ed1f5293 TLS_addr64 and TLS_addr32 define RDI and EAX. They don't use them.
This fixes PR4002.

llvm-svn: 69672
2009-04-21 08:22:09 +00:00
Dan Gohman 1addf64735 Make X86's copyRegToReg able to handle copies to and from subclasses.
This makes the extra copyRegToReg calls in ScheduleDAGSDNodesEmit.cpp
unnecessary. Derived from a patch by Jakob Stoklund Olesen.

llvm-svn: 69635
2009-04-20 22:54:34 +00:00
Bob Wilson f8b85477ae Move duplicated AddLiveIn function from X86 and ARM backends to be a method
in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock.  Thanks to Anton for suggesting this.

llvm-svn: 69615
2009-04-20 18:36:57 +00:00
Mon P Wang 6c8bcf9da1 Fixed a few 64 bit cases in X86InstrInfo::commuteInstruction
llvm-svn: 69417
2009-04-18 05:16:01 +00:00
Bill Wendling 06684350c4 Recommit r69335 and r69336. These were not causing problems.
llvm-svn: 69394
2009-04-17 22:40:38 +00:00
Rafael Espindola 355fe12c82 For general dynamic TLS access we must use
leaq	foo@TLSGD(%rip), %rdi

as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.

llvm-svn: 69350
2009-04-17 14:35:58 +00:00
Bill Wendling 30527b1114 Revert r69335 and r69336. They were causing build failures.
llvm-svn: 69347
2009-04-17 04:19:22 +00:00
Dan Gohman 09dbb0b5e0 MOV8rr_NOREX is a "Move" instruction. This doesn't currently
matter, because this instruction isn't generated until after
things that care.

llvm-svn: 69336
2009-04-17 00:45:17 +00:00
Dan Gohman 74835ce1cb Don't use MOV8rr_NOREX on x86-32. It doesn't actually hurt anything at
present, but it's inconsistent.

llvm-svn: 69335
2009-04-17 00:43:09 +00:00
Rafael Espindola 5e42177a0f Fix PR3995. A scale must be 1, 2, 4 or 8.
llvm-svn: 69284
2009-04-16 12:34:53 +00:00
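
For context on why only those four scale values are legal: in the x86 SIB byte the scale is stored as a 2-bit field holding log2 of the factor, followed by a 3-bit index and a 3-bit base. The sketch below shows that encoding. `encode_sib` is a hypothetical helper, not an LLVM function, and it ignores REX extension bits.

```python
def encode_sib(scale, index, base):
    """Build a SIB byte: scale (bits 7-6), index (bits 5-3), base (bits 2-0)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    ss = scale.bit_length() - 1  # log2(scale) fits in two bits
    return (ss << 6) | ((index & 7) << 3) | (base & 7)


# [eax + ecx*4]: base register eax is number 0, index register ecx is number 1.
print(hex(encode_sib(4, 1, 0)))
```

Any other factor, such as 3, has no representation in the two scale bits and is rejected.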
Dan Gohman de7b3e74be Fix 80-column violations.
llvm-svn: 69204
2009-04-15 19:48:57 +00:00
Dan Gohman 6711216e84 Add a folding table entry for MOV8rr_NOREX.
llvm-svn: 69203
2009-04-15 19:48:28 +00:00
Dan Gohman 6f873b446a Fix X86MachineFunctionInfo's doxygen comment.
llvm-svn: 69127
2009-04-15 01:20:18 +00:00
Dan Gohman dd07f638f5 Do for GR16_NOREX what r69049 did for GR8_NOREX, to avoid trouble with
the local register allocator.

llvm-svn: 69115
2009-04-15 00:10:16 +00:00
Dan Gohman 7913ea5e4a Add a new MOV8rr_NOREX, and make X86's copyRegToReg use it when
either the source or destination is a physical h register.

This fixes sqlite3 with the post-RA scheduler enabled.

llvm-svn: 69111
2009-04-15 00:04:23 +00:00
Dan Gohman 821e13a8f4 GR8_NOREX can contain the H registers, since they don't require
REX prefixes.

llvm-svn: 69108
2009-04-15 00:00:48 +00:00
Dan Gohman 62f4498646 For the h-register addressing-mode trick, use the correct value for
any non-address uses of the address value. This fixes 186.crafty.

llvm-svn: 69094
2009-04-14 22:45:05 +00:00
Evan Cheng dfbbf5c043 Some of GR8_NOREX registers are only available in 64-bit mode.
llvm-svn: 69049
2009-04-14 16:57:43 +00:00
Dan Gohman 6c1426308c Rename COPY_TO_SUBCLASS to COPY_TO_REGCLASS, and generalize
it accordingly. Thanks to Jakob Stoklund Olesen for pointing
out how this might be useful.

llvm-svn: 68986
2009-04-13 21:06:25 +00:00
Devang Patel 80be3511ed Reapply 68847.
Now the debug_inlined section is covered by TAI->doesDwarfUsesInlineInfoSection(), which is false by default.

llvm-svn: 68964
2009-04-13 17:02:03 +00:00
Dan Gohman 57d6bd36b2 Implement x86 h-register extract support.
 - Add patterns for h-register extract, which avoids a shift and mask,
   and in some cases a temporary register.
 - Add address-mode matching for turning (X>>(8-n))&(255<<n), where
   n is a valid address-mode scale value, into an h-register extract
   and a scaled-offset address.
 - Replace X86's MOV32to32_ and related instructions with the new
   target-independent COPY_TO_SUBREG instruction.

On x86-64 there are complicated constraints on h registers, and
CodeGen doesn't currently provide a high-level way to express all of them,
so they are handled with a bunch of special code. This code currently only
supports extracts where the result is used by a zero-extend or a store,
though these are fairly common.

These transformations are not always beneficial; since there are only
4 h registers, they sometimes require extra move instructions, and
this sometimes increases register pressure because it can force out
values that would otherwise be in one of those registers. However,
this appears to be relatively uncommon.

llvm-svn: 68962
2009-04-13 16:09:41 +00:00
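
The address-mode rewrite in r68962 rests on an arithmetic identity: (X>>(8-n))&(255<<n) selects bits 8..15 of X and leaves them shifted left by n, which is what an h-register read (an h register such as AH holds bits 8..15 of its parent register) followed by a scale of 2**n produces. The check below is a sketch with hypothetical function names. It assumes n is 0 to 3, matching scales 1, 2, 4 and 8, and non-negative values of X.

```python
def shifted_masked(x, n):
    """The original expression from the commit message."""
    return (x >> (8 - n)) & (255 << n)


def h_extract_scaled(x, n):
    """Extract bits 8..15 (the h-register value), then apply a scale of 2**n."""
    h = (x >> 8) & 0xFF
    return h << n


# Exhaustively compare both forms over all 16-bit inputs and all four scales.
for n in range(4):
    for x in range(1 << 16):
        assert shifted_masked(x, n) == h_extract_scaled(x, n)

print(hex(h_extract_scaled(0x12345678, 2)))
```

With X = 0x12345678 and n = 2, the h-register value is 0x56 and the scaled result is 0x56 * 4.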
Dan Gohman f20462c217 Remove x86's special-case handling for ISD::TRUNCATE and
ISD::SIGN_EXTEND_INREG. Tablegen-generated code can handle
these cases, and the scheduling issues observed earlier
appear to be resolved now.

llvm-svn: 68959
2009-04-13 15:29:31 +00:00