Commit Graph

1554 Commits

Author SHA1 Message Date
Chad Rosier f4e832b14e Enables vararg functions that pass all arguments via registers to be optimized into tail-calls when possible.
llvm-svn: 131560
2011-05-18 19:59:50 +00:00
Eli Friedman d000a2c26e Clean up the mess created by r131467+r131469.
llvm-svn: 131471
2011-05-17 18:02:22 +00:00
Stuart Hastings c65d8eda7b Revert 131467 due to buildbot complaint.
llvm-svn: 131469
2011-05-17 16:59:46 +00:00
Stuart Hastings 3cf5308890 Fix an obscure issue in X86_64 parameter passing: if a tiny byval is
passed as the fifth parameter, insure it's passed correctly (in R9).
rdar://problem/6920088

llvm-svn: 131467
2011-05-17 16:45:55 +00:00
Nadav Rotem d8edb1d5cc Fix a bug in PerformEXTRACT_VECTOR_ELTCombine. The code created an ADD SDNode
with two different types, in cases where the index and the ptr had different
types.

llvm-svn: 131461
2011-05-17 08:31:57 +00:00
Eli Friedman d4a3609d30 Remove dead code. Fix associated test to use FileCheck.
llvm-svn: 131424
2011-05-16 21:28:22 +00:00
Nadav Rotem 8f971c27fb Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector.
llvm-svn: 131179
2011-05-11 08:12:09 +00:00
Eli Friedman 2518f8376d Make the logic for determining function alignment more explicit. No functionality change.
llvm-svn: 131012
2011-05-06 20:34:06 +00:00
Daniel Dunbar cd01ed5bd6 ADT/Triple: Renambe isOSX... methods to isMacOSX for consistency with the OS
triple component.

llvm-svn: 129838
2011-04-20 00:14:25 +00:00
Daniel Dunbar 100455a3c8 Target/X86: Eliminate uses of getDarwinVers().
llvm-svn: 129813
2011-04-19 21:04:12 +00:00
Chris Lattner 0ab5e2cded Fix a ton of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129558
2011-04-15 05:18:47 +00:00
Evan Cheng ee9d45dd55 Don't try to create zero-sized stack objects.
llvm-svn: 128586
2011-03-30 23:44:13 +00:00
Benjamin Kramer 8d2227373d Make helper static.
llvm-svn: 128338
2011-03-26 12:38:19 +00:00
NAKAMURA Takumi 521eb7c11e Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets.
FIXME: Some cleanups would be needed.
llvm-svn: 128206
2011-03-24 07:07:00 +00:00
Andrew Trick 4ab9a16569 Revert r128175.
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.

llvm-svn: 128181
2011-03-23 23:11:02 +00:00
Andrew Trick 4046a0de91 Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS.
(target-specific branchless method for double-width relational comparisons on x86)

llvm-svn: 128175
2011-03-23 22:16:02 +00:00
Evan Cheng 0663f23bd8 Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified.
llvm-svn: 127981
2011-03-21 01:19:09 +00:00
Daniel Dunbar 327cd36f74 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng 824a711305 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Nadav Rotem e7a101ccab Add support for legalizing UINT_TO_FP of vectors on platforms which do
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.

llvm-svn: 127951
2011-03-19 13:09:10 +00:00
Eli Friedman 59721e3238 Revert r127852; it's apparently causing an ICE on mingw.
llvm-svn: 127909
2011-03-18 21:12:29 +00:00
Eli Friedman 1a916a3c0c Add a target-specific branchless method for double-width relational
comparisons on x86.  Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.

This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.

llvm-svn: 127852
2011-03-18 02:34:11 +00:00
Cameron Zwarich 2ef0c69df1 Move more logic into getTypeForExtArgOrReturn.
llvm-svn: 127809
2011-03-17 14:53:37 +00:00
Cameron Zwarich 34e7b3f77e Rename getTypeForExtendedInteger() to getTypeForExtArgOrReturn().
llvm-svn: 127807
2011-03-17 14:21:56 +00:00
Cameron Zwarich ac106273d4 The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.

This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.

llvm-svn: 127766
2011-03-16 22:20:18 +00:00
Eric Christopher cf56a5034f Change the x86 32-bit scheduler to register pressure and fix up the
corresponding testcases back to the previous versions.

Fixes some performance regressions only seen on 32-bit.

llvm-svn: 127441
2011-03-11 01:05:58 +00:00
Stuart Hastings d17ae4e939 Revert 127359; it broke lencod.
llvm-svn: 127382
2011-03-10 00:25:53 +00:00
Stuart Hastings 9955e2f912 X86 byval copies no longer always_inline. <rdar://problem/8706628>
llvm-svn: 127359
2011-03-09 21:10:30 +00:00
NAKAMURA Takumi 58d1f93b03 Target/X86: Tweak va_arg for Win64 not to miss taking va_start when number of fixed args > 4.
llvm-svn: 127328
2011-03-09 11:33:15 +00:00
Benjamin Kramer 679cfb54ec X86: Fix the (saddo/ssub x, 1) -> incl/decl selection to check the right operand for 1.
Found by inspection.

llvm-svn: 127247
2011-03-08 15:20:20 +00:00
Eric Christopher eb19e9e9fc Turn on list-ilp scheduling by default on x86 and x86-64, fix up
testcases accordingly. Some are currently xfailed and will be filed
as bugs to be fixed or understood.

Performance results:

roughly neutral on SPEC
some micro benchmarks in the llvm suite are up between 100 and 150%, only
a pair of regressions that are due to be investigated

john-the-ripper saw:
10% improvement in traditional DES
8% improvement in BSDI DES
59% improvement in FreeBSD MD5
67% improvement in OpenBSD Blowfish
14% improvement in LM DES

Small compile time impact.

llvm-svn: 127208
2011-03-08 02:42:25 +00:00
Cameron Zwarich df61694417 Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo.
llvm-svn: 127175
2011-03-07 21:56:36 +00:00
Andrew Trick 641e2d4f8c Increased the register pressure limit on x86_64 from 8 to 12
regs. This is the only change in this checkin that may affects the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.

Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It is currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.

Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.

llvm-svn: 127067
2011-03-05 08:00:22 +00:00
David Greene dd567b214b [AVX] Fix mask predicates for 256-bit UNPCKLPS/D and implement
missing patterns for them.

      Add a SIMD test subdirectory to hold tests for SIMD instruction
      selection correctness and quality.
'

llvm-svn: 126845
2011-03-02 17:23:43 +00:00
David Greene 20a1cbefad [AVX] Add decode support for VUNPCKLPS/D instructions, both 128-bit
and 256-bit forms.  Because the number of elements in a vector
      does not determine the vector type (4 elements could be v4f32 or
      v4f64), pass the full type of the vector to decode routines.

llvm-svn: 126664
2011-02-28 19:06:56 +00:00
Owen Anderson b2c80da4ae Allow targets to specify a the type of the RHS of a shift parameterized on the type of the LHS.
llvm-svn: 126518
2011-02-25 21:41:48 +00:00
Chris Lattner 0152b7bc7c remove command line option debugging hook.
llvm-svn: 126441
2011-02-24 21:53:03 +00:00
David Greene 9a6040dc86 [AVX] General VUNPCKL codegen support.
llvm-svn: 126264
2011-02-22 23:31:46 +00:00
Devang Patel f3292b2196 Revert r124611 - "Keep track of incoming argument's location while emitting LiveIns."
In other words, do not keep track of argument's location.  The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body.
This requires some coordination with debugger to get this working. 
 - The debugger needs to be aware of prolog_end attribute attached with line table entries.
 - The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+)

llvm-svn: 126155
2011-02-21 23:21:26 +00:00
Eric Christopher ac6b001f56 If both operands are loads from stores in memory we can't use movlpd/movlps
since one needs to be a register operand. Just use movss instead of forcing
an operand into a register.

Fixes PR9239

llvm-svn: 126072
2011-02-20 05:04:42 +00:00
Eric Christopher c509ff6944 Fix typos.
llvm-svn: 126018
2011-02-19 03:19:09 +00:00
David Greene 3a2b508e8f [AVX] Recorganize X86ShuffleDecode into its own library
(LLVMX86Utils.a) to break cyclic library dependencies between
LLVMX86CodeGen.a and LLVMX86AsmParser.a.  Previously this code was in
a header file and marked static but AVX requires some additional
functionality here that won't be used by all clients.  Since including
unused static functions causes a gcc compiler warning, keeping it as a
header would break builds that use -Werror.  Putting this in its own
library solves both problems at once.

llvm-svn: 125765
2011-02-17 19:18:59 +00:00
Stuart Hastings 81c4306005 Swap VT and DebugLoc operands of getExtLoad() for consistency with
other getNode() methods.  Radar 9002173.

llvm-svn: 125665
2011-02-16 16:23:55 +00:00
Chris Lattner 46c01a30f4 Enhance ComputeMaskedBits to know that aligned frameindexes
have their low bits set to zero.  This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.

Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively.  Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST).  The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.

llvm-svn: 125470
2011-02-13 22:25:43 +00:00
David Greene 79827a5a26 [AVX] Implement 256-bit vector lowering for SCALAR_TO_VECTOR. This
largely completes support for 128-bit fallback lowering for code that
is not 256-bit ready.

llvm-svn: 125315
2011-02-10 23:11:29 +00:00
David Greene ce318e4958 [AVX] Implement 256-bit vector lowering for EXTRACT_VECTOR_ELT.
llvm-svn: 125284
2011-02-10 16:57:36 +00:00
David Greene b36195ab26 [AVX] Implement 256-bit vector lowering for INSERT_VECTOR_ELT.
llvm-svn: 125187
2011-02-09 15:32:06 +00:00
David Greene 10b0db1d5f [AVX] Implement BUILD_VECTOR lowering for 256-bit vectors. For
anything but the simplest of cases, lower a 256-bit BUILD_VECTOR by
splitting it into 128-bit parts and recombining.

llvm-svn: 125105
2011-02-08 19:04:41 +00:00
David Greene 79651c527b [AVX] Insert/extract subvector lowering support. This includes a
couple of utility functions that will be used in other places for more
AVX lowering.

llvm-svn: 125029
2011-02-07 19:36:54 +00:00
NAKAMURA Takumi 1850c80afb Target/X86: Tweak allocating shadow area (aka home) on Win64. It must be enough for caller to allocate one.
llvm-svn: 124949
2011-02-05 15:11:32 +00:00