Commit Graph

2285 Commits

Author SHA1 Message Date
Jakub Staszak 40ee5674cd Remove unneeded function, since PR8156 was fixed over a year ago.
llvm-svn: 169534
2012-12-06 19:05:46 +00:00
Jakub Staszak 65ca2fb9e6 Simplify code.
llvm-svn: 169521
2012-12-06 18:22:59 +00:00
Evan Cheng 5213139f48 Let targets provide hooks that compute known zero and ones for any_extend
and extload's. If they are implemented as zero-extend, or implicitly
zero-extend, then this can enable more demanded bits optimizations. e.g.

define void @foo(i16* %ptr, i32 %a) nounwind {
entry:
  %tmp1 = icmp ult i32 %a, 100
  br i1 %tmp1, label %bb1, label %bb2
bb1:
  %tmp2 = load i16* %ptr, align 2
  br label %bb2
bb2:
  %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
  %cmp = icmp ult i16 %tmp3, 24
  br i1 %cmp, label %bb3, label %exit
bb3:
  call void @bar() nounwind
  br label %exit
exit:
  ret void
}

This compiles to the followings before:
        push    {lr}
        mov     r2, #0
        cmp     r1, #99
        bhi     LBB0_2
@ BB#1:                                 @ %bb1
        ldrh    r2, [r0]
LBB0_2:                                 @ %bb2
        uxth    r0, r2
        cmp     r0, #23
        bhi     LBB0_4
@ BB#3:                                 @ %bb3
        bl      _bar
LBB0_4:                                 @ %exit
        pop     {lr}
        bx      lr

The uxth is not needed since ldrh implicitly zero-extend the high bits. With
this change it's eliminated.

rdar://12771555

llvm-svn: 169459
2012-12-06 01:28:01 +00:00
Elena Demikhovsky cd3c1c4a16 Simplified BLEND pattern matching for shuffles.
Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2.

llvm-svn: 169366
2012-12-05 09:24:57 +00:00
Evan Cheng d31802c1f6 Add x86 isel lowering logic to form bit test with inverted condition. e.g.
x ^ -1.

Patch by David Majnemer.
rdar://12755626

llvm-svn: 169339
2012-12-05 00:10:38 +00:00
Chandler Carruth ed0881b2a6 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Shuxin Yang abcc370423 rdar://12100355 (part 1)
This revision attempts to recognize following population-count pattern:

 while(a) { c++; ... ; a &= a - 1; ... },
  where <c> and <a>could be used multiple times in the loop body.

 TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent 
instruction sequence, which need to be improved in the following commits.

Reviewed by Nadav, really appreciate!

llvm-svn: 168931
2012-11-29 19:38:54 +00:00
Elena Demikhovsky eace43bff7 I changed hasAVX() to hasFp256() and hasAVX2() to hasInt256() in X86IselLowering.cpp.
The logic was not changed, only names.

llvm-svn: 168875
2012-11-29 12:44:59 +00:00
Jakub Staszak a17f3f8c30 Normalize splat 256bit vectors with 8 elements.
llvm-svn: 168600
2012-11-26 19:24:31 +00:00
Craig Topper c8c28d1ff0 Mark ISD::FMA as Legal instead of custom for x86 with FMA3/FMA4. Needed so that llvm.muladd can be converted to ISD::FMA for fp_contract.
llvm-svn: 168413
2012-11-21 05:36:24 +00:00
Duncan Sands d71b4e4568 Add the Erlang/HiPE calling convention, patch by Yiannis Tsiouris.
llvm-svn: 168166
2012-11-16 12:36:39 +00:00
Craig Topper 70601ba6f9 Use roundps/pd for llvm.ceil, llvm.trunc, llvm.rint, and llvm.nearbyint of vector types.
llvm-svn: 168141
2012-11-16 06:37:56 +00:00
Craig Topper 61d045781a Add llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint intrinsics.
llvm-svn: 168025
2012-11-15 06:51:10 +00:00
Benjamin Kramer 6293429b51 X86: Enable SSE memory intrinsics even when stack alignment is less than 16 bytes.
The stack realignment code was fixed to work when there is stack realignment and
a dynamic alloca is present so this shouldn't cause correctness issues anymore.

Note that this also enables generation of AVX instructions for memset
under the assumptions:
- Unaligned loads/stores are always fast on CPUs supporting AVX
- AVX is not slower than SSE
We may need some tweaked heuristics if one of those assumptions turns out not to
be true.

Effectively reverts r58317. Part of PR2962.

llvm-svn: 167967
2012-11-14 20:08:40 +00:00
Craig Topper a7f489d1ab Factor out an overly replicated typecast. No functional change.
llvm-svn: 167916
2012-11-14 06:41:09 +00:00
Manman Ren 0f3240d3a7 X86: when constructing VZEXT_LOAD from other loads, makes sure its output
chain is correctly setup.

As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.

rdar://12684358

llvm-svn: 167859
2012-11-13 19:13:05 +00:00
Michael Liao d39c0fb19f Fix PR14314
- Fix operand order for atomic sub, where the minuend is the value
  loaded from memory and the subtrahend is the parameter specified.

llvm-svn: 167718
2012-11-12 06:49:17 +00:00
Craig Topper dd13d3fda1 Move some helper methods to being static functions in the implementation file.
llvm-svn: 167696
2012-11-11 22:45:02 +00:00
Craig Topper a43e2fd3eb Remove unnecessary subtraction and addition by 1 around a couple for loops.
llvm-svn: 167673
2012-11-10 09:25:36 +00:00
Craig Topper 84afbf2b02 Tidy up spacing. No functional change.
llvm-svn: 167671
2012-11-10 09:02:47 +00:00
Craig Topper f5d527401f Simplify custom emitter code for pcmp(e/i)str(i/m) and make the helper functions static.
llvm-svn: 167669
2012-11-10 08:57:41 +00:00
Craig Topper 9268c94b15 Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support.
llvm-svn: 167652
2012-11-10 01:23:36 +00:00
Nadav Rotem d1e906e1f1 indent
llvm-svn: 167607
2012-11-09 07:02:24 +00:00
Michael Liao 73cffddb95 Add support of RTM from TSX extension
- Add RTM code generation support throught 3 X86 intrinsics:
  xbegin()/xend() to start/end a transaction region, and xabort() to abort a
  tranaction region

llvm-svn: 167573
2012-11-08 07:28:54 +00:00
Jakub Staszak 7d6ee3e1b4 Simplify code. No functionality change.
llvm-svn: 167505
2012-11-06 23:52:19 +00:00
Nadav Rotem 1c89744f32 Make the helper functions static. No functional change.
llvm-svn: 167501
2012-11-06 23:36:00 +00:00
Nadav Rotem f036ca466e CostModel: add another known vector trunc optimization.
llvm-svn: 167488
2012-11-06 21:17:17 +00:00
Nadav Rotem 0914f0b262 Cost Model: add tables for some avx type-conversion hacks.
llvm-svn: 167480
2012-11-06 19:33:53 +00:00
Nadav Rotem 48c5b8e659 Refactor the getTypeLegalizationCost interface. No functionality change.
llvm-svn: 167422
2012-11-05 23:57:45 +00:00
Nadav Rotem c378a8067d CostModel: Add tables for the common x86 compares.
llvm-svn: 167421
2012-11-05 23:48:20 +00:00
Richard Smith 18d2762048 Suppress signed/unsigned comparison warning.
llvm-svn: 167410
2012-11-05 22:01:44 +00:00
Nadav Rotem 856ffa6677 Cost Model: Normalize the insert/extract index when splitting types
llvm-svn: 167402
2012-11-05 21:12:13 +00:00
Nadav Rotem 7411623fd8 Implement the cost of abnormal x86 instruction lowering as a table.
llvm-svn: 167395
2012-11-05 19:32:46 +00:00
Nadav Rotem c2345cbe73 X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2.
llvm-svn: 167347
2012-11-03 00:39:56 +00:00
Chandler Carruth 5da3f0512e Revert the majority of the next patch in the address space series:
r165941: Resubmit the changes to llvm core to update the functions to
         support different pointer sizes on a per address space basis.

Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.

However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.

In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.

In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.

This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.

llvm-svn: 167222
2012-11-01 09:14:31 +00:00
Shuxin Yang 01efdd6c28 (For X86) Enhancement to add-carray/sub-borrow (adc/sbb) optimization.
The adc/sbb optimization is to able to convert following expression
into a single adc/sbb instruction:
  (ult) ... = x + 1 // where the ult is unsigned-less-than comparison
  (ult) ... = x - 1

  This change is to flip the "x >u y" (i.e. ugt comparison) in order 
to expose the adc/sbb opportunity.

llvm-svn: 167180
2012-10-31 23:11:48 +00:00
Michael Liao e2d7e4e8e5 Clean up redundant SP register maintained in X86 TLI
llvm-svn: 167104
2012-10-31 04:14:09 +00:00
Manman Ren acb8becc73 X86 MMX: optimize transfer from mmx to i32
We used to generate a store (movq) + a load.
Now we use movd.

rdar://9946746

llvm-svn: 167056
2012-10-30 22:15:38 +00:00
Jakub Staszak a3d8e9974a Re-commit r166971. I reverted it to quickly, when buildbots didn't have a chance
to test it with chapni's fix (-mattr=+avx).

llvm-svn: 166985
2012-10-30 00:01:57 +00:00
Jakub Staszak d74cb61d86 Revert r166971. It causes buildbot failure. To be investigated.
llvm-svn: 166979
2012-10-29 23:13:50 +00:00
Jakub Staszak c3a92131dc Remove unused variable.
llvm-svn: 166973
2012-10-29 22:04:32 +00:00
Jakub Staszak 9c361bdfeb Simplify code. No functionality change.
llvm-svn: 166972
2012-10-29 22:02:26 +00:00
Jakub Staszak c8f4825ba6 Allow to fold vector load if there is more than one bitcast, so in the case:
%0 = load <8 x i16>* %dest
%1 = shufflevector <8 x i16> %0, <8 x i16> %in,
      <8 x i32> < i32 0, i32 1, i32 2, i32 3, i32 13, i32 undef, i32 14, i32 14>
store <8 x i16> %1, <8 x i16>* %dest

We get:
  vmovlpd (%eax), %xmm0, %xmm0

instead of:
  vmovaps (%eax), %xmm1
  vmovsd  %xmm1, %xmm0, %xmm0

No extra test-case is added. I just fixed the existing one
(also it uses FileCheck now).

llvm-svn: 166971
2012-10-29 21:56:35 +00:00
Duncan Sands ac8448e0d0 Silence a GCC warning about comparing signed and unsigned types.
llvm-svn: 166922
2012-10-29 11:29:53 +00:00
Michael Liao 6d810bd9b8 Clean up where SlotSize should be used instead of pointer size.
llvm-svn: 166664
2012-10-25 06:29:14 +00:00
Michael Liao c5af149e70 Add custom conversion from v2u32 to v2f32 in 32-bit mode
- As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
  v2f32 is added to improve the efficiency of the code generated.

llvm-svn: 166545
2012-10-24 04:09:32 +00:00
Michael Liao 2843625bb5 Fix PR14161
- Check index being extracted to be constant 0 before simplfiying.
  Otherwise, retain the original sequence.

llvm-svn: 166504
2012-10-23 21:40:15 +00:00
Matt Beaumont-Gay bdcebd323a Silence -Wsign-compare
llvm-svn: 166494
2012-10-23 19:46:36 +00:00
Michael Liao c03c03d56e Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32
- Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce
  DAG combine overhead.
- Extend the support to v4i16/v8i16 as well.

llvm-svn: 166487
2012-10-23 17:36:08 +00:00
Michael Liao 1be96bb5ce Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1
llvm-svn: 166486
2012-10-23 17:34:00 +00:00
Shuxin Yang cdde059a34 This patch is to fix radar://8426430. It is about llvm support of __builtin_debugtrap()
which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap().

  The X86 backend is already able to handle debugtrap(). This patch is to:
  1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang).
  2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
     make the __builtin_debugtrap() "available" to all existing ports without the hassle of
     changing their code.
  3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
     __builtin_trap() will be expanded into the function call of the specified trap function.
    This behavior may need change in the future.

  The provided testing-case is to make sure 2) and 3) are working for ARM port, and we
already have a testing case for x86. 

llvm-svn: 166300
2012-10-19 20:11:16 +00:00
Michael Liao 4b7ccfcaad Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86
- If INSERT_VECTOR_ELT is supported (above SSE2, either by custom
  sequence of legal insn), transform BUILD_VECTOR into SHUFFLE +
  INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few
  (so far 1) elements being inserted.

llvm-svn: 166288
2012-10-19 17:15:18 +00:00
Michael Liao cef9541dac Check SSSE3 instead of SSE4.1
- All shuffle insns required, especially PSHUB, are added in SSSE3.

llvm-svn: 166086
2012-10-17 03:59:18 +00:00
Michael Liao 6f7206132f Fix setjmp on models with non-Small code model nor non-Static relocation model
- MBB address is only valid as an immediate value in Small & Static
  code/relocation models. On other models, LEA is needed to load IP address of
  the restore MBB.
- A minor fix of MBB in MC lowering is added as well to enable target
  relocation flag being propagated into MC.

llvm-svn: 166084
2012-10-17 02:22:27 +00:00
Michael Liao 02ca34541e Support v8f32 to v8i8/vi816 conversion through custom lowering
- Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to
  vector element-wise widening) to reduce DAG combiner and its overhead added
  in X86 backend.

llvm-svn: 166036
2012-10-16 18:14:11 +00:00
NAKAMURA Takumi 1705a999fa Reapply r165661, Patch by Shuxin Yang <shuxin.llvm@gmail.com>.
Original message:

The attached is the fix to radar://11663049. The optimization can be outlined by following rules:

   (select (x != c), e, c) -> select (x != c), e, x),
   (select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.

 The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.

  While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".

Original message since r165661:

My previous change has a bug: I negated the condition code of a CMOV, and go ahead creating a new CMOV using the *ORIGINAL* condition code.

llvm-svn: 166017
2012-10-16 06:28:34 +00:00
Michael Liao 97bf363a9e Add __builtin_setjmp/_longjmp supprt in X86 backend
- Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also
  used as a light-weight replacement of setjmp/longjmp which are used to
  implementation continuation, user-level threading, and etc. The support added
  in this patch ONLY addresses this usage and is NOT intended to support SjLj
  exception handling as zero-cost DWARF exception handling is used by default
  in X86.

llvm-svn: 165989
2012-10-15 22:39:43 +00:00
Micah Villmow 4bb926d91d Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis.
llvm-svn: 165941
2012-10-15 16:24:29 +00:00
Benjamin Kramer ecd15d7f6c X86: Fix accidentally swapped operands.
llvm-svn: 165871
2012-10-13 12:50:19 +00:00
Benjamin Kramer d6b9362fc2 X86: Promote i8 cmov when both operands are coming from truncates of the same width.
X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this
level is often not a good idea because it's too late for many optimizations to
kick in. This solution doesn't add any extensions (truncs are free) and tries
to avoid introducing partial register stalls by filtering direct copyfromregs.

I'm seeing a ~10% speedup on reading a random .png file with libpng15 via
graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.

llvm-svn: 165868
2012-10-13 10:39:49 +00:00
Micah Villmow 0c61134d8d Revert 165732 for further review.
llvm-svn: 165747
2012-10-11 21:27:41 +00:00
Micah Villmow 083189730e Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly.
llvm-svn: 165726
2012-10-11 17:21:41 +00:00
NAKAMURA Takumi da0730c2d7 Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>."
It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode.

llvm-svn: 165692
2012-10-11 02:02:05 +00:00
Evan Cheng 60a25a571e Change MachineInstrBuilder::addDisp to copy over target flags by default.
llvm-svn: 165677
2012-10-11 00:15:48 +00:00
Nadav Rotem 17418964f8 Patch by Shuxin Yang <shuxin.llvm@gmail.com>.
Original message:

The attached is the fix to radar://11663049. The optimization can be outlined by following rules:

   (select (x != c), e, c) -> select (x != c), e, x),
   (select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.

 The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.

  While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".

llvm-svn: 165661
2012-10-10 21:31:55 +00:00
Michael Liao e999b865dd Add support for FP_ROUND from v2f64 to v2f32
- Due to the current matching vector elements constraints in
  ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
  v2f32) is scalarized. Add a customized v2f32 widening to convert it
  into a target-specific X86ISD::VFPROUND to work around this
  constraints.

llvm-svn: 165631
2012-10-10 16:53:28 +00:00
Michael Liao effae0c8e1 Add alternative support for FP_ROUND from v2f32 to v2f64
- Due to the current matching vector elements constraints in ISD::FP_EXTEND,
  rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
  to convert it into a target-specific X86ISD::VFPEXT to work around this
  constraints. This patch also reverts a previous attempt to fix this issue by
  recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
  reduces the overhead of supporting non-power-2 vector FP extend.

llvm-svn: 165625
2012-10-10 16:32:15 +00:00
Evan Cheng 3903e1be01 When expanding atomic load arith instructions, do not lose target flags. rdar://12453106
llvm-svn: 165568
2012-10-09 23:48:33 +00:00
Bill Wendling c9b22d735a Create enums for the different attributes.
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488
2012-10-09 07:45:08 +00:00
Micah Villmow cdfe20b97f Move TargetData to DataLayout.
llvm-svn: 165402
2012-10-08 16:38:25 +00:00
Preston Gurd 0d67f5106c This patch corrects commit 165126 by using an integer bit width instead of
a pointer to a type, in order to remove the uses of getGlobalContext().

Patch by Tyler Nowicki.

llvm-svn: 165255
2012-10-04 21:33:40 +00:00
Michael Liao f54249b55f Add register encoding support in X86 backend
- Add 'HwEncoding' for X86 registers and call getEncodingValue() to
  retrieve their encoding values.
- This's the first step to adopt new scheme. Furthur revising is onging.

llvm-svn: 165241
2012-10-04 19:50:43 +00:00
Bill Wendling b0a290ef9e Use new accessor methods to query for attributes.
llvm-svn: 165205
2012-10-04 06:43:21 +00:00
Michael Liao d60d8143cf Clean up tailing whitespaces
llvm-svn: 165182
2012-10-03 23:43:52 +00:00
Craig Topper 4f1c8caf2f Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an EVT and add llvm_unreachable to the switches. Helps it compile to dramatically better code.
llvm-svn: 164919
2012-09-30 19:49:56 +00:00
Bill Wendling 863bab689a Remove the `hasFnAttr' method from Function.
The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725
2012-09-26 21:48:26 +00:00
Michael Liao de51caf2a0 Add missing i64 max/min/umax/umin on 32-bit target
- Turn on atomic6432.ll and add specific test case as well

llvm-svn: 164616
2012-09-25 18:08:13 +00:00
Evan Cheng 446ff28df1 Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511
llvm-svn: 164588
2012-09-25 05:32:34 +00:00
Michael Liao a880186030 Add missing i8 max/min/umax/umin support
- Fix PR5145 and turn on test 8-bit atomic ops

llvm-svn: 164358
2012-09-21 03:18:52 +00:00
Michael Liao c33bebff52 Revise td of X86 atomic instructions
- Rewirte most atomic instructions in templates for both better
  maintenance and future extensions, such as HLE in TSX.

llvm-svn: 164357
2012-09-21 03:00:17 +00:00
Michael Liao 3237662b65 Re-work X86 code generation of atomic ops with spin-loop
- Rewrite/merge pseudo-atomic instruction emitters to address the
  following issue:
  * Reduce one unnecessary load in spin-loop

    previously the spin-loop looks like

        thisMBB:
        newMBB:
          ld  t1 = [bitinstr.addr]
          op  t2 = t1, [bitinstr.val]
          not t3 = t2  (if Invert)
          mov EAX = t1
          lcs dest = [bitinstr.addr], t3  [EAX is implicit]
          bz  newMBB
          fallthrough -->nextMBB

    the 'ld' at the beginning of newMBB should be lift out of the loop
    as lcs (or CMPXCHG on x86) will load the current memory value into
    EAX. This loop is refined as:

        thisMBB:
          EAX = LOAD [MI.addr]
        mainMBB:
          t1 = OP [MI.val], EAX
          LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
          JNE mainMBB
        sinkMBB:

  * Remove immopc as, so far, all pseudo-atomic instructions has
    all-register form only, there is no immedidate operand.

  * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
    td

  * Fix issues in PR13458

- Add comprehensive tests on atomic ops on various data types.
  NOTE: Some of them are turned off due to missing functionality.

- Revise tests due to the new spin-loop generated.

llvm-svn: 164281
2012-09-20 03:06:15 +00:00
Benjamin Kramer ece434252c X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math.
This was only an issue if sse is disabled.

llvm-svn: 163967
2012-09-15 12:44:27 +00:00
Michael Liao 8b48bf27b0 Fix comment
llvm-svn: 163835
2012-09-13 20:30:16 +00:00
Michael Liao 137f8aedea Add wider vector/integer support for PR12312
- Enhance the fix to PR12312 to support wider integer, such as 256-bit
  integer. If more than 1 fully evaluated vectors are found, POR them
  first followed by the final PTEST.

llvm-svn: 163832
2012-09-13 20:24:54 +00:00
Michael Liao abb87d4857 Fix PR11985
- BlockAddress has no support of BA + offset form and there is no way to
  propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
  simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
  support BA + offset addressing.

llvm-svn: 163743
2012-09-12 21:43:09 +00:00
Craig Topper ad495964f1 Indentation fixes. No functional change.
llvm-svn: 163682
2012-09-12 06:20:41 +00:00
Craig Topper a29ed865d0 Make a bunch of lowering helper functions static instead of member functions. No functional change.
llvm-svn: 163596
2012-09-11 06:15:32 +00:00
Dmitri Gribenko ca1e27be0d Remove redundant semicolons which are null statements.
llvm-svn: 163547
2012-09-10 21:26:47 +00:00
Michael Liao 400f7ef871 Enhance PR11334 fix to support extload from v2f32/v4f32
- Fix an remaining issue of PR11674 as well

llvm-svn: 163528
2012-09-10 18:33:51 +00:00
Michael Liao c3d5b21c39 Add boolean simplification support from CMOV
- If a boolean value is generated from CMOV and tested as boolean value,
  simplify the use of test result by referencing the original condition.
  RDRAND intrinisc is one of such cases.

llvm-svn: 163516
2012-09-10 16:36:16 +00:00
Elena Demikhovsky 264fb0217e The VPSHUFB 256-bit instruction may be generated when one of input vector is undefined or zeroinitializer.
I've added the "zeroinitializer" case in this patch.

llvm-svn: 163506
2012-09-10 12:13:11 +00:00
Craig Topper 4ed79bd7d7 Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled.
llvm-svn: 163473
2012-09-08 17:42:27 +00:00
Craig Topper 0955a9f4e1 Use 256-bit alignment for constant pool value for 256-bit vector FNEG lowering.
llvm-svn: 163463
2012-09-08 07:46:05 +00:00
Craig Topper 98f2e861a0 Add support for lowering FABS of vector types.
llvm-svn: 163461
2012-09-08 07:31:51 +00:00
Craig Topper 3e41a5bb31 Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct.
llvm-svn: 163458
2012-09-08 04:58:43 +00:00
Elena Demikhovsky 42777877c2 AVX2 optimization.
Added generation of VPSHUB instruction for <32 x i8> vector shuffle when possible.

llvm-svn: 163312
2012-09-06 12:42:01 +00:00
Michael Liao 2d95a2b5c4 Remove duplicated helper function
llvm-svn: 163295
2012-09-06 07:11:22 +00:00
Craig Topper f3e4aa8cdd Use iPTR instead of i32 for extract_subvector/insert_subvector index in lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder.
llvm-svn: 163293
2012-09-06 06:09:01 +00:00
Roman Divacky ad06cee239 Stop casting away const qualifier needlessly.
llvm-svn: 163258
2012-09-05 22:26:57 +00:00
Preston Gurd cdf540d5d6 Generic Bypass Slow Div
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder,  or both.

Patch by Tyler Nowicki!

llvm-svn: 163150
2012-09-04 18:22:17 +00:00
Elena Demikhovsky cbe99bbb36 This patch optimizes shuffle instruction - generates 2 instructions instead of 4.
Since this specific shuffle is widely used in many workloads we have ~10% performance on them.

shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>

vmovaps (%rdx), %ymm0
vshufps $8, %ymm0, %ymm0, %ymm0
vmovaps (%rcx), %ymm1
vshufps $8, %ymm0, %ymm1, %ymm1
vunpcklps       %ymm0, %ymm1, %ymm0

vmovaps (%rcx), %ymm0
vmovsldup       (%rdx), %ymm1
vblendps        $85, %ymm0, %ymm1, %ymm0

llvm-svn: 163134
2012-09-04 12:49:02 +00:00
Craig Topper d6cc4062be Typos
llvm-svn: 163053
2012-09-01 06:33:50 +00:00
Manman Ren 26c5d0f607 SelectionDAG: when constructing VZEXT_LOAD from other loads, make sure its
output chain is correctly setup.

As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.

rdar://11457792

llvm-svn: 163036
2012-08-31 23:16:57 +00:00
Michael Liao 3224543bf9 Fix PR12359
- In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as
  well as PSHUFB will zero elements with negative indices.

  Patch by Sriram Murali <sriram.murali@intel.com>

llvm-svn: 163018
2012-08-31 20:12:31 +00:00
Craig Topper c30fdbc46c Add support for converting llvm.fma to fma4 instructions.
llvm-svn: 162999
2012-08-31 15:40:30 +00:00
Craig Topper e39ad7b549 Only perform DAG combine on FMAs of legal types.
llvm-svn: 162892
2012-08-30 06:56:15 +00:00
Craig Topper a999c66292 Convert FMA4 patterns to use target specific nodes instead of intrinsics to align with FMA3.
llvm-svn: 162829
2012-08-29 07:18:25 +00:00
Michael Liao 407d659fa5 Add comments on the literal value used.
llvm-svn: 162805
2012-08-28 23:42:17 +00:00
Michael Liao 710e1a594b Explicitly update the number of nodes to be traversed
llvm-svn: 162780
2012-08-28 19:20:29 +00:00
Michael Liao b7d85b6328 Fix PR12312
- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
  Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
  X86ISD::OR node has only its flag result being used as a boolean value and
  all its leaves are extracted from the same vector, it could be folded into an
  X86ISD::PTEST node.

llvm-svn: 162735
2012-08-28 03:34:40 +00:00
Craig Topper a737ef8964 Remove MMX shift intrinsic handling code that also exists in SelectionDAGBuilder.
llvm-svn: 162661
2012-08-27 08:08:30 +00:00
Craig Topper 663d160adb Custom lower FMA intrinsics to target specific nodes and remove the patterns.
llvm-svn: 162534
2012-08-24 04:03:22 +00:00
Michael Liao 10ff96ce8c fix a case where all operands of BUILD_VECTOR are undefined
llvm-svn: 162214
2012-08-20 17:59:18 +00:00
Nadav Rotem 178250ad87 When unsafe math is used, we can use commutative FMAX and FMIN. In some cases
this allows for better code generation.

Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.

For example:

  movaps  %xmm0, %xmm1
  movsd LC(%rip), %xmm0
  minsd %xmm1, %xmm0

becomes:

  minsd LC(%rip), %xmm0

llvm-svn: 162187
2012-08-19 13:06:16 +00:00
Nadav Rotem a136939fa9 Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow better compare/branch code.
llvm-svn: 162172
2012-08-18 17:53:03 +00:00
Craig Topper 0128f9bad7 Refactor code a bit to reduce number of calls in the final compiled code. No functional change intended.
llvm-svn: 162166
2012-08-18 06:39:34 +00:00
Nadav Rotem c324af609e Revert r162160 because it made a few buildbots fail.
llvm-svn: 162164
2012-08-18 05:02:36 +00:00
Nadav Rotem 2cb14a5c4b The X86 backend has a number of optimizations for SETCC nodes which use
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.

Before:
  xorl  %esi, %edi
  testb %dil, %dil
  setne %al
  ret

After:
  xorb  %dil, %sil
  setne %al
  ret

rdar://12081007

llvm-svn: 162160
2012-08-18 02:43:28 +00:00
Craig Topper 31625574db Use nested switch to select arguments to reduce calls to EmitPCMP.
llvm-svn: 162089
2012-08-17 07:15:56 +00:00
Craig Topper 602e1abe0d Make ReplaceATOMIC_BINARY_64 a static function. Use a nested switch to reduce to only a single call to it thus allowing it to be inlined by the compiler.
llvm-svn: 162088
2012-08-17 06:55:11 +00:00
Michael Liao 06f6fe875a minor fix of X86ISD::VSEXT_MOVL dump
llvm-svn: 161902
2012-08-14 22:53:17 +00:00
Michael Liao 34107b9177 fix PR11334
- FP_EXTEND only support extending from vectors with matching elements.
  This results in the scalarization of extending to v2f64 from v2f32,
  which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
  extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.

llvm-svn: 161894
2012-08-14 21:24:47 +00:00
Craig Topper 925a281b00 Factor duplicate calls to getUNDEF in several functions.
llvm-svn: 161860
2012-08-14 08:18:43 +00:00
Craig Topper d0d4b11f66 Re-factor intrinsic lowering to combine common parts of similar intrinsics. Reduces compiled code size a little bit.
llvm-svn: 161859
2012-08-14 07:43:25 +00:00
Craig Topper 4e5eb72735 Tidy up VSETCC lowering code a bit more by adding an llvm_unreachable and putting an a couple if conditions in a better order.
llvm-svn: 161746
2012-08-13 03:42:38 +00:00
Craig Topper 5145a0d967 Refactor code a bit to share commonalities. No functional change intended.
llvm-svn: 161745
2012-08-13 02:34:03 +00:00
Craig Topper ff6e4d1928 Fix an unused variable warning from r161742.
llvm-svn: 161743
2012-08-13 01:26:45 +00:00
Craig Topper a7aaa62d54 Remove the LowerMMXCONCAT_VECTORS function. It could never execute because there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result.
llvm-svn: 161742
2012-08-13 01:23:55 +00:00
Craig Topper 3d2b271362 Remove call to setOperationAction for SETCC of v4f32. SETCC returns an integer type not an FP type.
llvm-svn: 161738
2012-08-12 05:31:32 +00:00
Craig Topper 498228d089 Remove unnecessary call to setOperationAction for SETCC of v2i64 under SSE42. It was already called for the same under SSE2.
llvm-svn: 161737
2012-08-12 05:15:16 +00:00
Craig Topper 10a8bf3b8c Make replace many calls to getSizeInBits() with is128BitVector/is256BitVector
llvm-svn: 161734
2012-08-12 02:23:29 +00:00
Craig Topper 03d2787275 Use MVT.isXBitVector instead of EVT.isXBitVector when setting up operation actions. Compiles to smaller code.
llvm-svn: 161733
2012-08-12 00:34:56 +00:00
Michael Liao e7e828fd64 fix PR13577, an issue introduced by r161687
- FCMOV only supports a subset of X86 conditions. Skip boolean
  simplification if X86 condition is not valid for FCMOV.
- add a minimal test case for PR13577.

llvm-svn: 161732
2012-08-11 23:47:06 +00:00
Craig Topper b5bcf58ba1 Move setOperationAction for CONCAT_VECTORS for 256-bit vectors into loop since all 256-bit types are supported.
llvm-svn: 161730
2012-08-11 22:34:26 +00:00
Michael Liao 5248e9913f add X86-specific DAG optimization to simplify boolean test
- if a boolean test (X86ISD::CMP or X86ISD:SUB) checks a boolean value
  generated from X86ISD::SETCC, try to simplify the boolean value
  generation and checking by reusing the original EFLAGS with proper
  condition code
- add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places
  consuming EFLAGS

part of patches fixing PR12312

llvm-svn: 161687
2012-08-10 19:58:13 +00:00
Michael Liao ea7d906b0f remove tailing whitespaces and test commit
llvm-svn: 161664
2012-08-10 14:39:24 +00:00
Joerg Sonnenberger aa2f801ca3 Add some missing includes for the build against stdcxx.
llvm-svn: 161657
2012-08-10 10:53:56 +00:00
Manman Ren 1be131ba27 X86: enable CSE between CMP and SUB
We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.

Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled
by peephole now.

rdar://11873276

llvm-svn: 161462
2012-08-08 00:51:41 +00:00
Evan Cheng fbdd25c135 X86 cmp lowering is looking past truncate on the condition node. It should only
do so when the high bits are known zero. This caused a subtle miscompilation.

rdar://12027825 

llvm-svn: 161451
2012-08-07 22:21:00 +00:00
Craig Topper ab47fe4e16 Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305.
llvm-svn: 161318
2012-08-06 06:22:36 +00:00
Craig Topper 6d0408d3a5 Remove custom inserter for MWAIT. It doesn't do anything that couldn't be represented in a pattern.
llvm-svn: 161306
2012-08-05 00:36:57 +00:00
Craig Topper 43ee9fae92 Use a COPY node instead of an explicit MOVA opcode in the custom insterter for pcmpestrm/pcmpistrm. Allows the register allocator to handle it better and prevent wasted identity moves.
llvm-svn: 161305
2012-08-05 00:17:48 +00:00
Bob Wilson 3e6fa462f3 Fall back to selection DAG isel for calls to builtin functions.
Fast isel doesn't currently have support for translating builtin function
calls to target instructions.  For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization.  Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel.  <rdar://problem/12008746>

llvm-svn: 161232
2012-08-03 04:06:28 +00:00
Chad Rosier 24c19d20c0 Whitespace.
llvm-svn: 161122
2012-08-01 18:39:17 +00:00
Elena Demikhovsky 3cb3b0045c Added FMA functionality to X86 target.
llvm-svn: 161110
2012-08-01 12:06:00 +00:00
Rafael Espindola 11c38b9657 When a return struct pointer is passed in registers, the called has nothing
to pop.

llvm-svn: 160725
2012-07-25 13:41:10 +00:00
Sylvestre Ledru 35521e2310 Fix a typo (the the => the)
llvm-svn: 160621
2012-07-23 08:51:15 +00:00
Evan Cheng e6a3b03ee0 Back out r160101 and instead implement a dag combine to recover from instcombine transformation.
llvm-svn: 160387
2012-07-17 18:54:11 +00:00
Evan Cheng 780f9b5f92 Implement r160312 as target indepedenet dag combine.
llvm-svn: 160354
2012-07-17 08:31:11 +00:00
Evan Cheng f579beca6d This is another case where instcombine demanded bits optimization created
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.

int foo(unsigned long l) {
  return (l>> 47) == 1;
}

we produce

  %shr.mask = and i64 %l, -140737488355328
  %cmp = icmp eq i64 %shr.mask, 140737488355328
  %conv = zext i1 %cmp to i32
  ret i32 %conv

which codegens to

movq    $0xffff800000000000,%rax
andq    %rdi,%rax
movq    $0x0000800000000000,%rcx
cmpq    %rcx,%rax
sete    %al
movzbl    %al,%eax
ret

TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.

Based on a patch by Eli Friedman.

PR10328
rdar://9758774

llvm-svn: 160346
2012-07-17 06:53:39 +00:00