Commit Graph

23654 Commits

Author SHA1 Message Date
Arnold Schwaighofer 8070b382ec ARM cost model: Increase cost of some vector selects we do terrible on
By terrible I mean we store/load from the stack.

This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.

LV: Found an estimated cost of 2 for VF 8 For instruction:   icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction:   select i1, i32, i32

The bug that tracks the CodeGen part is PR14868.

radar://13403975

llvm-svn: 177105
2013-03-14 19:17:02 +00:00
Akira Hatanaka 44ebe00158 [mips] Fix filename in comment and delete unnecessary lines of code.
No functionality changes.

llvm-svn: 177104
2013-03-14 19:09:52 +00:00
Jyotsna Verma ec613665c2 Hexagon: Removed asserts regarding alignment and offset.
We are warning the user about the alignment, so we should not assert.

llvm-svn: 177103
2013-03-14 19:08:03 +00:00
Akira Hatanaka 7cc48f45cb Add back lines which were accidentally deleted in CMakeLists.txt.
llvm-svn: 177096
2013-03-14 18:46:46 +00:00
Akira Hatanaka b8835b8213 [mips] Define function MipsSEDAGToDAGISel::selectAddESubE.
No intended functionality changes.

llvm-svn: 177095
2013-03-14 18:39:25 +00:00
Hal Finkel ad92b46505 Add a comment about overlapping PPC frame offsets
I don't think that it is otherwise clear how the overlapping offsets
are processed into distinct spill slots. Comment that this is done
in processFunctionBeforeFrameFinalized.

llvm-svn: 177094
2013-03-14 18:38:31 +00:00
Akira Hatanaka 040d225588 [mips] Rename functions and variables to start with proper case.
llvm-svn: 177092
2013-03-14 18:33:23 +00:00
Akira Hatanaka 29a0da3551 Add header file MipsISelDAGToDAG.h.
llvm-svn: 177090
2013-03-14 18:28:19 +00:00
Akira Hatanaka 30a847876b [mips] Define two subclasses of MipsDAGToDAGISel. Mips16DAGToDAGISel is for
mips16 and MipsSEDAGToDAGISel is for mips32/64. 

No functionality changes.

llvm-svn: 177089
2013-03-14 18:27:31 +00:00
Vincent Lejeune 0a22bc4156 R600: Factorize code handling Const Read Port limitation
llvm-svn: 177078
2013-03-14 15:50:45 +00:00
Craig Topper ba82429826 Fix the name of a variable to match its declaration. Fixes build failure from r177014.
llvm-svn: 177015
2013-03-14 07:47:43 +00:00
Craig Topper 872999737d Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185.
llvm-svn: 177014
2013-03-14 07:40:52 +00:00
Craig Topper a66d81d521 Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior.
llvm-svn: 177011
2013-03-14 07:09:57 +00:00
Michael Liao 20d287044c Fix PR15309
- Fix the typo on type checking

llvm-svn: 177010
2013-03-14 06:57:42 +00:00
Bill Wendling 965bd58902 Reset some of the target options which affect code generation.
This doesn't reset all of the target options within the TargetOptions
object. This is because some of those are ABI-specific and must be determined if
it's okay to change those on the fly.

llvm-svn: 176986
2013-03-13 22:26:59 +00:00
Vincent Lejeune 14c3fd8480 R600: Remove unused Outputs variable
llvm-svn: 176967
2013-03-13 20:13:25 +00:00
Benjamin Kramer dfbcba5ae0 Add one more overload to make VS2008's debug mody happy.
sigh.

llvm-svn: 176946
2013-03-13 13:50:47 +00:00
Akira Hatanaka 96ca182904 [mips] Define two subclasses of MipsTargetLowering. Mips16TargetLowering is for
mips16 and MipsSETargetLowering is for mips32/64. 

No functionality changes.

llvm-svn: 176917
2013-03-13 00:54:29 +00:00
Arnold Schwaighofer 90774f3c8f ARM cost model: Increase the cost for vector casts that use the stack
Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend
currently lowers those using stack accesses.

This was responsible for a significant degradation on
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1
where we vectorize one loop to a vector factor of 16. After this patch we select
a vector factor of 4 which will generate reasonable code.

unsigned char cle[32];

void test(short c) {
  unsigned short compte;
  for (compte = 0; compte <= 31; compte++) {
    cle[compte] = cle[compte] ^ c;
  }
}

radar://13220512

llvm-svn: 176898
2013-03-12 21:19:22 +00:00
Hal Finkel 01271c6022 Don't reserve R2 on Darwin/PPC
Now that only the register-scavenger version of the CR spilling code remains,
we no longer need the Darwin R2 hack. Darwin can use R0 as a spare register in
any case where the System V ABI uses it (R0 is special architecturally, and so
is reserved under all common ABIs).

A few test cases needed to be updated to reflect the register-allocation changes.

llvm-svn: 176868
2013-03-12 15:18:14 +00:00
Hal Finkel e154c8f23e PPC should always use the register scavenger for CR spilling
This removes the -disable-ppc[32|64]-regscavenger options; the code
that uses the register scavenger has been working well (and has been the default)
for some time, and we don't need options to enable the old (broken) CR spilling code.

llvm-svn: 176865
2013-03-12 14:12:16 +00:00
Akira Hatanaka 0bb60d8972 [mips] Rename function and variable names to start with proper case. Fix typos.
Delete commented-out code.

llvm-svn: 176844
2013-03-12 00:16:36 +00:00
Kevin Enderby f15856ebb4 Fixes disassembler crashes on 2013 Haswell RTM instructions.
rdar://13318048

llvm-svn: 176828
2013-03-11 21:17:13 +00:00
Vincent Lejeune e5ecf10a02 R600: Fix JUMP handling so that MachineInstr verification can occur
This allows R600 Target to use the newly created -verify-misched llc flag

llvm-svn: 176819
2013-03-11 18:15:06 +00:00
NAKAMURA Takumi 756cf8867a R600MachineScheduler.cpp: Fix use cases of dbgs(). Don't include <iostream> here.
llvm-svn: 176797
2013-03-11 08:19:28 +00:00
Nick Lewycky 7b287eea22 Correct this error message, and most importantly make it distinct from the
error above. Based on a patch by Peter Zotov!

llvm-svn: 176794
2013-03-10 22:01:44 +00:00
Jakub Staszak df17ddd56b Cleanup #includes.
llvm-svn: 176787
2013-03-10 13:11:23 +00:00
Lang Hames be3d971143 Don't glue users to extract_subreg when selecting the llvm.arm.ldrexd
intrinsic - it can cause impossible-to-schedule subgraphs to be
introduced.

PR15053.

llvm-svn: 176777
2013-03-09 22:56:09 +00:00
Benjamin Kramer 160f72dc8e TLI: Microoptimize calls to strlen+memcmp to strncmp.
The strlen+memcmp was hidden in a call to StringRef::operator==. We check if
there are any null bytes in the string upfront so we can simplify the comparison
Small speedup when compiling code with many function calls.

llvm-svn: 176766
2013-03-09 13:48:23 +00:00
Tom Stellard 5e524897ed R600: Optimize another selectcc case
fold selectcc (selectcc x, y, a, b, cc), b, a, b, setne ->
     selectcc x, y, a, b, cc

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176700
2013-03-08 15:37:11 +00:00
Tom Stellard 2add82de09 R600: Improve custom lowering of select_cc
Two changes:
1. Prefer SET* instructions when possible
2. Handle the CND*_INT case with floating-point args

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176699
2013-03-08 15:37:09 +00:00
Tom Stellard 492ebeabe9 R600: Change operation action from Custom to Expand for BR_CC
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176698
2013-03-08 15:37:07 +00:00
Tom Stellard e8f9f2877b R600: Change operation action from Custom to Expand for SETCC
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176697
2013-03-08 15:37:05 +00:00
Tom Stellard b852af5dc4 R600: Set BooleanContents to ZeroOrNegativeOneBooleanContent
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176696
2013-03-08 15:37:03 +00:00
Tom Stellard b1588fc057 DAGCombiner: Use correct value type for checking legality of BR_CC v3
LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694
2013-03-08 15:36:57 +00:00
Jyotsna Verma 7825e064b9 Hexagon: Add patterns for zero extended loads from i1->i64.
llvm-svn: 176689
2013-03-08 14:15:15 +00:00
Tim Northover 95f4892d4c AArch64: expand sincos operations, we don't support them.
Patch based on Mans Rullgard's.

llvm-svn: 176688
2013-03-08 13:55:07 +00:00
Michel Danzer f52a672bf5 R600/SI: Use source scheduler
This is certainly not the last word on scheduling for this target, but
right now this allows a few apps to run / finish with radeonsi, most
notably UT2004 / Lightsmark. They fail to compile some shaders with the
default scheduler because it ends up trying to spill registers, which
we don't support yet (and which is probably a bad idea in general for
performance if it can be avoided).

NOTE: This is a candidate for the Mesa stable branch.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176687
2013-03-08 10:58:01 +00:00
Benjamin Kramer fdf362bd69 ArrayRefize some code. No functionality change.
llvm-svn: 176648
2013-03-07 20:33:29 +00:00
Jyotsna Verma c7dcc2fbc5 Hexagon: Handle i8, i16 and i1 Var Args.
llvm-svn: 176647
2013-03-07 20:28:34 +00:00
Jyotsna Verma 2ba0c0b927 Hexagon: Add support to lower block address.
llvm-svn: 176637
2013-03-07 19:10:28 +00:00
Benjamin Kramer 2c3d0df8ee X86: Fold EXTRACT_SUBVECTORs of a BUILD_VECTOR into a smaller BUILD_VECTOR.
That can usually be lowered efficiently and is common in sandybridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.

Fixes PR15462.

llvm-svn: 176634
2013-03-07 18:48:40 +00:00
Christian Konig 99ee0f4790 R600/SI: rework input interpolation v2
v2: update CMakeLists.txt as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176626
2013-03-07 09:04:14 +00:00
Christian Konig aa9f4e6d3a R600/SI: remove SI_vs_load_buffer_index
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176625
2013-03-07 09:04:04 +00:00
Christian Konig 189357c6b2 R600/SI: remove SGPR address space v2
v2: fix R600 regressions

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176624
2013-03-07 09:03:59 +00:00
Christian Konig 2c8f6d5376 R600/SI: add proper formal parameter handling for SI
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176623
2013-03-07 09:03:52 +00:00
Christian Konig 3625055b8c R600/SI: remove shader type intrinsic
Just encode the type as target specific attribute.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176622
2013-03-07 09:03:46 +00:00
Christian Konig 2214f14ab9 R600/SI: switch types of SGPRs to v*i8
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176621
2013-03-07 09:03:38 +00:00
Christian Konig a0ed657293 R600/SI: fix unused variable warning
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176620
2013-03-07 09:03:30 +00:00
Michael Liao d5cac37dc5 Fix two remaining issue after fixing PR15355 when CMOV is not available
- Phi nodes should be replaced/updated after lowering CMOV into branch
  because 'mainMBB' updating operand in Phi node is changed.
- Add EFLAGS in livein before lowering the 2nd CMOV. It's necessary as
  we will reuse the EFLAGS generated before the 1st lowered CMOV, which
  won't clobber EFLAGS. However, we need explicitly specify that.
- '-attr=-cmov' test case are added.

llvm-svn: 176598
2013-03-07 01:01:29 +00:00
Akira Hatanaka 0f693a8a77 [mips] Custom-legalize BR_JT.
In N64-static, GOT address is needed to compute the branch address.

llvm-svn: 176580
2013-03-06 21:32:03 +00:00
Michael Liao da22b30be5 Fix PR15355
- Clear 'mayStore' flag when loading from the atomic variable before the
  spin loop
- Clear kill flag from one use to multiple use in registers forming the
  address to that atomic variable
- don't use a physical register as live-in register in BB (neither entry
  nor landing pad.) by copying it into virtual register

(patch by Cameron Zwarich)

llvm-svn: 176538
2013-03-06 00:17:04 +00:00
Akira Hatanaka 1454ed8ad3 [mips] Remove android calling convention.
This calling convention was added just to handle functions which return vector
of floats. The fix committed in r165585 solves the problem.

llvm-svn: 176530
2013-03-05 23:22:30 +00:00
Akira Hatanaka e092f72956 [mips] Fix MipsCC::analyzeReturn so that, in soft-float mode, fp128 gets
returned in registers $2 and $4.

llvm-svn: 176527
2013-03-05 22:54:59 +00:00
Akira Hatanaka 5f3ba9e595 [mips] Fix MipsTargetLowering::LowerCallResult and LowerReturn to correctly
handle fp128 returns.

llvm-svn: 176523
2013-03-05 22:41:55 +00:00
Akira Hatanaka 3b7391d140 [mips] Fix MipsTargetLowering::LowerCall to pass fp128 arguments in floating
point registers.

llvm-svn: 176521
2013-03-05 22:20:28 +00:00
Akira Hatanaka 4b634fa3b3 [mips] Correct handling of fp128 (long double) formals and read long double
parameters from floating point registers if target is mips64 hard float.

llvm-svn: 176520
2013-03-05 22:13:04 +00:00
Meador Inge b904e6e467 Add more functions to the TLI.
This patch adds many more functions to the target library information.
All of the functions being added were discovered while doing the migration
of the simplify-libcalls attribute annotation functionality to the
functionattrs pass.  As a part of that work the attribute annotation logic
will query TLI to determine if a function should be annotated or not.

Signed-off-by: Meador Inge <meadori@codesourcery.com>
llvm-svn: 176514
2013-03-05 21:47:40 +00:00
Jyotsna Verma 457801f7ab reverting patch 176508.
llvm-svn: 176513
2013-03-05 20:29:23 +00:00
Jyotsna Verma 7179e712dd Hexagon: Add support for lowering block address.
llvm-svn: 176508
2013-03-05 19:37:46 +00:00
Vincent Lejeune fe32bd87c2 R600: Do not predicate vector op
llvm-svn: 176507
2013-03-05 19:12:06 +00:00
Jyotsna Verma 0eeea14e3e Hexagon: Expand addc, adde, subc and sube.
llvm-svn: 176505
2013-03-05 19:04:47 +00:00
Benjamin Kramer 5dc831801a Update cmake build.
llvm-svn: 176501
2013-03-05 18:54:05 +00:00
Jyotsna Verma f1214a8ab7 Hexagon: Use MO operand flags to mark constant extended instructions.
llvm-svn: 176500
2013-03-05 18:51:42 +00:00
Jyotsna Verma f4e324f4fb Hexagon: Add encoding bits to the TFR64 instructions.
Set imMoveImm, isAsCheapAsAMove flags for TFRI instructions.

llvm-svn: 176499
2013-03-05 18:42:28 +00:00
Vincent Lejeune 68b6b6ddfb R600: initial scheduler code
This is a skeleton for a pre-RA MachineInstr scheduler strategy. Currently
it only tries to expose more parallelism for ALU instructions (this also
makes the distribution of GPR channels more uniform and increases the
chances of ALU instructions to be packed together in a single VLIW group).
Also it tries to reduce clause switching by grouping instruction of the
same kind (ALU/FETCH/CF) together.

Vincent Lejeune:
 - Support for VLIW4 Slot assignement
 - Recomputation of ScheduleDAG to get more parallelism opportunities

Tom Stellard:
 - Fix assertion failure when trying to determine an instruction's slot
   based on its destination register's class
 - Fix some compiler warnings

Vincent Lejeune: [v2]
 - Remove recomputation of ScheduleDAG (will be provided in a later patch)
 - Improve estimation of an ALU clause size so that heuristic does not emit cf
 instructions at the wrong position.
 - Make schedule heuristic smarter using SUnit Depth
 - Take constant read limitations into account

Vincent Lejeune: [v3]
 - Fix some uninitialized values in ConstPair
 - Add asserts to ensure an ALU slot is always populated

llvm-svn: 176498
2013-03-05 18:41:32 +00:00
Vincent Lejeune 0b72f1021d R600: Remove LowerConstCopyPass and lower CONST_COPY right after ISel.
Maintaining CONST_COPY Instructions until Pre Emit may prevent some ifcvt case
and taking them in account for scheduling is difficult for no real benefit.

llvm-svn: 176488
2013-03-05 15:04:55 +00:00
Vincent Lejeune 3b6f20e944 R600: Turn BUILD_VECTOR into Reg_Sequence
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 176487
2013-03-05 15:04:49 +00:00
Vincent Lejeune 10a5e4773e R600: CONST_ADDRESS node is not marked as mayLoad anymore
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>

mayLoad complexify scheduling and does not bring any usefull info
as the location is not writeable at all.

llvm-svn: 176486
2013-03-05 15:04:42 +00:00
Vincent Lejeune a199d01e4d R600: Use MUL_IEEE for trig/fdiv intrinsic
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 176485
2013-03-05 15:04:37 +00:00
Vincent Lejeune 743dca0446 R600: Add support for indirect addressing of non default const buffer
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 176484
2013-03-05 15:04:29 +00:00
David Sehr 4c8979cd4d The current X86 NOP padding uses one long NOP followed by the remainder in
one-byte NOPs.  If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact.  From my
micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15 followed by a 12.  This patch
changes NOP emission to emit as many 15-byte (the maximum) as possible followed
by at most one shorter NOP.

llvm-svn: 176464
2013-03-05 00:02:23 +00:00
Akira Hatanaka c7828356aa [mips] Print move instructions.
"move $4, $5" is printed instead of "or $4, $5, $zero".

llvm-svn: 176455
2013-03-04 22:25:01 +00:00
Jack Carter 0e149b04f6 Mips specific inline assembler constraint 'R'
'R' An address that can be sued in a non-macro load or store.
This patch includes a positive test case.

llvm-svn: 176452
2013-03-04 21:33:15 +00:00
Preston Gurd 485296d1e8 Bypass Slow Divides
* Only apply divide bypass optimization when not optimizing for size. 
* Fixed bug caused by constant for 0 value of type Int32,
  used dividend type to generate the constant instead.
* For atom x86-64 apply the divide bypass to use 16-bit divides instead of
  64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442
2013-03-04 18:13:57 +00:00
Tom Stellard b2f2f960ce R600: Clean up datalayout strings so they better match hardware capabilities
llvm-svn: 176439
2013-03-04 17:40:28 +00:00
Jia Liu 434874db6f Mips ISD typo
llvm-svn: 176426
2013-03-04 01:06:54 +00:00
Jim Grosbach a3c5c769d6 ARM: Creating a vector from a lane of another.
The VDUP instruction source register doesn't allow a non-constant lane
index, so make sure we don't construct a ARM::VDUPLANE node asking it to
do so.

rdar://13328063
http://llvm.org/bugs/show_bug.cgi?id=13963

llvm-svn: 176413
2013-03-02 20:16:24 +00:00
Jim Grosbach c6f1914ef0 Clean up code format a bit.
llvm-svn: 176412
2013-03-02 20:16:19 +00:00
Jim Grosbach 54efea0a7a Tidy up. Trailing whitespace.
llvm-svn: 176411
2013-03-02 20:16:15 +00:00
Arnold Schwaighofer 99cba9697a ARM NEON: Fix v2f32 float intrinsics
Mark them as expand, they are not legal as our backend does not match them.

llvm-svn: 176410
2013-03-02 19:38:33 +00:00
Arnold Schwaighofer 20ef54f4c1 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403
2013-03-02 04:02:52 +00:00
Andrew Trick 63474629e8 Added FIXME for future Hexagon cleanup.
llvm-svn: 176400
2013-03-02 01:43:08 +00:00
Akira Hatanaka ece459bb66 [mips] Fix inefficient code generation.
This patch eliminates the need to emit a constant move instruction when this
pattern is matched:

(select (setgt a, Constant), T, F)

The pattern above effectively turns into this:

(conditional-move (setlt a, Constant + 1), F, T)

llvm-svn: 176384
2013-03-01 21:52:08 +00:00
Akira Hatanaka a4c0341514 Fix indentation.
llvm-svn: 176380
2013-03-01 21:22:21 +00:00
Michael Liao 6af16fc3b7 Fix PR10475
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands
  but TLI.getShiftAmountTy() so far only return scalar type. As a
  result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return target-specificed scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar
  type.

llvm-svn: 176364
2013-03-01 18:40:30 +00:00
Chad Rosier 9660343b42 Add support for using non-pic code for arm and thumb1 when emitting the sjlj
dispatch code.  As far as I can tell the thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both arm and thumb1.
rdar://13066352

llvm-svn: 176363
2013-03-01 18:30:38 +00:00
Jyotsna Verma 8425643728 Hexagon: Add constant extender support framework.
llvm-svn: 176358
2013-03-01 17:37:13 +00:00
Christian Konig d0e3da1818 R600/SI: handle all registers in copyPhysReg v2
v2: based on Michels patch, but now allows copying of all registers sizes.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176346
2013-03-01 09:46:27 +00:00
Christian Konig 1f344cda53 R600/SI: remove S_MOV immediate patterns
They won't match anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176345
2013-03-01 09:46:22 +00:00
Christian Konig 8465296420 R600/SI: remove GPR*AlignEncode
It's much easier to specify the encoding with tablegen directly.

Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176344
2013-03-01 09:46:17 +00:00
Christian Konig 01fd1f6b36 R600/SI: fix warning about overloaded virtual
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176343
2013-03-01 09:46:11 +00:00
Christian Konig 862fd9fa2c R600/SI: fix inserting waits for unordered defines
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176342
2013-03-01 09:46:04 +00:00
Duncan Sands 2cb41d372c GCC thinks that this variable might be used uninitialized (it isn't).
llvm-svn: 176341
2013-03-01 09:46:03 +00:00
Akira Hatanaka e9e588dd72 [mips] Remove unused option. Fix 80-column violations.
llvm-svn: 176330
2013-03-01 02:17:02 +00:00
Akira Hatanaka 8f7bfb39be [mips] Add the capability to search delay slot filling instructions in
successor basic blocks.

Currently this is off by default.

llvm-svn: 176329
2013-03-01 02:03:51 +00:00
Akira Hatanaka 28dc83ceb3 [mips] Do not add SecondLastInst to list BranchInstrs if there is only one
terminator.

No functionality change.

llvm-svn: 176326
2013-03-01 01:22:26 +00:00
Akira Hatanaka 7320b2364d [mips] Define an overloaded version of function MipsInstrInfo::AnalyzeBranchAdd.
This function will be used later when the capability to search delay slot
filling instructions in successor blocks is added. No intended functionality
changes.

llvm-svn: 176325
2013-03-01 01:10:17 +00:00
Akira Hatanaka e44e30ca5a [mips] Add options to disable searching backward and in successor blocks.
llvm-svn: 176321
2013-03-01 01:02:36 +00:00
Akira Hatanaka e01ff9dc60 [mips] Add capability to search in the forward direction for instructions that
can fill the delay slot.

Currently, this is off by default.

llvm-svn: 176320
2013-03-01 00:50:52 +00:00
Akira Hatanaka f815db5bcb [mips] Define helper function searchRange
No functionality change.

llvm-svn: 176318
2013-03-01 00:26:14 +00:00
Akira Hatanaka 50e174d95d [mips] Rename function findDelayInstr to searchBackward.
llvm-svn: 176317
2013-03-01 00:20:16 +00:00
Akira Hatanaka eb33ced08f [mips] Define class MemDefsUses.
This class tracks dependence between memory instructions using underlying
objects of memory operands. 

llvm-svn: 176313
2013-03-01 00:16:31 +00:00
Chad Rosier 537ff50b5d Tidy up; no functional change.
llvm-svn: 176288
2013-02-28 19:16:42 +00:00
Chad Rosier 11a9828745 Style; no functional change.
llvm-svn: 176285
2013-02-28 18:54:27 +00:00
Yiannis Tsiouris d4842e5ee9 Re-format comments (and check commit access)
llvm-svn: 176270
2013-02-28 16:59:10 +00:00
Tim Northover ce17020c97 AArch64: remove post-encoder method from FCMP (immediate) instructions.
The work done by the post-encoder (setting architecturally unused bits to 0 as
required) can be done by the existing operand that covers the "#0.0". This
removes at least one use of the discouraged PostEncoderMethod uses.

llvm-svn: 176261
2013-02-28 14:46:14 +00:00
Tim Northover c3c5c0971d AArch64: be more careful resorting to inefficient addressing for weak vars.
If an otherwise weak var is actually defined in this unit, it can't be
undefined at runtime so we can use normal global variable sequences (ADRP/ADD)
to access it.

llvm-svn: 176259
2013-02-28 14:36:31 +00:00
Tim Northover b9d4fd210b AArch64: don't drop GlobalAddress offset when handling extern_weak decls.
llvm-svn: 176258
2013-02-28 14:36:24 +00:00
Tim Northover 9fafdf6d5a AArch64: Use cbnz instead of cmp/b.ne pair for atomic operations.
llvm-svn: 176253
2013-02-28 13:52:07 +00:00
Jim Grosbach 5f21587648 ARM: FMA is legal only if VFP4 is available.
rdar://13306723

llvm-svn: 176212
2013-02-27 21:31:12 +00:00
Chad Rosier d3e47ca423 Remove this instance of dl as it's defined in a previous scope.
llvm-svn: 176208
2013-02-27 20:34:14 +00:00
Tim Northover 29931ab21d ARM: permit full range of valid ADR immediates.
This fixes an issue where trying to assemlbe valid ADR instructions would cause
LLVM to hit a failed assertion.

Patch by Keith Walker.

llvm-svn: 176189
2013-02-27 16:43:09 +00:00
Nadav Rotem 08ab877cc7 Revert r176166 because it broke one of the lit tests.
llvm-svn: 176171
2013-02-27 05:56:20 +00:00
Nadav Rotem 85e1211fbf std::string to StringRef.
llvm-svn: 176166
2013-02-27 05:23:56 +00:00
Reed Kotler 5bf8020d83 Fix cut/paste error in a comment.
llvm-svn: 176165
2013-02-27 04:20:14 +00:00
Reed Kotler bb3094aa1e Add the skeleton for the Mips constant island pass.
It will only be used for Mips 16 at this time.
 

llvm-svn: 176161
2013-02-27 03:33:58 +00:00
Bill Schmidt 8ea7af8e44 Fix PR15332 (patch by Florian Zeitz).
There's no need to generate a stack frame for PPC32 SVR4 when there are
no local variables assigned to the stack, i.e., when no red zone is needed.
(PPC64 supports a red zone, but PPC32 does not.)

llvm-svn: 176124
2013-02-26 21:28:57 +00:00
Christian Konig e500e445c5 R600/SI: Add promotion of e32 to e64 in operand folding
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176105
2013-02-26 17:52:47 +00:00
Christian Konig f741fbfb1b R600/SI: add VOP mapping functions
Make it possible to map between e32 and e64 encoding opcodes.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176104
2013-02-26 17:52:42 +00:00
Christian Konig 6612ac39c9 R600/SI: swap operands if it helps folding
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176103
2013-02-26 17:52:36 +00:00
Christian Konig 76edd4f2bc R600/SI: add some more instruction flags
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176102
2013-02-26 17:52:29 +00:00
Christian Konig f82901af2a R600/SI: add post ISel folding for SI v2
Include immediate folding and SGPR limit handling for VOP3 instructions.

v2: remove leftover hasExtraSrcRegAllocReq

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176101
2013-02-26 17:52:23 +00:00
Christian Konig d910b7d534 R600/SI: add folding helper
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176100
2013-02-26 17:52:16 +00:00
Christian Konig d303996918 R600/SI: fix VOP3b encoding v2
v2: document why we hardcode VCC for now.

This is a candidate for the mesa-stable branch.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176099
2013-02-26 17:52:09 +00:00
Christian Konig 0f0a8fe2dd R600/SI: fix and cleanup SI register definition v2
Prevent producing real strange tablegen code by using
proper register sizes, alignments and hierarchy.

Also cleanup the unused definitions and add some comments.

v2: add SGPR 512 bit registers, stop registers from wrapping around,
    fix SGPR alignment

This is a candidate for the mesa-stable branch.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176098
2013-02-26 17:52:03 +00:00
Christian Konig d76ed54b60 R600/SI: fix stupid typo
This is a candidate for the mesa-stable branch.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176097
2013-02-26 17:51:57 +00:00
Akira Hatanaka 979899e5cc [mips] Use class RegDefsUses to track register defs and uses.
No functionality change.

llvm-svn: 176070
2013-02-26 01:30:05 +00:00
Chad Rosier 1b33e8d63e [fast-isel] Make sure the FastLowerArguments function checks to make sure the
arguments type is a simple type.
rdar://13290455

llvm-svn: 176066
2013-02-26 01:05:31 +00:00
Michael Liao 609a527286 Refine fix to PR10499, no functionality change
- Put expensive checking after simple one

llvm-svn: 176060
2013-02-25 23:16:36 +00:00
Michael Liao ab97668061 Fix PR10499
- Check whether SSE is available before lowering all 1s vector building with
  PCMPEQD, which is only available from SSE2

llvm-svn: 176058
2013-02-25 23:01:03 +00:00
Chad Rosier a92ef4ba5b [fast-isel] Add X86FastIsel::FastLowerArguments to handle functions with 6 or
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.

Also, add the new llc -fast-isel-abort-args option, which is similar to
-fast-isel-abort option, but for formal argument lowering.

llvm-svn: 176052
2013-02-25 21:59:35 +00:00
Chad Rosier 669bb3ee77 [ms-inline asm] Add support for the pushad/popad mnemonics.
rdar://13254235

llvm-svn: 176036
2013-02-25 19:06:27 +00:00
Bill Schmidt b454829981 Fix missing relocation for TLS addressing peephole optimization.
Report and fix due to Kai Nacke.  Testcase update by me.

llvm-svn: 176029
2013-02-25 16:44:35 +00:00
Reed Kotler bd1058a877 Make pseudos FEXT_CCRX16_ins and FEXT_CCRXI16_ins into custom emitters.
llvm-svn: 176007
2013-02-25 02:25:47 +00:00
Reed Kotler 7a86b3dc2b Make psuedo FEXT_T8I816_ins into a custom emitter.
llvm-svn: 176002
2013-02-24 23:17:51 +00:00
Bill Schmidt c68c6df884 Fix PR14364.
This removes a const_cast hack from PPCRegisterInfo::hasReservedSpillSlot().
The proper place to save the frame index for the CR spill slot is in the
PPCFunctionInfo object, not the PPCRegisterInfo object.

No new test cases, as this just reimplements existing function.  Existing
tests such as test/CodeGen/PowerPC/crsave.ll are sufficient.

llvm-svn: 175998
2013-02-24 17:34:50 +00:00
Francois Pichet d1cef3ecd3 Typo
llvm-svn: 175991
2013-02-24 12:34:13 +00:00
Nadav Rotem b532fca92c Revert r169638 because it broke Mesa llvmpipe tests.
Fix PR15239.

llvm-svn: 175985
2013-02-24 07:09:35 +00:00
Reed Kotler e2bead7a2d Make psuedo FEXT_T8I816_ins a custom inserter. It should be expanded
as early as possible; which means during instruction selection.

llvm-svn: 175984
2013-02-24 06:16:39 +00:00
Reed Kotler 80070bd439 Add new base instruction def for cmpi, cmp, slt and sltu so that def/uses
proper. Fixed this already a few days ago for slti.

llvm-svn: 175975
2013-02-23 23:37:03 +00:00
Benjamin Kramer ee23dcb461 X86: Disable cmov-memory patterns on subtargets without cmov.
Fixes PR15115.

llvm-svn: 175962
2013-02-23 10:40:58 +00:00
Reed Kotler dacee2bb44 Expand pseudos/macros for Selt. This is the last of the complex
macros.The rest is some small misc. stuff.

llvm-svn: 175950
2013-02-23 03:09:56 +00:00
Jim Grosbach 9be2d71512 ARM: Convenience aliases for 'srs*' instructions.
Handle an implied 'sp' operand.

rdar://11466783

llvm-svn: 175940
2013-02-23 00:52:09 +00:00
Akira Hatanaka 02b0e48f6a [mips] Emit call16 operator instead of got_disp. The former allows lazy binding.
llvm-svn: 175920
2013-02-22 21:10:03 +00:00
Peter Collingbourne 7b57621fb3 x86_64: designate most general purpose and SSE registers as callee save under coldcc
llvm-svn: 175911
2013-02-22 19:19:44 +00:00
Michel Danzer 0cc991e17b R600/SI: Add pattern for sign extension of i1 to i32.
16 more little piglits with radeonsi.

NOTE: This is a candidate for the Mesa stable branch.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175887
2013-02-22 11:22:58 +00:00
Michel Danzer 00fb283560 R600/SI: Add pattern for logical or of i1 values.
24 more little piglits with radeonsi.

NOTE: This is a candidate for the Mesa stable branch.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175886
2013-02-22 11:22:54 +00:00
Michel Danzer c3ea4041b9 R600/SI: Add pattern for fceil.
9 more little piglits with radeonsi.

NOTE: This is a candidate for the Mesa stable branch.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175885
2013-02-22 11:22:49 +00:00
Kristof Beyls 0ba797e8f7 Make ARMAsmPrinter generate the correct alignment specifier syntax in instructions.
The Printer will now print instructions with the correct alignment specifier syntax, like
    vld1.8  {d16}, [r0:64]

llvm-svn: 175884
2013-02-22 10:01:33 +00:00