Commit Graph

16968 Commits

Author SHA1 Message Date
Benjamin Kramer 8bcc971174 Make MemoryBuiltins aware of TargetLibraryInfo.
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.

Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.

Fixes PR13694 and probably others.

llvm-svn: 162841
2012-08-29 15:32:21 +00:00
Jush Lu e87e559e62 [arm-fast-isel] Add support for ARM PIC.
llvm-svn: 162823
2012-08-29 02:41:21 +00:00
NAKAMURA Takumi e9205613f6 Create llvm/test/Object/Mips/lit.local.cfg to check Mips in targets_to_build.
llvm-svn: 162819
2012-08-29 01:37:57 +00:00
NAKAMURA Takumi e1f9fc11d4 llvm/test: [CMake] Add profile_rt-shared to deps.
llvm-svn: 162813
2012-08-29 00:37:56 +00:00
NAKAMURA Takumi 87abb0ee34 llvm/test/Analysis/Profiling: Mark 3 of them as REQUIRES: loadable_module.
FIXME: profile_rt.dll could be built on win32.
llvm-svn: 162811
2012-08-29 00:37:46 +00:00
Jack Carter fc93516642 Moved input for objdump test from Mips to Inputs.
llvm-svn: 162808
2012-08-29 00:10:48 +00:00
Manman Ren abbb01abea Profile: set branch weight metadata with data generated from profiling.
This patch implements ProfileDataLoader which loads profile data generated by
-insert-edge-profiling and updates branch weight metadata accordingly.

Patch by Alastair Murray.

llvm-svn: 162799
2012-08-28 22:21:25 +00:00
Jack Carter cd6b0e1368 The instruction DEXT may be transformed into DEXTU or DEXTM depending
on the size of the extraction and its position in the 64 bit word.

This patch allows support of the dext transformations with mips64 direct
object output.

0 <= msb < 32 0 <= lsb < 32 0 <= pos < 32 1 <= size <= 32
DINS
The field is entirely contained in the right-most word of the doubleword

32 <= msb < 64 0 <= lsb < 32 0 <= pos < 32 2 <= size <= 64
DINSM
The field straddles the words of the doubleword

32 <= msb < 64 32 <= lsb < 64 32 <= pos < 64 1 <= size <= 32
DINSU
The field is entirely contained in the left-most word of the doubleword

llvm-svn: 162782
2012-08-28 20:07:41 +00:00
Jack Carter 551efd7fd9 Some of the instructions in the Mips instruction set are revision
delimited. llvm-mc -disassemble access these through the -mattr
option.

llvm-objdump -disassemble had no such way to set the attribute so
some instructions were just not recognized for disassembly.

This patch accepts llvm-mc mechanism for specifying the attributes.

llvm-svn: 162781
2012-08-28 19:24:49 +00:00
Jack Carter c20a21b855 Some instructions are passed to the assembler to be
transformed to the final instruction variant. An
example would be dsrll which is transformed into 
dsll32 if the shift value is greater than 32.

For direct object output we need to do this transformation
in the codegen. If the instruction was inside branch
delay slot, it was being missed. This patch corrects this
oversight.

llvm-svn: 162779
2012-08-28 19:07:39 +00:00
Roman Divacky 8c4b6a307e Emit word of zeroes after the last instruction as a start of the mandatory
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement so just add
there a FIXME.

Patch by Bill Schmidt. PR13641.

llvm-svn: 162778
2012-08-28 19:06:55 +00:00
Hal Finkel 742b535e40 Add PPC Freescale e500mc and e5500 subtargets.
Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

llvm-svn: 162764
2012-08-28 16:12:39 +00:00
Benjamin Kramer 9c0a807c27 InstCombine: Guard the transform introduced in r162743 against large ints and non-const shifts.
llvm-svn: 162751
2012-08-28 13:08:13 +00:00
Nadav Rotem d457787fed Make sure that we don't call getZExtValue on values > 64 bits.
Thanks Benjamin for noticing this.

llvm-svn: 162749
2012-08-28 12:23:22 +00:00
Nadav Rotem 11935b29f3 Teach InstCombine to canonicalize [SU]div+[AL]shl patterns.
For example:
  %1 = lshr i32 %x, 2
  %2 = udiv i32 %1, 100

rdar://12182093

llvm-svn: 162743
2012-08-28 10:01:43 +00:00
Bill Wendling cc56718038 The commutative flag is already correctly set within the multiclass. If we set
it here, then a 'register-memory' version would wrongly get the commutative
flag.
<rdar://problem/12180135>

llvm-svn: 162741
2012-08-28 07:36:46 +00:00
Craig Topper bd509eea4a Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo.
llvm-svn: 162738
2012-08-28 07:05:28 +00:00
NAKAMURA Takumi cdfe1d1cdb llvm/test/CodeGen/X86/pr12312.ll: Add -mtriple=x86_64-unknown-unknown.
llvm-svn: 162736
2012-08-28 04:04:29 +00:00
Michael Liao b7d85b6328 Fix PR12312
- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
  Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
  X86ISD::OR node has only its flag result being used as a boolean value and
  all its leaves are extracted from the same vector, it could be folded into an
  X86ISD::PTEST node.

llvm-svn: 162735
2012-08-28 03:34:40 +00:00
Jakob Stoklund Olesen 87cb471e52 Remove extra MayLoad/MayStore flags from atomic_load/store.
These extra flags are not required to properly order the atomic
load/store instructions. SelectionDAGBuilder chains atomics as if they
were volatile, and SelectionDAG::getAtomic() sets the isVolatile bit on
the memory operands of all atomic operations.

The volatile bit is enough to order atomic loads and stores during and
after SelectionDAG.

This means we set mayLoad on atomic_load, mayStore on atomic_store, and
mayLoad+mayStore on the remaining atomic read-modify-write operations.

llvm-svn: 162733
2012-08-28 03:11:32 +00:00
Akira Hatanaka b5af7121b1 Fix mips' long branch pass.
Instructions emitted to compute branch offsets now use immediate operands
instead of symbolic labels. This change was needed because there were problems
when R_MIPS_HI16/LO16 relocations were used to make shared objects.

llvm-svn: 162731
2012-08-28 03:03:05 +00:00
Akira Hatanaka adb14f56c7 Fix bug 13532.
In SelectionDAGLegalize::ExpandLegalINT_TO_FP, expand INT_TO_FP nodes without
using any f64 operations if f64 is not a legal type.

Patch by Stefan Kristiansson. 

llvm-svn: 162728
2012-08-28 02:12:42 +00:00
Hal Finkel 686f2ee226 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727
2012-08-28 02:10:33 +00:00
Hal Finkel 5ab378037f Eliminate redundant CR moves on PPC32.
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.

Patch by Tobias von Koch.

llvm-svn: 162725
2012-08-28 02:10:27 +00:00
Hal Finkel e39526a789 Optimize zext on PPC64.
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.

Patch by Tobias von Koch.

llvm-svn: 162724
2012-08-28 02:10:15 +00:00
Bill Wendling 988a47d7e5 Make sure we add the predicate after all of the registers are added.
<rdar://problem/12183003>

llvm-svn: 162703
2012-08-27 22:12:44 +00:00
NAKAMURA Takumi fee50c8cf1 llvm/test/CodeGen/X86/fma.ll: Add -march=x86, or two tests would fail on non-x86 hosts.
llvm-svn: 162667
2012-08-27 11:50:26 +00:00
NAKAMURA Takumi 10eb4cfc3e llvm/test/CodeGen/X86/fma_patterns.ll: Add -mtriple=x86_64. It was incompatible on i686 and Windows x64.
llvm-svn: 162664
2012-08-27 09:37:54 +00:00
Craig Topper bfc1d0ed48 Commit test change for r162658.
llvm-svn: 162660
2012-08-27 07:55:50 +00:00
Alexey Samsonov 034e57a297 Add basic support for .debug_ranges section to LLVM's DebugInfo library.
This section (introduced in DWARF-3) is used to define instruction address
ranges for functions that are not contiguous and can't be described
by low_pc/high_pc attributes (this is the usual case for inlined subroutines).
The patch is the first step to support fetching complete inlining info from DWARF.

Reviewed by Benjamin Kramer.

llvm-svn: 162657
2012-08-27 07:17:47 +00:00
Anitha Boyapati 0dd589c5f1 FMA3 tests on bdver2 target for changes made in rev 162012. Also made
corresponding changes to existing tests for darwin triple to ensure that
same pattern is tested for bdver2 target.

llvm-svn: 162655
2012-08-27 06:59:01 +00:00
Craig Topper 9e4f0aae17 Make sure that FMA3 is favored even when FMA4 is also enabled. Test case for r162454.
llvm-svn: 162653
2012-08-27 03:38:15 +00:00
Jakob Stoklund Olesen c2272df1be Infer instruction properties from single-instruction patterns.
Previously, instructions without a primary patterns wouldn't get their
properties inferred. Now, we use all single-instruction patterns for
inference, including 'def : Pat<>' instances.

This causes a lot of instruction flags to change.

- Many instructions no longer have the UnmodeledSideEffects flag because
  their flags are now inferred from a pattern.

- Instructions with intrinsics will get a mayStore flag if they already
  have UnmodeledSideEffects and a mayLoad flag if they already have
  mayStore. This is because intrinsics properties are linear.

- Instructions with atomic_load patterns get a mayStore flag because
  atomic loads can't be reordered. The correct workaround is to create
  pseudo-instructions instead of using normal loads. PR13693.

llvm-svn: 162614
2012-08-24 22:46:53 +00:00
Akira Hatanaka 4a08a4a8b6 Disable Mips' delay slot filler when optimization level is O0.
llvm-svn: 162589
2012-08-24 20:40:15 +00:00
Akira Hatanaka e8e4ef102d In MipsDAGToDAGISel::SelectAddr, fold add node into address operand, if its
second operand is MipsISD::GPRel.

llvm-svn: 162584
2012-08-24 20:21:49 +00:00
Manman Ren cf10446ffa BranchProb: modify the definition of an edge in BranchProbabilityInfo to handle
the case of multiple edges from one block to another.

A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.

Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.

llvm-svn: 162572
2012-08-24 18:14:27 +00:00
Roman Divacky ace4707ea6 Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.

llvm-svn: 162562
2012-08-24 16:26:02 +00:00
Eric Christopher bb69a27dbc Use DW_FORM_flag_present to save space in debug information if we're
not in darwin gdb compat mode.

Fixes rdar://10975088

llvm-svn: 162526
2012-08-24 01:14:27 +00:00
Eric Christopher acb7115bde Remove the DW_AT_MIPS_linkage name attribute when we don't need it
output (we're emitting a specification already and the information
isn't changing) and we're not in old gdb compat mode.

Saves 1% on the debug information for a build of llvm.

Fixes rdar://11043421

llvm-svn: 162493
2012-08-23 22:52:55 +00:00
Eric Christopher 20b76a77c3 Turn these two options in to trinary state so that they can be
turned on and off separate from the platform if you're on darwin.

llvm-svn: 162487
2012-08-23 22:36:40 +00:00
Eric Christopher 35ceacbcc2 Make this darwin specific to try to silence the bots.
llvm-svn: 162435
2012-08-23 07:18:46 +00:00
Eric Christopher 7782618271 Emit pubtypes only when going for darwin gdb compatibility.
rdar://10393214

llvm-svn: 162434
2012-08-23 07:10:56 +00:00
Eric Christopher 6cb5594979 Filecheck-ize.
llvm-svn: 162433
2012-08-23 07:10:51 +00:00
Benjamin Kramer e07728b936 SimplifyLibCalls: Give all safely-shrinkable libcalls the same treatment.
llvm-svn: 162383
2012-08-22 19:39:15 +00:00
Chad Rosier 1df1fb511f Whitespace.
llvm-svn: 162370
2012-08-22 17:34:11 +00:00
Chad Rosier 671dc096ae Add test case for r162368.
llvm-svn: 162369
2012-08-22 17:31:04 +00:00
Stepan Dyatkovskiy 99120e04be Rejected 169195. As Duncan commented, bitcasting to proper type is wrong approach. We need to insert some valid TRANCATE node here.
llvm-svn: 162354
2012-08-22 09:33:55 +00:00
Akira Hatanaka ad4950258b Add register Mips::GP to the list of reserved registers if target is bare-metal
to prevent it from being clobbered. mips uses $gp to access small data section.

This bug was originally reported by Carl Norum.

llvm-svn: 162340
2012-08-22 03:18:13 +00:00
Akira Hatanaka 9d957842e1 Add option disable-mips-delay-filler. Turn on mips' delay slot filler by
default.

Patch by Carl Norum.

llvm-svn: 162339
2012-08-22 02:51:28 +00:00
Jack Carter 77064c0590 For mips64 switch statements in subroutines could generate
within the codegen EK_GPRel64BlockAddress. This was not 
supported for direct object output and resulted in an assertion.

This change adds support for EK_GPRel64BlockAddress for 
direct object.

One fallout from this is to turn on rela relocations 
for mips64 to match gas.

llvm-svn: 162334
2012-08-22 00:49:30 +00:00
Rafael Espindola 2c06448360 Fix macros arguments with an underscore, dot or dollar in them. This is based
on a patch by Andy/PaX. I added the support for dot and dollar.

llvm-svn: 162298
2012-08-21 18:29:30 +00:00
Rafael Espindola af6da83a2c Make the wording in of the "expected identifier" error in the .macro directive
consistent with the other "expected identifier" errors.
Extracted from the Andy/PaX patch. I added the test.

llvm-svn: 162291
2012-08-21 17:12:05 +00:00
Tim Northover f39c1a3f72 Add correct set of regression tests for r162094 commit.
llvm-svn: 162276
2012-08-21 12:43:03 +00:00
Chandler Carruth c908ca1766 Port the global copy optimization from the SROA pass to InstCombine.
This optimization is really just replacing allocas wholesale with
globals, there is no scalarization.

The underlying motivation for this patch is to simplify the SROA pass
and focus it on splitting and promoting allocas.

llvm-svn: 162271
2012-08-21 08:39:44 +00:00
Kostya Serebryany f4be019fba [asan] add code to detect global initialization fiasco in C/C++. The sub-pass is off by default for now. Patch by Reid Watson. Note: this patch changes the interface between LLVM and compiler-rt parts of asan. The corresponding patch to compiler-rt will follow.
llvm-svn: 162268
2012-08-21 08:24:25 +00:00
Jakob Stoklund Olesen 74e6f9fc65 Add a missing def flag.
*** Bad machine code: Explicit definition marked as use ***
- function:    test_cos
- basic block: BB#0 L.entry (0x7ff2a2024fd0)
- instruction: VSETLNi32 %D11, %D11<undef>, %R0, 0, pred:14, pred:%noreg, %Q5<imp-use,kill>, %Q5<imp-def>
- operand 0:   %D11

llvm-svn: 162247
2012-08-21 00:34:53 +00:00
Jakob Stoklund Olesen 7d33c5739f Don't add CFG edges for redundant conditional branches.
IR that hasn't been through SimplifyCFG can look like this:

  br i1 %b, label %r, label %r

Make sure we don't create duplicate Machine CFG edges in this case.

Fix the machine code verifier to accept conditional branches with a
single CFG edge.

llvm-svn: 162230
2012-08-20 21:39:52 +00:00
Michael Liao 10ff96ce8c fix a case where all operands of BUILD_VECTOR are undefined
llvm-svn: 162214
2012-08-20 17:59:18 +00:00
Stepan Dyatkovskiy 6ee89aafc8 Forget to add testcase for r162195. Sorry.
llvm-svn: 162196
2012-08-20 08:03:18 +00:00
Nadav Rotem 178250ad87 When unsafe math is used, we can use commutative FMAX and FMIN. In some cases
this allows for better code generation.

Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.

For example:

  movaps  %xmm0, %xmm1
  movsd LC(%rip), %xmm0
  minsd %xmm1, %xmm0

becomes:

  minsd LC(%rip), %xmm0

llvm-svn: 162187
2012-08-19 13:06:16 +00:00
Benjamin Kramer 9d03242fcf InstCombine: Fix a crasher when encountering a function pointer.
llvm-svn: 162180
2012-08-18 22:04:34 +00:00
Jakob Stoklund Olesen dded061f85 Also combine zext/sext into selects for ARM.
This turns common i1 patterns into predicated instructions:

  (add (zext cc), x) -> (select cc (add x, 1), x)
  (add (sext cc), x) -> (select cc (add x, -1), x)

For a function like:

  unsigned f(unsigned s, int x) {
    return s + (x>0);
  }

We now produce:

  cmp r1, #0
  it  gt
  addgt.w r0, r0, #1

Instead of:

  movs  r2, #0
  cmp r1, #0
  it  gt
  movgt r2, #1
  add r0, r2

llvm-svn: 162177
2012-08-18 21:25:22 +00:00
Jakob Stoklund Olesen aab43dbfbb Also pass logical ops to combineSelectAndUse.
Add these transformations to the existing add/sub ones:

  (and (select cc, -1, c), x) -> (select cc, x, (and, x, c))
  (or  (select cc, 0, c), x)  -> (select cc, x, (or, x, c))
  (xor (select cc, 0, c), x)  -> (select cc, x, (xor, x, c))

The selects can then be transformed to a single predicated instruction
by peephole.

This transformation will make it possible to eliminate the ISD::CAND,
COR, and CXOR custom DAG nodes.

llvm-svn: 162176
2012-08-18 21:25:16 +00:00
Benjamin Kramer 8c2a733c55 InstCombine: Add a couple of fabs identities for comparing with 0.0.
llvm-svn: 162174
2012-08-18 20:06:47 +00:00
Benjamin Kramer 000132454c SimplifyLibcalls: Add fabs and trunc to the list of libcalls that are safe to shrink from double to float.
llvm-svn: 162173
2012-08-18 19:27:32 +00:00
Nadav Rotem a136939fa9 Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow better compare/branch code.
llvm-svn: 162172
2012-08-18 17:53:03 +00:00
Nadav Rotem c324af609e Revert r162160 because it made a few buildbots fail.
llvm-svn: 162164
2012-08-18 05:02:36 +00:00
Nadav Rotem 2cb14a5c4b The X86 backend has a number of optimizations for SETCC nodes which use
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.

Before:
  xorl  %esi, %edi
  testb %dil, %dil
  setne %al
  ret

After:
  xorb  %dil, %sil
  setne %al
  ret

rdar://12081007

llvm-svn: 162160
2012-08-18 02:43:28 +00:00
Eli Friedman 79a6b30d8a Make atomic load and store of pointers work. Tighten verification of atomic operations
so other unexpected operations don't slip through.  Based on patch by Logan Chien.
PR11786/PR13186.

llvm-svn: 162146
2012-08-17 23:24:29 +00:00
Jakob Stoklund Olesen 7b1a2e8f02 Avoid folding ADD instructions with FI operands.
PEI can't handle the pseudo-instructions. This can be removed when the
pseudo-instructions are replaced by normal predicated instructions.

Fixes PR13628.

llvm-svn: 162130
2012-08-17 20:55:34 +00:00
Benjamin Kramer 34764fe2e4 MemoryBuiltins: Properly guard ObjectSizeOffsetVisitor against cycles in the IR.
The previous fix only checked for simple cycles, use a set to catch longer
cycles too.

Drop the broken check from the ObjectSizeOffsetEvaluator. The BoundsChecking
pass doesn't have to deal with invalid IR like InstCombine does.

llvm-svn: 162120
2012-08-17 19:26:41 +00:00
Bill Wendling 34bc34ecae Change the `linker_private_weak_def_auto' linkage to `linkonce_odr_auto_hide' to
make it more consistent with its intended semantics.

The `linker_private_weak_def_auto' linkage type was meant to automatically hide
globals which never had their addresses taken. It has nothing to do with the
`linker_private' linkage type, which outputs the symbols with a `l' (ell) prefix
among other things.

The intended semantic is more like the `linkonce_odr' linkage type.

Change the name of the linkage type to `linkonce_odr_auto_hide'. And therefore
changing the semantics so that it produces the correct output for the linker.

Note: The old linkage name `linker_private_weak_def_auto' will still parse but
is not a synonym for `linkonce_odr_auto_hide'. This should be removed in 4.0.
<rdar://problem/11754934>

llvm-svn: 162114
2012-08-17 18:33:14 +00:00
Rafael Espindola 9a16735e22 Assert that dominates is not given a multiple edge. Finding out if we have
multiple edges between two blocks is linear. If the caller is iterating all
edges leaving a BB that would be a square time algorithm. It is more efficient
to have the callers handle that case.

Currently the only callers are:
* GVN: already avoids the multiple edge case.
* Verifier: could only hit this assert when looking at an invalid invoke. Since
it already rejects the invoke, just avoid computing the dominance for it.

llvm-svn: 162113
2012-08-17 18:21:28 +00:00
Benjamin Kramer ca7ca4f6c6 TargetLowering: Use the large shift amount during legalize types. The legalizer may call us with an overly large type.
llvm-svn: 162101
2012-08-17 15:54:21 +00:00
Benjamin Kramer 4901f0d2a2 Guard MemoryBuiltins against self-looping GEPs, which can occur in unreachable code due to constant propagation.
Fixes PR13621.

llvm-svn: 162098
2012-08-17 14:16:37 +00:00
Benjamin Kramer 2f47a3fb07 Fix broken check lines.
I really need to find a way to automate this, but I can't come up with a regex
that has no false positives while handling tricky cases like custom check
prefixes.

llvm-svn: 162097
2012-08-17 12:28:26 +00:00
Tim Northover f66181530f Implement NEON domain switching for scalar <-> S-register vmovs on ARM
llvm-svn: 162094
2012-08-17 11:32:52 +00:00
Jakob Stoklund Olesen 0ea1fce6b4 Add ADD and SUB to the predicable ARM instructions.
It is not my plan to duplicate the entire ARM instruction set with
predicated versions. We need a way of representing predicated
instructions in SSA form without requiring a separate opcode.

Then the pseudo-instructions can go away.

llvm-svn: 162061
2012-08-16 23:21:55 +00:00
Rafael Espindola cc80cdebb9 Teach GVN to reason about edges dominating uses. This allows it to handle cases
where some fact lake a=b dominates a use in a phi, but doesn't dominate the
basic block itself.

This feature could also be implemented by splitting critical edges, but at least
with the current algorithm reasoning about the dominance directly is faster.

The time for running "opt -O2" in the testcase in pr10584 is 1.003 times slower
and on gcc as a single file it is 1.0007 times faster.

llvm-svn: 162023
2012-08-16 15:09:43 +00:00
Jush Lu 26088cb30e [arm-fast-isel] Add support for fastcc.
Without fastcc support, the caller just falls through to CallingConv::C
for fastcc, but callee still uses fastcc, this inconsistency of calling
convention is a problem, and fastcc support can fix it.

llvm-svn: 162013
2012-08-16 05:15:53 +00:00
Akira Hatanaka 269d3fd101 Test case for r162008.
llvm-svn: 162009
2012-08-16 03:48:41 +00:00
Jakob Stoklund Olesen 6cb96120f1 Fold predicable instructions into MOVCC / t2MOVCC.
The ARM select instructions are just predicated moves. If the select is
the only use of an operand, the instruction defining the operand can be
predicated instead, saving one instruction and decreasing register
pressure.

This implementation can turn AND/ORR/EOR instructions into their
corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to
predicate any instruction, but we don't yet support predicated
instructions in SSA form.

llvm-svn: 161994
2012-08-15 22:16:39 +00:00
Bill Wendling 2c8685e327 Rework test so that it reproduces the error without the horrible flag.
llvm-svn: 161989
2012-08-15 21:10:18 +00:00
Bill Wendling d63f1f5a9c Remove invalid test. This test requires that dead basic blocks be kept
around. That's not how we do things. Besides, the commit message tells us that
it is covered by the GCC test suite.

------------------------------------------------------------------------
r127497 | zwarich | 2011-03-11 13:51:56 -0800 (Fri, 11 Mar 2011) | 3 lines

Fix the GCC test suite issue exposed by r127477, which was caused by stack
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actual fail before this fix.
------------------------------------------------------------------------

llvm-svn: 161985
2012-08-15 20:54:09 +00:00
Evan Cheng eec6bc6270 Use vld1/vst1 to load/store f64 if alignment is < 4 and the target allows unaligned access. rdar://12091029
llvm-svn: 161962
2012-08-15 17:44:53 +00:00
Michael Liao 69e172a6f0 fix infinite loop in instcombine with more than 4GB memcpy
- memcpy size is wrongly truncated into 32-bit and treat 8GB memcpy is
  0-sized memcpy
- as 0-sized memcpy/memset is already removed before SimplifyMemTransfer
  and SimplifyMemSet in visitCallInst, replace 0 checking with
  assertions.
- replace getZExtValue() with getLimitedValue() according to
  Eli Friedman

llvm-svn: 161923
2012-08-15 03:49:59 +00:00
Anton Korobeynikov c6d945b11a The names of VFP variants of half-to-float conversion instructions were
reversed. This leads to wrong codegen for float-to-half conversion
intrinsics which are used to support storage-only fp16 type.
NEON variants of same instructions are fine.

llvm-svn: 161907
2012-08-14 23:36:01 +00:00
Michael Liao 34107b9177 fix PR11334
- FP_EXTEND only support extending from vectors with matching elements.
  This results in the scalarization of extending to v2f64 from v2f32,
  which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
  extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.

llvm-svn: 161894
2012-08-14 21:24:47 +00:00
Kostya Serebryany bf479714f9 [asan] insert crash basic blocks inline as opposed to inserting them at the end of the function. This doesn't seem to fix or break anything, but is considered to be more friendly to downstream passes (test change)
llvm-svn: 161871
2012-08-14 14:05:50 +00:00
Craig Topper 2a40418a99 Change greater than to greater than or equal so that an identical sized store to the same offset is treated as completing overwriting.
llvm-svn: 161857
2012-08-14 07:32:05 +00:00
Nadav Rotem 70409991bc During the CodeGenPrepare we often lower intrinsics (such as objsize)
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.

rdar://11973998

llvm-svn: 161852
2012-08-14 05:19:07 +00:00
NAKAMURA Takumi 245920463d llvm/test/CodeGen/ARM/floorf.ll: Add explicit -mtriple=arm-unknown-unknown, or it fails on msvc.
llvm-svn: 161825
2012-08-14 00:56:06 +00:00
Owen Anderson a40319b7f1 Add a roundToIntegral method to APFloat, which can be parameterized over various rounding modes. Use this to implement SelectionDAG constant folding of FFLOOR, FCEIL, and FTRUNC.
llvm-svn: 161807
2012-08-13 23:32:49 +00:00
Nadav Rotem 5d4e205874 MemoryDependenceAnalysis attempts to find the first memory dependency for function calls.
Currently, if GetLocation reports that it did not find a valid pointer (this is the case for volatile load/stores),
we ignore the result. This patch adds code to handle the cases where we did not obtain a valid pointer.

rdar://11872864  PR12899

llvm-svn: 161802
2012-08-13 23:03:43 +00:00
Jim Grosbach b4e1ba7191 ARM: Move Thumb2 tests to Thumb2 test file and fix CHECK lines.
These tests weren't actually being run before (missing ':' after CHECK).

llvm-svn: 161800
2012-08-13 22:25:44 +00:00
Bill Wendling 72baa6eeae Rename test since it's not linux-specific.
llvm-svn: 161792
2012-08-13 21:32:42 +00:00
Jakob Stoklund Olesen 83a927d84a Handle extra Tail predecessors in if-conversion.
It is still possible to if-convert if the tail block has extra
predecessors, but the tail phis must be rewritten instead of being
removed.

llvm-svn: 161781
2012-08-13 20:49:04 +00:00
Arnold Schwaighofer 0bb7f23cfc [Hexagon] Don't mark callee saved registers as clobbered by a tail call
This was causing unnecessary spills/restores of callee saved registers.

Fixes PR13572.

Patch by Pranav Bhandarkar!

llvm-svn: 161778
2012-08-13 19:54:01 +00:00
Manman Ren 9746b33e26 Fix failure on Atom bot due to r161769
llvm-svn: 161777
2012-08-13 19:34:29 +00:00
Nadav Rotem 3a94c545cf Do not optimize (or (and X,Y), Z) into BFI and other sequences if the AND ISDNode has more than one user.
rdar://11876519

llvm-svn: 161775
2012-08-13 18:52:44 +00:00
Manman Ren 959acb106b X86: move Int_CVTSD2SSrr, Int_CVTSI2SSrr, Int_CVTSI2SDrr, Int_CVTSS2SDrr from
OpTbl1 to OpTbl2 since they have 3 operands and the last operand can be changed
to a memory operand.

PR13576

llvm-svn: 161769
2012-08-13 18:29:41 +00:00
Eric Christopher 7d8b53c1f8 Add support for the %H output modifier.
Patch by Weiming Zhao.

llvm-svn: 161768
2012-08-13 18:18:52 +00:00
Tim Northover b4abb84d9c Add test for previous commit correcting NEON load patterns.
llvm-svn: 161750
2012-08-13 10:38:45 +00:00
Nick Lewycky 1dcfe449ef Give this test an explicit triple.
llvm-svn: 161740
2012-08-12 08:21:27 +00:00
Nick Lewycky 333449cd65 When emitting the PC range in an FDE, use the same data encoding for both ends
of the range. Fixes PR13581!

llvm-svn: 161739
2012-08-12 08:09:45 +00:00
Arnold Schwaighofer b73da9453c Revert 161581: Patch to implement UMLAL/SMLAL instructions for the ARM
architecture

It broke MultiSource/Applications/JM/ldecod/ldecod on armv7 thumb O0 g and armv7
thumb O3.

llvm-svn: 161736
2012-08-12 05:11:56 +00:00
Michael Liao e7e828fd64 fix PR13577, an issue introduced by r161687
- FCMOV only supports a subset of X86 conditions. Skip boolean
  simplification if X86 condition is not valid for FCMOV.
- add a minimal test case for PR13577.

llvm-svn: 161732
2012-08-11 23:47:06 +00:00
Benjamin Kramer ef6494f24d PR13578: Teach MachineCSE that instructions that use a constant register can be CSE'd safely.
This is common e.g. when doing rip-relative addressing on x86_64.

llvm-svn: 161728
2012-08-11 19:05:13 +00:00
Manman Ren 1acb6707cd X86: when we are auto-detecting the subtarget features, make sure we turn on
FeatureFastUAMem for Nehalem, Westmere and Sandy Bridge.

FeatureFastUAMem is already on if we pass in nehalem or westmere as a command
argument.

rdar: 7252306
llvm-svn: 161717
2012-08-10 23:43:32 +00:00
Eli Friedman 4c923b3b3f The normal edge of an invoke is not allowed to branch to a block with a
landingpad.  Enforce it in the verifier, and fix the regression tests to match.

llvm-svn: 161697
2012-08-10 20:55:20 +00:00
Michael Liao 5248e9913f add X86-specific DAG optimization to simplify boolean test
- if a boolean test (X86ISD::CMP or X86ISD:SUB) checks a boolean value
  generated from X86ISD::SETCC, try to simplify the boolean value
  generation and checking by reusing the original EFLAGS with proper
  condition code
- add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places
  consuming EFLAGS

part of patches fixing PR12312

llvm-svn: 161687
2012-08-10 19:58:13 +00:00
Pete Cooper 0deca6be79 Fix crash when when do lto on Bullet. Dynamic GEPs in SROA were incorrectly being applied to all accesses to an alloca, not just the ones which read from the GEP. Thanks to Evan for reducing the test. rdar://11861001
llvm-svn: 161654
2012-08-10 03:26:36 +00:00
Jakob Stoklund Olesen 8c28ac9ec9 Update edge weights correctly in replaceSuccessor().
When replacing Old with New, it can happen that New is already a
successor. Add the old and new edge weights instead of creating a
duplicate edge.

llvm-svn: 161653
2012-08-10 03:23:27 +00:00
Jakob Stoklund Olesen d9b66506a3 Reapply r161633-161634 "Partition use lists so defs always come before uses.""
No changes to these patches, MRI needed to be notified when changing
uses into defs and vice versa.

llvm-svn: 161644
2012-08-10 00:21:30 +00:00
Jakob Stoklund Olesen acd27c9279 Revert r161633-161634 "Partition use lists so defs always come before uses."
These commits broke a number of buildbots.

llvm-svn: 161640
2012-08-09 23:31:36 +00:00
Jakob Stoklund Olesen df01e00710 Partition use lists so defs always come before uses.
This makes it possible to speed up def_iterator by stopping at the first
use. This makes def_empty() and getUniqueVRegDef() much faster when
there are many uses.

In a +Asserts build, LiveVariables is 100x faster in one case because
getVRegDef() has an assertion that would scan to the end of a
def_iterator chain.

Spill weight calculation is significantly faster (300x in one case)
because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP)
which calls def_empty(%RIP).

llvm-svn: 161634
2012-08-09 22:49:46 +00:00
Jakob Stoklund Olesen 7d7051ca3c Don't use pointer-pointers for the register use lists.
Use a more conventional doubly linked list where the Prev pointers form
a cycle. This means it is no longer necessary to adjust the Prev
pointers when reallocating the VRegInfo array.

The test changes are required because the register allocation hint is
using the use-list order to break ties.

llvm-svn: 161633
2012-08-09 22:49:42 +00:00
Jakob Stoklund Olesen 4238a89db8 Don't modify MO while use_iterator is still pointing to it.
llvm-svn: 161626
2012-08-09 22:08:24 +00:00
Chandler Carruth 4cbe653ace Teach the LLVM test makefile to run the extra Clang tools' test suites
as part of check-all.

llvm-svn: 161610
2012-08-09 20:26:41 +00:00
Jack Carter 120a30a732 Another 32 to 64 bit sign extension bug.
The fields in the td definition were switched.

llvm-svn: 161607
2012-08-09 19:43:18 +00:00
Arnold Schwaighofer 81b2eec1ab Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 161581
2012-08-09 15:25:52 +00:00
Nadav Rotem e0f84d31c8 Fix the legalization of ExtLoad on ARM. ExpandUnalignedLoad did not properly
handle the cases where the memory value type was illegal. 
PR 13111. 

llvm-svn: 161565
2012-08-09 01:56:44 +00:00
Bob Wilson 4c65c505e0 Add test triples to fix win32 failures. Revert workaround from r161292.
I don't have a win32 system to test, so hopefully I got them all fixed here.

llvm-svn: 161519
2012-08-08 20:31:37 +00:00
NAKAMURA Takumi ec934cd64d llvm/test/MC/COFF/seh.s: Fixup corresponding to r161487.
llvm-svn: 161489
2012-08-08 13:27:04 +00:00
Bill Wendling ce4fe41d21 Add `.pushsection', `.popsection', and `.previous' directives to Darwin ASM.
There are situations where inline ASM may want to change the section -- for
instance, to create a variable in the .data section. However, it cannot do this
without (potentially) restoring to the wrong section. E.g.:

  asm volatile (".section __DATA, __data\n\t"
                ".globl _fnord\n\t"
                "_fnord: .quad 1f\n\t"
                ".text\n\t"
                "1:" :::);

This may be wrong if this is inlined into a function that has a "section"
attribute. The user should use `.pushsection' and `.popsection' here instead.

The addition of `.previous' is added for completeness.
<rdar://problem/12048387>

llvm-svn: 161477
2012-08-08 06:30:30 +00:00
Eli Friedman 08ec0a8122 isAllocLikeFn is allowed to return true for functions which read memory; make
sure we account for that correctly in DeadStoreElimination.  Fixes a regression
from r158919.  PR13547.

llvm-svn: 161468
2012-08-08 02:17:32 +00:00
Manman Ren 1be131ba27 X86: enable CSE between CMP and SUB
We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.

Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled
by peephole now.

rdar://11873276

llvm-svn: 161462
2012-08-08 00:51:41 +00:00
Dan Gohman b948736002 Avoid recomputing the unique exit blocks and their insert points when doing
multiple scalar promotions on a single loop. This also has the effect of
preserving the order of stores sunk out of loops, which is aesthetically
pleasing, and it happens to fix the testcase in PR13542, though it doesn't
fix the underlying problem.

llvm-svn: 161459
2012-08-08 00:00:26 +00:00
Bob Wilson 61f3ad5759 Fix a serious typo in InstCombine's optimization of comparisons.
An unsigned value converted to floating-point will always be greater than
a negative constant.  Unfortunately InstCombine reversed the check so that
unsigned values were being optimized to always be greater than all positive
floating-point constants.  <rdar://problem/12029145>

llvm-svn: 161452
2012-08-07 22:35:16 +00:00
Evan Cheng fbdd25c135 X86 cmp lowering is looking past truncate on the condition node. It should only
do so when the high bits are known zero. This caused a subtle miscompilation.

rdar://12027825 

llvm-svn: 161451
2012-08-07 22:21:00 +00:00
Alexey Samsonov 947228c4f7 Fix the representation of debug line table in DebugInfo LLVM library,
and "instruction address -> file/line" lookup.

Instead of plain collection of rows, debug line table for compilation unit is now
treated as the number of row ranges, describing sequences (series of contiguous machine
instructions). The sequences are not always listed in the order of increasing
address, so previously used std::lower_bound() sometimes produced wrong results.
Now the instruction address lookup consists of two stages: finding the correct
sequence, and searching for address in range of rows for this sequence.

llvm-svn: 161414
2012-08-07 11:46:57 +00:00
Benjamin Kramer c99d0e9186 PR13095: Give an inline cost bonus to functions using byval arguments.
We give a bonus for every argument because the argument setup is not needed
anymore when the function is inlined. With this patch we interpret byval
arguments as a compact representation of many arguments. The byval argument
setup is implemented in the backend as an inline memcpy, so to model the
cost as accurately as possible we take the number of pointer-sized elements
in the byval argument and give a bonus of 2 instructions for every one of
those. The bonus is capped at 8 elements, which is the number of stores
at which the x86 backend switches from an expanded inline memcpy to a real
memcpy. It would be better to use the real memcpy threshold from the backend,
but it's not available via TargetData.

This change brings the performance of c-ray in line with gcc 4.7. The included
test case tries to reproduce the c-ray problem to catch regressions for this
benchmark early, its performance is dominated by the inline decision of a
specific call.

This only has a small impact on most code, more on x86 and arm than on x86_64
due to the way the ABI works. When building LLVM for x86 it gives a small
inline cost boost to virtually any function using StringRef or STL allocators,
but only a 0.01% increase in overall binary size. The size of gcc compiled by
clang actually shrunk by a couple bytes with this patch applied, but not
significantly.

llvm-svn: 161413
2012-08-07 11:13:19 +00:00
Chandler Carruth 2f6cf4884c Fix PR13412, a nasty miscompile due to the interleaved
instsimplify+inline strategy.

The crux of the problem is that instsimplify was reasonably relying on
an invariant that is true within any single function, but is no longer
true mid-inline the way we use it. This invariant is that an argument
pointer != a local (alloca) pointer.

The fix is really light weight though, and allows instsimplify to be
resiliant to these situations: when checking the relation ships to
function arguments, ensure that the argumets come from the same
function. If they come from different functions, then none of these
assumptions hold. All credit to Benjamin Kramer for coming up with this
clever solution to the problem.

llvm-svn: 161410
2012-08-07 10:59:59 +00:00
Chandler Carruth 881d0a7966 Add a much more conservative strategy for aligning branch targets.
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.

As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.

Fixes PR13265.

llvm-svn: 161409
2012-08-07 09:45:24 +00:00
Manman Ren cb36b8c2e6 MachineCSE: Update the heuristics for isProfitableToCSE.
If the result of a common subexpression is used at all uses of the candidate
expression, CSE should not increase the live range of the common subexpression.

rdar://11393714 and rdar://11819721

llvm-svn: 161396
2012-08-07 06:16:46 +00:00
Jack Carter f4946cfbb9 The define for 64 bit sign extension neglected to
initialize fields of the class that it used.

The result was nonsense code.

Before:
0000000000000000 <foo>:
   0:    00441100     0x441100
   4:    03e00008     jr    ra
   8:    00000000     nop

After:
0000000000000000 <foo>:
   0:    00041000     sll    v0,a0,0x0
   4:    03e00008     jr    ra
   8:    00000000     nop 

llvm-svn: 161377
2012-08-07 00:35:22 +00:00
Jack Carter 612c66314c The Mips64InstrInfo.td definitions DynAlloc64 LEA_ADDiu64
were using a class defined for 32 bit instructions and 
thus the instruction was for addiu instead of daddiu.

This was corrected by adding the instruction opcode as a 
field in the  base class to be filled in by the defs.

llvm-svn: 161359
2012-08-06 23:29:06 +00:00
Jack Carter 84491abb20 Mips relocations R_MIPS_HIGHER and R_MIPS_HIGHEST.
These 2 relocations gain access to the 
highest and the second highest 16 bits
of a 64 bit object.

R_MIPS_HIGHER %higher(A+S)
The %higher(x) function is [ (((long long) x + 0x80008000LL) >> 32) & 0xffff ]. 

R_MIPS_HIGHEST %highest(A+S)
The %highest(x) function is [ (((long long) x + 0x800080008000LL) >> 48) & 0xffff ]. 

llvm-svn: 161348
2012-08-06 21:26:03 +00:00
Hal Finkel 33e529d56b MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

llvm-svn: 161346
2012-08-06 21:21:44 +00:00
Craig Topper ab47fe4e16 Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305.
llvm-svn: 161318
2012-08-06 06:22:36 +00:00
Craig Topper 812005e562 Update test to check for r161305
llvm-svn: 161307
2012-08-05 09:06:28 +00:00
Hal Finkel 70381a7b18 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.

llvm-svn: 161302
2012-08-04 14:10:46 +00:00
Anton Korobeynikov 218aaf6d04 Add stack spill / reload instructions for DTriple and DQuad register classes, which
were missed for no reason. This fixes PR13377

llvm-svn: 161299
2012-08-04 13:16:12 +00:00
Bob Wilson 874886cd66 Refactor and check "onlyReadsMemory" before optimizing builtins.
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.

llvm-svn: 161282
2012-08-03 23:29:17 +00:00
Akira Hatanaka 22bec282e9 1. Redo mips16 instructions to avoid multiple opcodes for same instruction.
Change these to patterns.
2. Add another 16 instructions.

Patch by Reed Kotler.

llvm-svn: 161272
2012-08-03 22:57:02 +00:00
Bob Wilson fa59485b94 Fix memcmp code-gen to honor -fno-builtin.
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin.  I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.

llvm-svn: 161262
2012-08-03 21:26:18 +00:00
Bob Wilson 3e6fa462f3 Fall back to selection DAG isel for calls to builtin functions.
Fast isel doesn't currently have support for translating builtin function
calls to target instructions.  For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization.  Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel.  <rdar://problem/12008746>

llvm-svn: 161232
2012-08-03 04:06:28 +00:00
Jush Lu 4705da9020 [arm-fast-isel] Add support for shl, lshr, and ashr.
llvm-svn: 161230
2012-08-03 02:37:48 +00:00
NAKAMURA Takumi edac6ee6aa [CMake] Add yaml2obj to check-llvm.
llvm-svn: 161229
2012-08-03 00:45:32 +00:00
Matt Beaumont-Gay 99fc2e19a6 Move test yaml files under Inputs until they are converted to be the actual
test files.

llvm-svn: 161219
2012-08-02 21:52:49 +00:00