Commit Graph

18635 Commits

Author SHA1 Message Date
Sanjay Patel 19792fb270 [X86, AVX] replace vinsertf128 intrinsics with generic shuffles
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is the sibling patch for the Clang half of this change:
http://reviews.llvm.org/D8088

Differential Revision: http://reviews.llvm.org/D8086

llvm-svn: 231794
2015-03-10 16:08:36 +00:00
Rafael Espindola 3c066d1da3 Remove effectively dead code.
Switching back and forth between sections does nothing (other than producing
larger .s files).

llvm-svn: 231790
2015-03-10 14:48:01 +00:00
Daniel Sanders 2db94ba0bc The operand flag word used in ISD::INLINEASM is an i32 not a pointer. NFC.
Summary:
This is part of the work to support memory constraints that behave
differently to 'm'. The subsequent patches will expand on the existing
encoding (which is a 32-bit int) and as a result in some flag words will no
longer fit into an i16. This problem only affected the MSP430 target which
appears to have 16-bit pointers.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D8168

llvm-svn: 231783
2015-03-10 10:42:59 +00:00
Rafael Espindola c63a98a662 Move variable into assert to fix -Asserts builds.
llvm-svn: 231753
2015-03-10 04:28:09 +00:00
Rafael Espindola 760bf9520a Remove incredibly confusing isBaseAddressKnownZero.
When referring to a symbol in a dwarf section on ELF we should use

.long foo

instead of

.long foo - .debug_something

because ELF is unaware of the content of the sections and therefore needs
relocations. This has nothing to do with optimizing a -0.

llvm-svn: 231751
2015-03-10 04:11:52 +00:00
Rafael Espindola fcc2821882 Use a better name for compile unit labels.
They mark the start of a compile unit, so name them .Lcu_*. Using
Section->getLabelBeginName() makes it looks like they mark the start of the
section.

While at it, switch to createTempSymbol to avoid collisions with labels
created in inline assembly. Not sure if a "don't crash" test is worth it.

With this getLabelBeginName is dead, delete it.

llvm-svn: 231750
2015-03-10 03:58:36 +00:00
Frederic Riss 44a219f0a0 DwarfAccelTable: remove unneeded bucket terminators.
Last commit fixed the handling of hash collisions, but it introdcuced
unneeded bucket terminators in some places. The generated table was
correct, it can just be a tiny bit smaller. As the previous table was
correct, the test doesn't need updating. If we really wanted to test
this, I could add the section size to the dwarf dump and test for a
precise value there. IMO the correctness test is sufficient.

llvm-svn: 231748
2015-03-10 03:47:55 +00:00
Rafael Espindola 3c31114824 Move label creation close to emission. NFC.
llvm-svn: 231744
2015-03-10 03:11:11 +00:00
Mehdi Amini a28d91d81b DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.

I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Frederic Riss 0e9a50f5b5 DwarfAccelTable: Fix handling of hash collisions.
It turns out accelerator tables where totally broken if they contained
entries with colliding hashes. The failure mode is pretty bad, as it not
only impacted the colliding entries, but would basically make all the
entries after the first hash collision pointing in the wrong place.

The testcase uses the symbol names that where found to collide during a
clang build.

From a performance point of view, the patch adds a sort and a linear
walk over each bucket contents. While it has a measurable impact on the
accelerator table emission, it's not showing up significantly in clang
profiles (and I'd argue that correctness is priceless :-)).

llvm-svn: 231732
2015-03-10 00:46:31 +00:00
Ahmed Bougacha c809761dc0 [CodeGen] Replace the reused stores' chain for extractelt expansion.
This fixes a subtle issue that was introduced in r205153.

When reusing a store for the extractelement expansion (to load directly
from it, inserting of going through the stack), later stores to the
same location might have overwritten the data we were expecting to
extract from.

To fix that, we need to explicitly replace the chain going out of the
reused store, so that later stores also have an explicit dependency on
the generated element-extracting loads, and can't clobber them.

rdar://20066785
Differential Revision: http://reviews.llvm.org/D8180

llvm-svn: 231721
2015-03-09 22:51:05 +00:00
Reid Kleckner be0a05060f Reland r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
Fix the double-deletion of AnalysisResolver when delegating through to
Dwarf EH preparation by creating one from scratch. Hopefully the new
pass manager simplifies this.

This reverts commit r229952.

llvm-svn: 231719
2015-03-09 22:45:16 +00:00
Rafael Espindola 4f4ef15ade Use a MapVector instead of an extra sort.
This also has the advantage of not depending on the brittle getLabelBeginName.

llvm-svn: 231714
2015-03-09 22:08:37 +00:00
Frederic Riss 2ad5188f2d DwarfAccelTable: fix obvious typo.
I have a test for that issue, but I didn't include it in the commit as it's
a 200KB file for a pretty minor issue. (The reason the file is so big is
that it needs > 1024 variables/functions to trigger and that with debug
information.

The issue/fix on the other side is totally trivial. If poeple want the test
commited, I can do that. It just didn't seem worth it to me.

llvm-svn: 231701
2015-03-09 21:09:50 +00:00
Rafael Espindola 14862d3e37 Don't prime the section map.
This was just creating unused labels for .text when the module had no
functions.

llvm-svn: 231694
2015-03-09 20:09:58 +00:00
Rafael Espindola a60017902c Print jump tables before exception tables.
In the case where just tables are part of the function section, this produces
more readable assembly by avoiding switching to the eh section and back
to .text.

This would also break with non unique section names, as trying to switch to
a unique section actually creates a new one.

llvm-svn: 231677
2015-03-09 18:29:12 +00:00
Rafael Espindola ef142e407a Don't repeat name in comment. NFC.
llvm-svn: 231676
2015-03-09 18:11:42 +00:00
Rafael Espindola 8773b00036 Remove dummy method implementations.
These are pure virtual in the base class, so the compiler checks that they
are implemented.

llvm-svn: 231673
2015-03-09 17:58:49 +00:00
David Blaikie dc3f01e9cf Simplify expressions involving boolean constants with clang-tidy
Patch by Richard (legalize at xmission dot com).

Differential Revision: http://reviews.llvm.org/D8154

llvm-svn: 231617
2015-03-09 01:57:13 +00:00
Benjamin Kramer 57a3d084cd Make static variables const if possible. Makes them go into a read-only section.
Or fold them into a initializer list which has the same effect. NFC.

llvm-svn: 231598
2015-03-08 16:07:39 +00:00
Simon Pilgrim 8c58c066b7 [DAGCombiner] Add a shuffle mask commutation helper function. NFCI.
We have an increasing number of cases where we are creating commuted shuffle masks - all implementing nearly the same code.

This patch adds a static helper function - ShuffleVectorSDNode::commuteMask() and replaces a number of cases to use it.

Differential Revision: http://reviews.llvm.org/D8139

llvm-svn: 231581
2015-03-07 22:33:11 +00:00
Benjamin Kramer 867bfc53ee Make constant arrays that are passed to functions as const.
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.

llvm-svn: 231571
2015-03-07 17:41:00 +00:00
Simon Pilgrim 2dcbe74dfd Use SDValue bool check to tidyup some possible combines. NFC.
llvm-svn: 231569
2015-03-07 16:34:55 +00:00
Andrea Di Biagio c9d79e8103 [DAGCombiner] Fix wrong folding of AND dag nodes.
This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))

An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' firstly computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm takes into account also the
'undef' bits in the splat mask.

Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the vector element type. With this patch, we conservatively
avoid folding the AND if the splat bits are not compatible with the vector
element type.

Added X86 test and-load-fold.ll

Differential Revision: http://reviews.llvm.org/D8085

llvm-svn: 231563
2015-03-07 12:24:55 +00:00
Simon Pilgrim bede80a440 [DAGCombiner] SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(V,C)) -> VECTOR_SHUFFLE
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.

This prevents many cases of spilling scalar data between the gpr + simd registers. 

At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).

Differential Revision: http://reviews.llvm.org/D8132

llvm-svn: 231554
2015-03-07 05:52:42 +00:00
Matthias Braun 898d11e864 DAGCombiner: Canonicalize select(and/or,x,y) depending on target.
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))

Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.

Differential Revision: http://reviews.llvm.org/D7622

llvm-svn: 231507
2015-03-06 19:49:10 +00:00
Matthias Braun 3ecb557739 DAGCombiner: Factor out some and/or combines.
This is in preparation for changing visitSELECT to normalize towards
select(Cond0, select(Cond1, X, Y), Y);
select(Cond0, X, select(Cond1, X, Y)) which perfom an implicit and/or of
the conditions.

The factored function contains all DAGCombine rules which reduce two values
combined by an And/Or operation to a single value. This does not include rules
involving constants as visitSELECT already handles that case.

Differential Revision: http://reviews.llvm.org/D8026

llvm-svn: 231506
2015-03-06 19:49:06 +00:00
Matthias Braun 046318b87e ExecutionDepsFix: Indizes -> Indices.
Translate german to english.

llvm-svn: 231500
2015-03-06 18:56:20 +00:00
Eric Christopher 6a8bfe7198 Fix typo.
llvm-svn: 231495
2015-03-06 18:20:23 +00:00
Bruno Cardoso Lopes 618c67a018 [AsmPrinter][TLOF] 32-bit MachO support for replacing GOT equivalents
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.

-- before

    _extgotequiv:
       .long _extfoo

    _delta:
       .long _extgotequiv-_delta

-- after

    _delta:
       .long L_extfoo$non_lazy_ptr-_delta

       .section __IMPORT,__pointers,non_lazy_symbol_pointers
    L_extfoo$non_lazy_ptr:
       .indirect_symbol _extfoo
       .long 0

llvm-svn: 231475
2015-03-06 13:49:05 +00:00
Bruno Cardoso Lopes 52b1391df6 [AsmPrinter][TLOF] ARM64 MachO support for replacing GOT equivalents
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:

-- before

   .globl  _foo
  _foo:
   .long   42

   .globl  _gotequivalent
  _gotequivalent:
   .quad   _foo

   .globl  _delta
  _delta:
   .long   _gotequivalent-_delta

-- after

   .globl  _foo
  _foo:
   .long   42

   .globl  _delta
  Ltmp3:
   .long _foo@GOT-Ltmp3

llvm-svn: 231474
2015-03-06 13:48:45 +00:00
Michael Zolotukhin 03dd1082ad LegalizeTypes: Handle shift by 0 in ExpandShiftByConstant.
Though such shifts are usually optimized away by combiner, we still can
encounter them after a vector shift is legalized.

llvm-svn: 231443
2015-03-06 01:13:01 +00:00
Benjamin Kramer fb0abceb5c SelectionDAGBuilder: Merge 3 copies of the limited precision exp2 emission code.
NFC intended.

llvm-svn: 231406
2015-03-05 21:13:08 +00:00
Andrew Kaylor 05ee8bd4e3 Fix uninitialized memory references in WinEHPrepare
llvm-svn: 231405
2015-03-05 21:06:42 +00:00
Benjamin Kramer c54c38e090 SDAG: Merge the meat of two ExpandAtomic implementations.
The copies already diverged, don't let them become any worse. Reduce
redundancy in code with a little macro metaprogramming.

llvm-svn: 231401
2015-03-05 20:04:29 +00:00
Rafael Espindola 092b619e55 Use the correct func begin symbol in all places in ppc.
I missed an occurrence of the old symbol in my previous patch.

llvm-svn: 231398
2015-03-05 19:47:50 +00:00
Rafael Espindola 86bd6a1202 Use the generic Lfunc_begin label on ppc.
This removes yet another custom label to mark the start of a function.

llvm-svn: 231390
2015-03-05 18:55:50 +00:00
David Majnemer 71b9b6be1b X86: Optimize address mode matching for FRAME_ALLOC_RECOVER nodes
We know that the absolute symbol will be less than 2GB and thus will
always fit.

llvm-svn: 231389
2015-03-05 18:50:12 +00:00
Reid Kleckner cfb9ce53c1 Replace llvm.frameallocate with llvm.frameescape
Turns out it's pretty straightforward and simplifies the implementation.

Reviewers: andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D8051

llvm-svn: 231386
2015-03-05 18:26:34 +00:00
Simon Pilgrim 7189084bef [DagCombiner] Allow shuffles to merge through bitcasts
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).

This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.

Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.

Differential Revision: http://reviews.llvm.org/D7939

llvm-svn: 231380
2015-03-05 17:14:04 +00:00
Igor Laevsky 8d0851f509 Revert change r231366 as it broke clang-native-arm-cortex-a9 Analysis/properties.m test.
llvm-svn: 231374
2015-03-05 15:41:14 +00:00
Elena Demikhovsky de05f10de2 AVX-512, SKX: Enabled masked_load/store operations for this target.
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.

llvm-svn: 231371
2015-03-05 15:11:35 +00:00
Igor Laevsky 1725997f14 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.

llvm-svn: 231366
2015-03-05 14:11:21 +00:00
Arnaud A. de Grandmaison d8ed0d372c [PBQP] Use a local bit-matrix to speedup searching an edge in the graph.
Build time (user time) for building llvm+clang+lldb in release mode:
 - default allocator: 9086 seconds
 - with PBQP: 9126 seconds
 - with PBQP + local bit matrix cache: 9097 seconds

llvm-svn: 231360
2015-03-05 09:12:59 +00:00
Frederic Riss 6e56345dbc Remove useless break after return.
Pointed out by Paul Robinson.

llvm-svn: 231353
2015-03-05 06:13:39 +00:00
Chandler Carruth 7a715dae05 [MBP] Use range based for-loops throughout this code. Several had
already been added and the inconsistency made choosing names and
changing code more annoying. Plus, wow are they better for this code!

llvm-svn: 231347
2015-03-05 03:19:05 +00:00
Chandler Carruth 2fc3fe1282 [MBP] NFC, run clang-format over this code and tweak things to make the
result reasonable.

This code predated clang-format and so there was a reasonable amount of
crufty formatting that had accumulated. This should ensure that neither
myself nor others end up with formatting-only changes sneaking into
other fixes.

llvm-svn: 231341
2015-03-05 02:35:31 +00:00
Chandler Carruth d0dced58ab [MBP] This is no longer 'block-placement2'. ;] The old variants are long
gone, update this code to reflect that.

llvm-svn: 231340
2015-03-05 02:28:25 +00:00
Rafael Espindola 07c03d316d Use the existing begin and end symbol for debug info.
llvm-svn: 231338
2015-03-05 02:05:42 +00:00
Chandler Carruth af7e99f2f4 [MBP] Revert r231238 which attempted to fix a nasty bug where MBP is
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).

This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.

llvm-svn: 231332
2015-03-05 01:07:03 +00:00
Paul Robinson 49e38965dc Turn off .debug_pubnames/pubtypes for PS4.
Differential Revision: http://reviews.llvm.org/D8067

llvm-svn: 231322
2015-03-05 00:08:27 +00:00
Frederic Riss ee17fb9b0e Teach DIEInteger to emit FORM_strp and FORM_ref_addr attributes.
To be used/tested by llvm-dsymutil. (llvm-dsymutil does a 'static' link,
no need for relocations for most things, so it'll just emit raw integers
for most attributes)

llvm-svn: 231298
2015-03-04 22:07:36 +00:00
Paul Robinson 78cc0821f0 Support standard DWARF TLS opcode; Darwin and PS4 use it.
Differential Revision: http://reviews.llvm.org/D8018

llvm-svn: 231286
2015-03-04 20:55:11 +00:00
Mehdi Amini 46a43556db Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never returns nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
Wei Mi 4d9347993b Revert the test commit.
llvm-svn: 231264
2015-03-04 17:44:22 +00:00
Wei Mi 20401eecd6 Test commit. It will be reverted in the next commit.
llvm-svn: 231262
2015-03-04 17:41:17 +00:00
Adrian Prantl 0f61579602 Fix DwarfExpression::AddMachineRegExpression so it doesn't read past the
end of an expression that ends with DW_OP_plus.
Caught by the ASAN build bots.

llvm-svn: 231260
2015-03-04 17:39:33 +00:00
JF Bastien f14889ee34 Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded.
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all.

This does not represent a behavioural change and as such no tests were added.

Patch by: Richard Diamond.

Reviewers: jfb

Reviewed By: jfb

Subscribers: jfb, aemerson, t.p.northover, llvm-commits

Differential Revision: http://reviews.llvm.org/D7713

llvm-svn: 231250
2015-03-04 15:47:57 +00:00
Chandler Carruth 9a53fbe243 [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind
a flag for now.

First off, thanks to Daniel Jasper for really pointing out the issue
here. It's been here forever (at least, I think it was there when
I first wrote this code) without getting really noticed or fixed.

The key problem is what happens when two reasonably common patterns
happen at the same time: we outline multiple cold regions of code, and
those regions in turn have diamonds or other CFGs for which we can't
just topologically lay them out. Consider some C code that looks like:

  if (a1()) { if (b1()) c1(); else d1(); f1(); }
  if (a2()) { if (b2()) c2(); else d2(); f2(); }
  done();

Now consider the case where a1() and a2() are unlikely to be true. In
that case, we might lay out the first part of the function like:

  a1, a2, done;

And then we will be out of successors in which to build the chain. We go
to find the best block to continue the chain with, which is perfectly
reasonable here, and find "b1" let's say. Laying out successors gets us
to:

  a1, a2, done; b1, c1;

At this point, we will refuse to lay out the successor to c1 (f1)
because there are still un-placed predecessors of f1 and we want to try
to preserve the CFG structure. So we go get the next best block, d1.

... wait for it ...

Except that the next best block *isn't* d1. It is b2! d1 is waaay down
inside these conditionals. It is much less important than b2. Except
that this is exactly what we didn't want. If we keep going we get the
entire set of the rest of the CFG *interleaved*!!!

  a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;

So we clearly need a better strategy here. =] My current favorite
strategy is to actually try to place the block whose predecessor is
closest. This very simply ensures that we unwind these kinds of CFGs the
way that is natural and fitting, and should minimize the number of cache
lines instructions are spread across.

It also happens to be *dead simple*. It's like the datastructure was
specifically set up for this use case or something. We only push blocks
onto the work list when the last predecessor for them is placed into the
chain. So the back of the worklist *is* the nearest next block.

Unfortunately, a change like this is going to cause *soooo* many
benchmarks to swing wildly. So for now I'm adding this under a flag so
that we and others can validate that this is fixing the problems
described, that it seems possible to enable, and hopefully that it fixes
more of our problems long term.

llvm-svn: 231238
2015-03-04 12:18:08 +00:00
Daniel Jasper 471e856f49 Add a flag to experiment with outlining optional branches.
In a CFG with the edges A->B->C and A->C, B is an optional branch.

LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
C, in order to improve code locality and fallthroughs. However, if a
function contains many of those optional branches only a few of which
are taken, this leads to a lot of unnecessary icache misses. Moving B
out of line can work around this.

Review: http://reviews.llvm.org/D7719
llvm-svn: 231230
2015-03-04 11:05:34 +00:00
Michael Kuperstein fb95697c88 [DAGCombine] Fix a bug in a BUILD_VECTOR combine
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector. 
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.

Differential Revision: http://reviews.llvm.org/D8040

llvm-svn: 231219
2015-03-04 07:27:39 +00:00
Frederic Riss 9412d63f68 Move emitDIE and emitAbbrevs to AsmPrinter. NFC.
(They are called emitDwarfDIE and emitDwarfAbbrevs in their new home)

llvm-dsymutil wants to reuse that code, but it doesn't have a DwarfUnit or
a DwarfDebug object to call those. It has access to an AsmPrinter though.

Having emitDIE in the AsmPrinter also removes the DwarfFile dependency
on DwarfDebug, and thus the patch drops that field.

Differential Revision: http://reviews.llvm.org/D8024

llvm-svn: 231210
2015-03-04 02:30:17 +00:00
Frederic Riss cd04434cd5 Constify AsmPrinter passed to DIE methods.
llvm-svn: 231209
2015-03-04 02:30:08 +00:00
Mehdi Amini 367bfa42d8 Use report_fatal_error instead of unreachable for -fast-isel-abort
Suggestion by Andrea Di Biagio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231201
2015-03-04 01:48:39 +00:00
Rafael Espindola 310e4b592f Use the vanilla func_end symbol for .size.
No need to create yet another temp symbol.

llvm-svn: 231198
2015-03-04 01:35:23 +00:00
David Blaikie ed40025f37 Recommit r231168: unique_ptrify LiveRange::segmentSet
GCC 4.7's libstdc++ doesn't have std::map::emplace, but it does have
std::unordered_map::emplace, and the use case here doesn't appear to
need ordering. The container has been changed in a separate/precursor
patch, and now this patch should hopefully build cleanly even with
GCC 4.7.

& then I realized the order of the container did matter, so extra
handling of ordering was added in r231189.

Original commit message:
This makes LiveRange non-copyable, and LiveInterval is already
non-movable (due to the explicit dtor), so now it's non-copyable and
non-movable.

Fix the one case where we were relying on the (deprecated in C++11)
implicit copy ctor of LiveInterval (which happened to work because the
ctor created an object with a null segmentSet, so double-deleting the
null pointer was fine).

llvm-svn: 231192
2015-03-04 01:20:33 +00:00
David Blaikie 55c6222538 Recommit r231175: Change LiveStackAnalysis::SS2IntervalMap from std::map to std::unordered_map
The order of this container was needed at one point - so, at that point
create a temporary array of pointers, sort those, then iterate them.
This keeps lookup efficient (& the lesser issue, of allowing the use of
emplace... ), object identity preserved, and ordered iteration in the
one place that requires it.

While this has no functional change, I realize it does mean allocating
an extra data structure and performing a sort - so if this looks suspect
to anyone regarding perf characteristics, I'm all ears.

llvm-svn: 231189
2015-03-04 01:15:53 +00:00
Matthias Braun 9f0c91f0d9 RegisterCoalescer: Gracefully continue if subrange merging fails.
There is a known bug where the register coalescer fails to merge
subranges when multiple ranges end up in the "overflow" bit 32 of the
lanemasks. A proper fix for this is complicated so for now this is a
workaround which lets the register coalescer drop the subregister
liveness information (we just loose some precision by that) and
continue.

llvm-svn: 231186
2015-03-04 00:43:50 +00:00
Rafael Espindola 0ac5075f31 Drop the "eh_" from eh_func_begin and eh_func_end.
They will be used for more than eh tables.

llvm-svn: 231185
2015-03-04 00:27:43 +00:00
David Blaikie 90c59ccae6 Revert "unique_ptrify LiveRange::segmentSet"
Apparently something does care about ordering of LiveIntervals... so
revert all that stuff (r231175, r231176, r231177) & take some time to
re-evaluate.

llvm-svn: 231184
2015-03-04 00:15:02 +00:00
David Blaikie 19660f03be Recommit r231168: unique_ptrify LiveRange::segmentSet
GCC 4.7's libstdc++ doesn't have std::map::emplace, but it does have
std::unordered_map::emplace, and the use case here doesn't appear to
need ordering. The container has been changed in a separate/precursor
patch, and now this patch should hopefully build cleanly even with
GCC 4.7.

Original commit message:
This makes LiveRange non-copyable, and LiveInterval is already
non-movable (due to the explicit dtor), so now it's non-copyable and
non-movable.

Fix the one case where we were relying on the (deprecated in C++11)
implicit copy ctor of LiveInterval (which happened to work because the
ctor created an object with a null segmentSet, so double-deleting the
null pointer was fine).

llvm-svn: 231176
2015-03-03 23:53:03 +00:00
David Blaikie 923a25e957 Revert "unique_ptrify LiveRange::segmentSet"
GCC 4.7 *shakes fist* (doesn't have std::map::emplace... )

This reverts commit r231168.

llvm-svn: 231173
2015-03-03 23:44:07 +00:00
David Blaikie 5a0206a3ff unique_ptrify LiveRange::segmentSet
This makes LiveRange non-copyable, and LiveInterval is already
non-movable (due to the explicit dtor), so now it's non-copyable and
non-movable.

Fix the one case where we were relying on the (deprecated in C++11)
implicit copy ctor of LiveInterval (which happened to work because the
ctor created an object with a null segmentSet, so double-deleting the
null pointer was fine).

llvm-svn: 231168
2015-03-03 23:30:40 +00:00
Reid Kleckner 423665311d WinEH: Remove vestigial EH object
Ultimately, we'll need to leave something behind to indicate which
alloca will hold the exception, but we can figure that out when it comes
time to emit the __CxxFrameHandler3 catch handler table.

llvm-svn: 231164
2015-03-03 23:20:30 +00:00
Eric Christopher 2891913f1a Fix a problem where the TwoAddressInstructionPass which generate redundant register moves in a loop.
From:
int M, total;
void foo() {
int i;
for (i = 0; i < M; i++) {
  total = total + i / 2;
}
}

This is the kernel loop:

.LBB0_2: # %for.body

=>This Inner Loop Header: Depth=1
movl %edx, %esi
movl %ecx, %edx
shrl $31, %edx
addl %ecx, %edx
sarl %edx
addl %esi, %edx
incl %ecx
cmpl %eax, %ecx
jl .LBB0_2
--------------------------
The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".

The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body

Predecessors according to CFG: BB#1 BB#2
    %vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
    %vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
    %vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
    %vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
    %vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
    %vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
    %vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
    CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
    %vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
    %vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
    JL_4 <BB#2>, %EFLAGS<imp-use,kill>
Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.

So check for a reversed copy chain and if we encounter one then we can commute the add instruction so we can avoid a copy.

Patch by Wei Mi.
http://reviews.llvm.org/D7806

llvm-svn: 231148
2015-03-03 22:03:03 +00:00
David Blaikie 49cfb81665 DAGCombiner::LoadedSlice: Remove explicit copy ctor in favor of the Rule of Zero
This way, the copy assignment operator can be used without hitting the
deprecated case in C++11.

llvm-svn: 231144
2015-03-03 21:50:47 +00:00
David Blaikie 7f1e0565b3 Revert "Remove the explicit SDNodeIterator::operator= in favor of the implicit default"
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.

This reverts commit r231135.

llvm-svn: 231136
2015-03-03 21:18:16 +00:00
David Blaikie bb8da4c08f Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.

llvm-svn: 231135
2015-03-03 21:17:08 +00:00
David Blaikie 0ef4488df2 Remove LatencyPriorityQueue::dump because it relies on an implicit copy ctor which is deprecated in C++11 (due to the presence of a user-declare dtor in the base class)
This type could be made copyable (= default a protected copy ctor in the
base class, and preferably make the derived class final to avoid risks
of providing a slicing copy operation to further derived classes) but it
seemed easier to avoid that complexity for a dump function that I assume
(by symmetry with ResourcePriorityQueue's dump, which was actively
buggy) not often used.

llvm-svn: 231133
2015-03-03 21:16:56 +00:00
David Blaikie 0a756e6ad5 unique_ptrify ResourcePriorityQueue::ResourceModel
llvm-svn: 231127
2015-03-03 20:49:08 +00:00
David Blaikie b8cd65c5a2 Remove ResourcePriorityQueue::dump as it relies on copying a non-copyable type which would result in a double-delete
llvm-svn: 231126
2015-03-03 20:49:05 +00:00
Andrew Kaylor e07b2a06d3 Fixing problem with field initialization order
llvm-svn: 231122
2015-03-03 20:22:09 +00:00
Adrian Prantl b283815a30 Fix PR22762. When emitting a DWARF expression check whether this is the
frame register before checking if there is a DWARF register number for it.

Thanks to H.J. Lu for diagnosing this and providing the testcase!

llvm-svn: 231121
2015-03-03 20:12:52 +00:00
Andrew Kaylor f0f5e46e07 Outline cleanup handlers for native Windows C++ exception handling
Differential Revision: http://reviews.llvm.org/D7865

llvm-svn: 231117
2015-03-03 20:00:16 +00:00
Eric Christopher 720ab84ba2 Add a comment above findRepresentativeClass explaining why it's
where it is so that future generations can understand.

llvm-svn: 231111
2015-03-03 19:47:14 +00:00
Dario Domizioli 5f7008a688 Fix PR22750: non-determinism causes assertion failure in DWARF generation
The cause of the issue is the interaction of two factors:
1) When generating a DW_TAG_imported_declaration DIE which imports another 
   imported declaration, the code in AsmPrinter/DwarfCompileUnit.cpp 
   asserts that the second imported declaration must already have a DIE.
2) There is a non-determinism in the order in which imported declarations
   within the same scope are processed.
Because of the non-determinism (2), it is possible that an imported 
declaration is processed before another one it depends on, breaking the 
assumption in (1).

The source of the non-determinism is that the imported declaration 
DIDescriptors are sorted by scope in DwarfDebug::beginModule(); however that
sort is not a stable_sort, therefore the order of the declarations within 
the same scope is not preserved. The attached patch changes the std::sort to
a std::stable_sort and it fixes the problem.

Test omitted due to it being non-deterministic and depending on the
implementation of std::sort.

llvm-svn: 231100
2015-03-03 18:40:53 +00:00
Daniel Jasper 8f239f83b0 During PHI elimination, split critical edges that move copies out of loops.
This prevents the behavior observed in llvm.org/PR22369. I am not sure
whether I am reading the code correctly, but the early exit based on
isLiveOutPastPHIs() seems to make the wrong assumption that
RegisterCoalescer won't be able to coalesce those copies later.

This change hides the new behavior behind -no-phi-elim-live-out-early-exit
as it currently breaks four tests:
 * Assertion in:
     CodeGen/Hexagon/hwloop-cleanup.ll
 * Worse code in:
     CodeGen/X86/coalescer-commute4.ll
     CodeGen/X86/phys_subreg_coalesce-2.ll
     CodeGen/X86/zlib-longest-match.ll
   The root cause here seems to be that the heuristic that determines
   the visitation order in RegisterCoalescer gets less lucky.

llvm-svn: 231064
2015-03-03 10:23:11 +00:00
Andrew Kaylor 72029c6f2f Remap arguments and non-alloca values used by outlined C++ exception handlers.
Differential Revision: http://reviews.llvm.org/D7844

llvm-svn: 231042
2015-03-03 00:41:03 +00:00
Adrian Prantl b846acc6c6 Revert "Revert "For the dwarf expression code get the subtarget off of the current""
This reapplies r230990 without modifications.

llvm-svn: 231024
2015-03-02 22:02:36 +00:00
Adrian Prantl 92da14b244 Refactor DebugLocDWARFExpression so it doesn't require access to the
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.

Ought to be NFC, but it does slightly alter the output format of the
textual assembly.

This reapplies 230930 without the assertion in DebugLocEntry::finalize()
because not all Machine registers can be lowered into DWARF register
numbers and floating point constants cannot be expressed.

llvm-svn: 231023
2015-03-02 22:02:33 +00:00
Rui Ueyama 3206b79d53 Use read{16,32,64}{le,be}() instead of *reinterpret_cast<u{little,big}{16,32,64}_t>().
llvm-svn: 231016
2015-03-02 21:19:12 +00:00
Adrian Prantl 2185aa179d Revert "Refactor DebugLocDWARFExpression so it doesn't require access to the"
This reverts commit 230975 to investigate buildbot breakage.

llvm-svn: 231004
2015-03-02 20:01:54 +00:00
Adrian Prantl abb9192652 Revert "For the dwarf expression code get the subtarget off of the current"
This reverts commit 230990 because also reverting 230975.

llvm-svn: 231003
2015-03-02 20:01:47 +00:00
Eric Christopher d8cacd2e97 For the dwarf expression code get the subtarget off of the current
MachineFunction.

llvm-svn: 230990
2015-03-02 19:01:47 +00:00
Adrian Prantl d50bca7314 Refactor DebugLocDWARFExpression so it doesn't require access to the
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.

Ought to be NFC, but it does slightly alter the output format of the
textual assembly.

This reapplies 230930 with a relaxed assertion in DebugLocEntry::finalize()
that allows for empty DWARF expressions for constant FP values.

llvm-svn: 230975
2015-03-02 17:21:06 +00:00
Benjamin Kramer 0b6742aeb5 Accidentaly inverted the condition again. Sorry.
llvm-svn: 230973
2015-03-02 16:45:08 +00:00
Benjamin Kramer f43de1879a Avoid assertion in MSVC 2013 debug builds.
llvm-svn: 230972
2015-03-02 16:42:56 +00:00
Benjamin Kramer 8008e9f624 Simplify code. NFC.
llvm-svn: 230948
2015-03-02 11:57:04 +00:00
Nico Weber 968ceddca9 Revert r230930, it caused PR22747.
llvm-svn: 230932
2015-03-02 04:37:11 +00:00
Adrian Prantl e2c9e64532 Refactor DebugLocDWARFExpression so it doesn't require access to the
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.

Ought to be NFC, but it does slightly alter the output format of the
textual assembly.

llvm-svn: 230930
2015-03-02 02:38:18 +00:00
Arnaud A. de Grandmaison a57ca81eb4 [PBQP] Address post-commit style comment for r230904. NFC.
Thanks David !

llvm-svn: 230908
2015-03-01 21:22:50 +00:00
Arnaud A. de Grandmaison 21fa09890c [PBQP] Do not add an edge between nodes with totally disjoint allowed registers
Such edges are zero matrix, and they bring no additional info to the
allocation problem, apart from contributing to nodes' degree. Removing
those edges is expected to improve allocation time.

Tune the spill cost comparison, as this gives better average performances
now that the nodes' degrees has changed.

llvm-svn: 230904
2015-03-01 20:39:34 +00:00
Sanjay Patel b8c907e2a7 avoid infinite looping when folding vector multiplies of constants (PR22698)
We were missing a check for the following fold in DAGCombiner:

// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))

If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever.

This should fix:
http://llvm.org/bugs/show_bug.cgi?id=22698

Differential Revision: http://reviews.llvm.org/D7917

llvm-svn: 230884
2015-03-01 00:09:35 +00:00
Benjamin Kramer 49a1132976 DwarfAccelTable: We know how many hashes we have in the output, just reserve the precise number
llvm-svn: 230865
2015-02-28 20:15:00 +00:00
Benjamin Kramer 48ea372d90 StackColoring: Move set instead of copying. NFC.
llvm-svn: 230864
2015-02-28 20:14:38 +00:00
Benjamin Kramer 4c5dcb0a83 LiveRange: Replace a creative vector erase loop with std::remove_if.
I didn't see this so far because it scans backwards, but that doesn't
make it any less quadratic. NFC.

llvm-svn: 230863
2015-02-28 20:14:27 +00:00
Mehdi Amini 04f0f5ba61 Fixup for recent -fast-isel-abort change: code didn't match description
Level 1 should abort for all instructions but call/terminators/args.
Instead it was aborting only if the level was > 2

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230861
2015-02-28 19:34:54 +00:00
Benjamin Kramer 5fbfe2ffdc Convert push_back loops into append calls.
No functionality change intended.

llvm-svn: 230849
2015-02-28 13:20:15 +00:00
Benjamin Kramer f1362f6196 ArrayRefize memory operand folding. NFC.
llvm-svn: 230846
2015-02-28 12:04:00 +00:00
Benjamin Kramer 4f6ac16292 Replace std::copy with a back inserter with vector append where feasible
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.

llvm-svn: 230845
2015-02-28 10:11:12 +00:00
Benjamin Kramer 012b1514b9 MachineDominators: Move applySplitCriticalEdges into the cpp file.
It's too big for inlining anyways. Also clean it up slightly. No functionality
change intended.

llvm-svn: 230806
2015-02-27 23:13:13 +00:00
Benjamin Kramer 4e3b903a95 Reduce double set lookups.
llvm-svn: 230798
2015-02-27 21:43:14 +00:00
Eric Christopher 3b94e33277 Remove the Forward Control Flow Integrity pass and its dependencies.
This work is currently being rethought along different lines and
if this work is needed it can be resurrected out of svn. Remove it
for now as no current work in ongoing on it and it's unused. Verified
with the authors before removal.

llvm-svn: 230780
2015-02-27 19:03:38 +00:00
Mehdi Amini 945a660cbc Change the fast-isel-abort option from bool to int to enable "levels"
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.

This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.

Reviewers: resistor, echristo

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D7941

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
2015-02-27 18:32:11 +00:00
Rafael Espindola 629cdbae94 Centralize handling of the eh_begin and eh_end labels.
This removes a bit of duplicated code and more importantly, remembers the
labels so that they don't need to be looked up by name.

This in turn allows for any name to be used and avoids a crash if the name
we wanted was already taken.

llvm-svn: 230772
2015-02-27 18:18:39 +00:00
Sanjoy Das b818676f6d Don't modify the DenseMap being iterated over from within the loop
that is iterating over it

Inserting elements into a `DenseMap` invalidated iterators pointing
into the `DenseMap` instance.

Differential Revision: http://reviews.llvm.org/D7924

llvm-svn: 230719
2015-02-27 02:24:16 +00:00
Eric Christopher 1cdefae9c4 Rewrite MachineOperand::print and MachineInstr::print to avoid
uses of TM->getSubtargetImpl and propagate to all calls.

This could be a debugging regression in places where we had a
TargetMachine and/or MachineFunction but don't have it as part
of the MachineInstr. Fixing this would require passing a
MachineFunction/Function down through the print operator, but
none of the existing uses in tree seem to do this.

llvm-svn: 230710
2015-02-27 00:11:34 +00:00
Rafael Espindola 4491d0d337 Put jump tables in distinct sections if -ffunction-sections is used.
A small regression in r230411 was that we were basing the decision on
-fdata-sections.

llvm-svn: 230707
2015-02-26 23:55:11 +00:00
Eric Christopher b9f0009b5a Remove DebugLoc::print(LLVMContext, raw_ostream), it was just
forwarding to the one that didn't take a context.

llvm-svn: 230700
2015-02-26 23:32:17 +00:00
Eric Christopher 11e4df73c8 getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

llvm-svn: 230699
2015-02-26 22:38:43 +00:00
Eric Christopher d75c00c638 Add a TargetMachine argument to the AddressingModeMatcher, we'll
need this shortly to get a TargetRegisterInfo from the subtarget
for TargetLowering routines.

llvm-svn: 230698
2015-02-26 22:38:34 +00:00
Rafael Espindola e8fd00dab0 Simplify arange output.
Move SectionMap to its only user (emitDebugARanges) and
reorder to save a call to sort.

llvm-svn: 230693
2015-02-26 22:02:02 +00:00
Paul Robinson 093d6e1a70 When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.

Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.

Differential Revision: http://reviews.llvm.org/D7181

llvm-svn: 230659
2015-02-26 18:47:57 +00:00
Eric Christopher 23a3a7c871 Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.

llvm-svn: 230583
2015-02-26 00:00:24 +00:00
Eric Christopher 75dbd7ca3e Move TargetLoweringBase::getTypeConversion to the .cpp file from
the .h file. It's used in only one place (other than recursively)
and there's no need to include it everywhere.

Saves almost 900k from total llvm object file size.

llvm-svn: 230561
2015-02-25 22:41:30 +00:00
Andrew Kaylor b59b80b956 Fixing a problem with insert location in WinEH outlining
llvm-svn: 230535
2015-02-25 20:12:49 +00:00
Rafael Espindola 8bc9ccc60a Support SHF_MERGE sections in COMDATs.
This patch unifies the comdat and non-comdat code paths. By doing this
it add missing features to the comdat side and removes the fixed
section assumptions from the non-comdat side.

In ELF there is no one true section for "4 byte mergeable" constants.
We are better off computing the required properties of the section
and asking the context for it.

llvm-svn: 230411
2015-02-25 00:52:15 +00:00
David Majnemer 841e0d60ed PrologEpilogInserter: Clean up math in calculateFrameObjectOffsets
There is no need to open-code the alignment calculation, we have a
handy RoundUpToAlignment function which "Does The Right Thing (TM)".

llvm-svn: 230392
2015-02-24 23:08:13 +00:00
Simon Pilgrim d8820ae70c Reapplied D7816 & rL230177 & rL230278 - with an additional fix toensure that the smallest build vector input scalar type is always used. Additional (crash) test cases already committed.
llvm-svn: 230388
2015-02-24 22:08:56 +00:00
Andrew Kaylor 1476e6d1bb Fixing eol-style
llvm-svn: 230378
2015-02-24 20:49:35 +00:00
Eric Christopher af48495130 Revert:
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Mon Feb 23 23:04:28 2015 +0000

    Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.

and

Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Sun Feb 22 18:17:28 2015 +0000

    [DagCombiner] Generalized BuildVector Vector Concatenation

    The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

    This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

    This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

    Differential Revision: http://reviews.llvm.org/D7816

as the root cause of PR22678 which is causing an assertion inside the DAG combiner.

I'll follow up to the main thread as well.

llvm-svn: 230358
2015-02-24 19:11:00 +00:00
Eric Christopher fe59972bbc Rename UpdateRegAllocHint to match style guidelines.
llvm-svn: 230357
2015-02-24 19:10:57 +00:00
Matthias Braun 00a4076e94 DAGCombiner: Move variable definitions closer to use; NFC
llvm-svn: 230354
2015-02-24 18:52:01 +00:00
Matthias Braun a8558ca2ed DAGCombiner: Move variable declaration closer to definiion; NFC
llvm-svn: 230353
2015-02-24 18:51:59 +00:00
Tim Northover e95c5b3236 ARM: treat [N x i32] and [N x i64] as AAPCS composite types
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.

Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.

llvm-svn: 230348
2015-02-24 17:22:34 +00:00
Hal Finkel cec70130ac [SDAG] Handle LowerOperation returning its input consistently
For almost all node types, if the target requested custom lowering, and
LowerOperation returned its input, we'd treat the original node as legal. This
did not work, however, for many loads and stores, because they follow
slightly different code paths, and we did not account for the possibility of
LowerOperation returning its input at those call sites.

I think that we now handle this consistently everywhere. At the call sites in
LegalizeDAG, we used to assert in this case, so there's no functional change
for any existing code there. For the call sites in LegalizeVectorOps, this
really only affects whether or not we set Changed = true, but I think makes the
semantics clearer.

No test case here, but it will be covered by an upcoming PowerPC commit adding
QPX support.

llvm-svn: 230332
2015-02-24 12:59:47 +00:00
Simon Pilgrim 662c1d2770 Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.
llvm-svn: 230278
2015-02-23 23:04:28 +00:00
Andrea Di Biagio af3f397b10 [X86] Teach how to custom lower double-to-half conversions under fast-math.
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.

Added test CodeGen/X86/fastmath-float-half-conversion.ll

Differential Revision: http://reviews.llvm.org/D7832

llvm-svn: 230276
2015-02-23 22:59:02 +00:00
Bruno Cardoso Lopes 24492b057e [AsmPrinter] Access pointers to globals via pcrel GOT entries
Front-ends could use global unnamed_addr to hold pointers to other
symbols, like @gotequivalent below:

@foo = global i32 42
@gotequivalent = private unnamed_addr constant i32* @foo

@delta = global i32 trunc (i64 sub (i64 ptrtoint (i32** @gotequivalent to i64),
                                    i64 ptrtoint (i32* @delta to i64))
                           to i32)

The global @delta holds a data "PC"-relative offset to @gotequivalent,
an unnamed pointer to @foo. The darwin/x86-64 assembly output for this follows:

 .globl  _foo
_foo:
 .long   42

 .globl  _gotequivalent
_gotequivalent:
 .quad   _foo

 .globl  _delta
_delta:
 .long   _gotequivalent-_delta

Since unnamed_addr indicates that the address is not significant, only
the content, we can optimize the case above by replacing pc-relative
accesses to "GOT equivalent" globals, by a PC relative access to the GOT
entry of the final symbol instead. Therefore, "delta" can contain a pc
relative relocation to foo's GOT entry and we avoid the emission of
"gotequivalent", yielding the assembly code below:

 .globl  _foo
_foo:
 .long   42

 .globl  _delta
_delta:
 .long   _foo@GOTPCREL+4

There are a couple of advantages of doing this: (1) Front-ends that need
to emit a great deal of data to store pointers to external symbols could
save space by not emitting such "got equivalent" globals and (2) IR
constructs combined with this opt opens a way to represent GOT pcrel
relocations by using the LLVM IR, which is something we previously had
no way to express.

Differential Revision: http://reviews.llvm.org/D6922

rdar://problem/18534217

llvm-svn: 230264
2015-02-23 21:26:18 +00:00
Andrew Kaylor 982ea13c79 Removing unused private field.
llvm-svn: 230259
2015-02-23 21:03:30 +00:00
Andrew Kaylor 322236eed6 Second attempt to fix WinEHCatchDirector build failures.
llvm-svn: 230257
2015-02-23 20:44:34 +00:00
Andrew Kaylor 2e30b459ec Attempting to fix WinEHCatchDirector destructor related build failures.
llvm-svn: 230252
2015-02-23 20:19:15 +00:00
Andrew Kaylor f22fe4ae18 Remap frame variables for native Windows exception handling.
Differential Revision: http://reviews.llvm.org/D7770

llvm-svn: 230249
2015-02-23 20:01:56 +00:00
Eric Christopher ed47b22951 Rewrite the global merge pass to be subprogram agnostic for now.
It was previously using the subtarget to get values for the global
offset without actually checking each function as it was generating
code. Go ahead and solidify the current behavior and make the
existing FIXMEs more prominent.

As a note the ARM backend previously had a thumb1 and non-thumb1
set of defaults. Only the former was tested so I've changed the
behavior to only use that for now.

llvm-svn: 230245
2015-02-23 19:28:45 +00:00
Simon Pilgrim 4e30d9b6d8 [DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

Differential Revision: http://reviews.llvm.org/D7816

llvm-svn: 230177
2015-02-22 18:17:28 +00:00
Hal Finkel e2dd84e42f [DAGCombine] Don't assume integer-type legailty in reduceBuildVecConvertToConvertBuildVec
DAGCombine will rewrite an BUILD_VECTOR where all non-undef inputs some from
[US]INT_TO_FP, as a BUILD_VECTOR of integers with the conversion applied as a
vector operation. We check operation legality of the conversion, but fail to
check legality of the integer vector type itself. Because targets don't
normally override operation legality defaults for illegal types, we need to
check this also.

This came up in the context of the QPX vector entensions for PowerPC (which can
have legal floating-point vector types without corresponding legal integer
vector types). No in-tree test case for this yes, but one can be added once
the QPX support has been committed.

llvm-svn: 230176
2015-02-22 16:10:22 +00:00
Hal Finkel f5b957060b [SDAG] Use correct alignments on expanded vector trunc-store/ext-loads
When expanding a truncating store or extending load using vector extracts or
inserts and scalar stores and loads, we were giving each of these scalar stores
or loads the same alignment as the original vector operation. While this will
often be right (most vector operations, especially those produced by
autovectorization, have the alignment of the underlying scalar type), the
vector operation could certainly have a larger alignment.

No test case (yet); noticed by inspection.

llvm-svn: 230175
2015-02-22 15:58:04 +00:00
Benjamin Kramer 60c5bbff29 MachineInstr: Use range-based for loops. NFC.
llvm-svn: 230142
2015-02-21 17:08:08 +00:00
Benjamin Kramer 5c0e64fcd6 Calling memmove on a MachineOperand is totally safe.
While it's not POD due to the user-defined constructor, it's still a trivially
copyable type. No functional change.

llvm-svn: 230141
2015-02-21 16:22:48 +00:00
Eric Christopher 3f05e1a19b Unconditionally create a new MCInstrInfo in the asm printer for
asm parsing since it's not subtarget dependent and we can't depend
upon the one hanging off the MachineFunction's subtarget still
being around.

llvm-svn: 230135
2015-02-21 09:09:15 +00:00
David Majnemer d5ab35f265 X86: Call __main using the SelectionDAG
Synthesizing a call directly using the MI layer would confuse the frame
lowering code.  This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.

llvm-svn: 230129
2015-02-21 05:49:45 +00:00
Matthias Braun 876e7172ee LiveRangeCalc: Don't start liveranges of PHI instruction at the block begin.
Summary:
Letting them begin at the PHI instruction slightly simplifies the code
but more importantly avoids breaking the assumption that live ranges
starting at the block begin are also live at the end of the predecessor
blocks. The MachineVerifier checks that but was apparently never run in
the few instances where liveranges are calculated for machine-SSA
functions.

Reviewers: qcolombet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7779

llvm-svn: 230093
2015-02-20 23:43:14 +00:00
Rafael Espindola 9075f77064 Use short names for jumptable sections.
Also refactor code to remove some duplication.

llvm-svn: 230087
2015-02-20 23:28:28 +00:00
Eric Christopher d4e723f2cf Used the cached subtarget off of the MachineFunction.
llvm-svn: 230078
2015-02-20 22:36:11 +00:00
Matt Arsenault 0dc54c4dee Add generic fmad DAG node.
This allows sharing of FMA forming combines to work
with instructions that have the same semantics as a separate
multiply and add.

This is expand by default, and only formed post legalization
so it shouldn't have much impact on targets that do not want it.

llvm-svn: 230070
2015-02-20 22:10:33 +00:00
Eric Christopher 9ecaa174d6 Grab the DataLayout off of the TargetMachine since that's where
it's stored.

llvm-svn: 230059
2015-02-20 20:56:39 +00:00
Eric Christopher f734a8bae7 Get the function specific subtarget.
llvm-svn: 230038
2015-02-20 18:44:17 +00:00
Eric Christopher 1df0c519fc Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 230037
2015-02-20 18:44:15 +00:00
Igor Laevsky 7fc58a4ad8 Generalize statepoint lowering to use ImmutableStatepoint. Move statepoint lowering into a separate function 'LowerStatepoint' which uses ImmutableStatepoint instead of a CallInst. Also related utility functions are changed to receive ImmutableCallSite.
Differential Revision: http://reviews.llvm.org/D7756 

llvm-svn: 230017
2015-02-20 15:28:35 +00:00
Nick Lewycky b73c041005 Fix build with gcc. This has a -Wsequence-point error on 'MII', which is a good point.
llvm-svn: 229979
2015-02-20 07:17:40 +00:00
Eric Christopher a7249ec1a7 Remove more uses of TargetMachine::getSubtargetImpl from the
AsmPrinter.

getSubtargetInfo now asserts that the MachineFunction exists.
Debug printing of register naming now uses the register info
from MCAsmInfo as that's unchanging.

llvm-svn: 229978
2015-02-20 07:16:19 +00:00
Eric Christopher 78a3f6cc4d AsmPrinter::doFinalization is at the module level and so doesn't
have access to a target specific subtarget info. Grab the module
level MCSubtargetInfo for the JumpInstrTable output stubs.

llvm-svn: 229974
2015-02-20 06:59:48 +00:00
Eric Christopher 97ea7622b5 Remove the MCInstrInfo cached variable as it was only used in a
single place and replace calls to getSubtargetImpl with calls
to get the subtarget from the MachineFunction where valid.

llvm-svn: 229971
2015-02-20 06:35:21 +00:00
Chandler Carruth 301ed0c3b4 Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including
the ones updated, fail with crashes and other explosions.

llvm-svn: 229952
2015-02-20 02:15:36 +00:00
Reid Kleckner 0b647e6cca EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.

Fixes PR20300.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7744

llvm-svn: 229944
2015-02-20 01:00:19 +00:00
Eric Christopher cd37bf5483 This needs to be a const variable so the two sides of the ternary
operator agree on type.

llvm-svn: 229938
2015-02-20 00:03:45 +00:00
Eric Christopher 2105ae98f6 Only use the initialized MCInstrInfo if it's been initialized already
during SetupMachineFunction. This is also the single use of MII
and it'll be changing to TargetInstrInfo (which is MachineFunction
based) in the next commit here.

llvm-svn: 229931
2015-02-19 23:52:35 +00:00
Eric Christopher 7330264146 Migrate away a use of the subtarget (and TargetMachine) from
AsmPrinterDwarf since the information is on the MCRegisterInfo
via the MCContext and MMI that we already have on the AsmPrinter.

llvm-svn: 229928
2015-02-19 23:29:42 +00:00
Ahmed Bougacha 4c2b0781a5 [CodeGen] Use ArrayRef instead of std::vector&. NFC.
The former lets us use SmallVectors.  Do so in ARM and AArch64.

llvm-svn: 229925
2015-02-19 23:13:10 +00:00
Eric Christopher cbdbf39881 MCTargetOptions reside on the TargetMachine that we always have via
TargetOptions.

llvm-svn: 229917
2015-02-19 21:29:51 +00:00
Eric Christopher 457864178f Remove a call to TargetMachine::getSubtarget from the inline
asm support in the asm printer. If we can get a subtarget from
the machine function then we should do so, otherwise we can
go ahead and create a default one since we're at the module
level.

llvm-svn: 229916
2015-02-19 21:24:23 +00:00
Eric Christopher 64d35be6d6 Remove unused argument from emitInlineAsmStart.
llvm-svn: 229907
2015-02-19 19:52:25 +00:00
Eric Christopher 504f388a84 Update and remove a few calls to TargetMachine::getSubtargetImpl
out of the asm printer.

llvm-svn: 229883
2015-02-19 18:46:23 +00:00
Benjamin Kramer ea68a944a1 Demote vectors to arrays. No functionality change.
llvm-svn: 229861
2015-02-19 15:26:17 +00:00
Chandler Carruth b89464a9b6 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Reid Kleckner 7bb0738d82 Add an IR-to-IR test for dwarf EH preparation using opt
This tests the simple resume instruction elimination logic that we have
before making some changes to it.

llvm-svn: 229768
2015-02-18 23:17:41 +00:00
Andrew Kaylor 179543bb9b Style and formatting fixes for r229715
llvm-svn: 229758
2015-02-18 22:52:18 +00:00
Reid Kleckner 4dd0304e34 dos2unix the WinEH file and tests
llvm-svn: 229735
2015-02-18 19:52:46 +00:00
David Blaikie 30f2f3fc98 Remove unused member variables (-Wunused-private-field)
llvm-svn: 229722
2015-02-18 18:52:49 +00:00
Andrew Kaylor 527c5dc68d Adding implementation to outline C++ catch handlers for native Windows 64 exception handling.
Differential Revision: http://reviews.llvm.org/D7363

llvm-svn: 229715
2015-02-18 18:31:51 +00:00
Michael Kuperstein af9befa6b7 Fixes two issue in SimplifyDemandedBits of sext_in_reg:
1) We should not try to simplify if the sext has multiple uses
2) There is no need to simplify is the source value is already sign-extended.

Patch by Gil Rapaport <gil.rapaport@intel.com>

Differential Revision: http://reviews.llvm.org/D6949

llvm-svn: 229659
2015-02-18 09:43:40 +00:00
Daniel Jasper ed9eb7209e NFC: Use range-based for loops and more consistent naming.
No functional changes intended.

(I plan on doing some modifications to this function and would like to
have as few unrelated changes as possible in the patch)

llvm-svn: 229649
2015-02-18 08:19:16 +00:00
Daniel Jasper 4d7b04384e Remove experimental options to control machine block placement.
This reverts r226034. Benchmarking with those flags has not revealed
anything interesting.

llvm-svn: 229648
2015-02-18 08:18:07 +00:00
Matthias Braun 11042c8523 LiveRangeCalc: Rename some parameters from kill to use, NFC.
Those parameters did not necessarily describe kill points but just uses.

llvm-svn: 229601
2015-02-18 01:50:52 +00:00
Rafael Espindola 2d7ec9a860 Twines should be passed by const ref.
llvm-svn: 229590
2015-02-17 23:44:22 +00:00
Rafael Espindola df19519800 Add r228939 back with a fix.
The problem in the original patch was not switching back to .text after printing
an eh table.

Original message:

On ELF, put PIC jump tables in a non executable section.

Fixes PR22558.

llvm-svn: 229586
2015-02-17 23:34:51 +00:00
Duncan P. N. Exon Smith 57bab0bc97 AsmPrinter: Take range in DwarfExpression::AddExpression(), NFC
Previously `DwarfExpression::AddExpression()` relied on
default-constructing the end iterators for `DIExpression` -- once the
operands are represented explicitly via `MDExpression` (instead of via
the strange `StringRef` navigator in `DIHeaderIterator`) this won't
work.  Explicitly take an iterator for the end of the range.

llvm-svn: 229572
2015-02-17 22:30:56 +00:00
Rafael Espindola 68fa249cb5 Add r228980 back.
Add support for having multiple sections with the same name and comdat.

Using this in combination with -ffunction-sections allows LLVM to output a .o
file with mulitple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.

llvm-svn: 229541
2015-02-17 20:48:01 +00:00
Eric Christopher a49d68e078 Make the ARM AsmPrinter independent of global subtarget
initialization. Initialize the subtarget once per function and
migrate Emit{Start|End}OfAsmFile to either use attributes on the
TargetMachine or get information from the subtarget we'd use
for assembling. One bit (getISAEncoding) touched the general
AsmPrinter and the debug output. Handle this one by passing
the function for the subprogram down and updating all callers
and users.

The top-level-ness of the ARM attribute output for assembly is,
by nature, contrary to how we'd want to do this for an LTO
situation where we have multiple cpu architectures so this
solution is good enough for now.

llvm-svn: 229528
2015-02-17 20:02:32 +00:00
Eric Christopher ffc5ff32d1 80-column fixups.
llvm-svn: 229527
2015-02-17 20:02:28 +00:00
Sanjay Patel ab7e86e5be Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389

llvm-svn: 229511
2015-02-17 16:54:32 +00:00
Benjamin Kramer 6cd780ff21 Prefer SmallVector::append/insert over push_back loops.
Same functionality, but hoists the vector growth out of the loop.

llvm-svn: 229500
2015-02-17 15:29:18 +00:00
Duncan P. N. Exon Smith 752d6df22d AsmPrinter: Use DIExpression default constructor, NFC
llvm-svn: 229464
2015-02-17 02:42:45 +00:00
Duncan P. N. Exon Smith b474937929 AsmPrinter: Stop creating DebugLocs
While looking at a heap profile of a clang LTO bootstrap with -g, I
noticed that 2.2% of memory in an `llvm-lto` of clang is from calling
`DebugLoc::get()` in `collectVariableInfo()` (accounting for ~40% of
memory used for `MDLocation`s).

I suspect this was introduced by r226736, whose goal was to prevent
uniquing of `DebugLoc`s (goal achieved, if so).

There's no reason we need a `DebugLoc` here at all -- it was just being
used for (in)convenient API -- so the fix is to pass the scope and
inlined-at directly to `LexicalScopes::findInlinedScope()`.

llvm-svn: 229459
2015-02-17 00:02:27 +00:00
Matthias Braun 15635c5f85 RegisterCoalescer: Don't rematerialize subregister definitions.
We cannot simply rematerialize instructions which only defining a
subregister, as the final value also depends on the previous
instructions.

This fixes test/CodeGen/R600/subreg-coalescer-bug.ll with subreg
liveness enabled.

llvm-svn: 229444
2015-02-16 22:05:17 +00:00
Matthias Braun 1b901a4435 RegisterCoalescer: Do not look for regclass of IMPLICIT_DEF.
IMPLICIT_DEF is a generic instruction and has no (fixed) output register
class defined. The rematerialization code of the register coalescer
should not scan the instruction description for a register class.

This fixes a problem showing up in
test/CodeGen/R600/subreg-coalescer-crash.ll with subregister liveness
enabled.

llvm-svn: 229443
2015-02-16 22:05:12 +00:00
Mehdi Amini 3e0023b8f6 SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too
Update SPARC tests to match.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229438
2015-02-16 21:47:58 +00:00
Matthias Braun 2eab0e30e1 RegisterCoalescer: Improve previous fix for wrong def after.
The previous fix in r225503 was needlessly complicated. The problem goes
away as well if the arguments to MergeValueNumberInto are supplied in the
correct order.
This was previously missed because the existing code already had the
wrong order but an additional later Merge was hiding the bug for the
main liverange VNI.

llvm-svn: 229424
2015-02-16 19:34:27 +00:00
Aaron Ballman f17583ee9c MSVC 2013 supports std::forward_as_tuple, while MSVC 2012 did not; so we can move to using the improved API.
llvm-svn: 229414
2015-02-16 18:21:19 +00:00
Andrew Trick 05938a5481 AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

llvm-svn: 229413
2015-02-16 18:10:47 +00:00
Michael Kuperstein fc3e626752 Fix quoting of #pragma comment for MS compat, LLVM part.
For #pragma comment(linker, ...) MSVC expects the comment string to be quoted, but for #pragma comment(lib, ...) the compiler itself quotes the library name.
Since this distinction disappears by the time the directive reaches the backend, move quoting for the "lib" version to the frontend.

Differential Revision: http://reviews.llvm.org/D7652

llvm-svn: 229375
2015-02-16 11:57:17 +00:00
Aaron Ballman f9a1897c72 Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition.
llvm-svn: 229340
2015-02-15 22:54:22 +00:00
Chandler Carruth 1b5285dd57 [SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.

These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.

Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.

llvm-svn: 229308
2015-02-15 12:18:12 +00:00
Chandler Carruth 499d7332c5 [x86] Fix PR22377, a regression with the new vector shuffle legality
test.

This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.

llvm-svn: 229285
2015-02-15 07:01:10 +00:00
Duncan P. N. Exon Smith 70eb9c5ae5 CodeGen: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

Also, add `Function::getFnStackAlignment()`, and canonicalize:

getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
  => getFnStackAlignment()

llvm-svn: 229208
2015-02-14 01:44:41 +00:00
Matthias Braun 33cc10724d Revert "On ELF, put PIC jump tables in a non executable section."
This reverts commit r228939.

The commit broke something in the output of exception handling tables on
darwin x86-64.

llvm-svn: 229203
2015-02-14 01:16:54 +00:00
Reid Kleckner 2d5fb68ee0 Unify the two EH personality classification routines I wrote
We only need one.

llvm-svn: 229193
2015-02-14 00:21:02 +00:00
Andrea Di Biagio b14ae8692d [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz.
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.

This patch is basically a no functional change. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target specific tests
for cttz/ctlz into SimplifyCFG tests.

Differential Revision: http://reviews.llvm.org/D7608

llvm-svn: 229105
2015-02-13 14:15:48 +00:00
Arnaud A. de Grandmaison a7c90d8487 [PBQP] Conservativelly allocatable nodes can be spilled and give a better solution
Although such nodes are allocatable, the cost of spilling may be less than
allocating to register, so spilling the node may provide a better solution.
The assert does not account for this case, so remove it for now.

llvm-svn: 229103
2015-02-13 12:04:42 +00:00
Chandler Carruth 30d69c2e36 [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

llvm-svn: 229094
2015-02-13 10:01:29 +00:00
Chandler Carruth 71f308adb7 Re-sort #include lines using my handy dandy ./utils/sort_includes.py
script. This is in preparation for changes to lots of include lines.

llvm-svn: 229088
2015-02-13 09:09:03 +00:00
Chandler Carruth d99f427e31 Revert a series of commits starting at r228886 which is triggering some
regressions for LLDB on Linux. Rafael indicated on lldb-dev that we
should just go ahead and revert these but that he wasn't at a computer.
The patches backed out are as follows:

r228980: Add support for having multiple sections with the name and ...
r228889: Invert the section relocation map.
r228888: Use the existing SymbolTableIndex intsead of doing a lookup.
r228886: Create the Section -> Rel Section map when it is first needed.

These patches look pretty nice to me, so hoping its not too hard to get
them re-instated. =D

llvm-svn: 229080
2015-02-13 07:52:39 +00:00
Rafael Espindola b6a812ebb1 Add support for having multiple sections with the same name and comdat.
Using this in combination with -ffunction-sections allows LLVM to output a .o
file with mulitple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.

llvm-svn: 228980
2015-02-12 23:29:51 +00:00
Hal Finkel 271e9f2870 [SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotions
The PowerPC backend has long promoted some floating-point vector operations
(such as select) to integer vector operations. Unfortunately, this behavior was
broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check
that both the old and new types are floating-point types. Otherwise, we must
use BITCAST as we did prior to r216555 for everything.

llvm-svn: 228969
2015-02-12 22:43:52 +00:00
Rafael Espindola 203c5b9f39 On ELF, put PIC jump tables in a non executable section.
Fixes PR22558.

llvm-svn: 228939
2015-02-12 17:46:49 +00:00
Rafael Espindola 29786d4c16 Put each jump table in an independent section if the function is too.
This allows the linker to GC both, fixing pr22557.

llvm-svn: 228937
2015-02-12 17:16:46 +00:00
Benjamin Kramer 5f6a907288 MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros
Update all callers.

llvm-svn: 228930
2015-02-12 15:35:40 +00:00
Ahmed Bougacha 24433a7005 [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.

Differential Revision: http://reviews.llvm.org/D7571

llvm-svn: 228911
2015-02-12 06:15:29 +00:00
Jonas Paulsson bf8d0cc699 Fix SelectionDAG compile time issue with alias analysis.
Add new token factor node and its users to worklist if alias analysis is
turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause
a lot of new token factors to be inserted into the DAG, and they need to
be optimized to avoid significant slow-downs.

Reviewed by Hal Finkel.

llvm-svn: 228841
2015-02-11 16:10:31 +00:00
Rafael Espindola 25d2c20c0c Don't repeat name in comment and clang-format a function.
llvm-svn: 228831
2015-02-11 14:44:17 +00:00
Arnaud A. de Grandmaison de79026d5e [PBQP] Cautiously update edge costs in the solver
The NodeMetadata are maintained in an incremental way. When an edge between
2 nodes has its cost updated, in the course of graph reduction for example,
the NodeMetadata need first to have the old edge cost removed, then the new
edge cost added. Only once the NodeMetadata have been fully updated, it
becomes safe to consider promoting the nodes to the
ConservativelyAllocatable or OptimallyReducible sets. Previously, this
promotion was occuring right after the removing the old cost, and this was
breaking the assumption that a ConservativelyAllocatable should not be
spilled.

This patch also adds asserts to:
 - enforces the invariant that a node's reduction can not be downgraded,
 - only not provably allocatable or optimally reducible nodes can be spilled.

llvm-svn: 228816
2015-02-11 08:25:36 +00:00
Zachary Turner 3bd47cee78 Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects.
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.

Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman

llvm-svn: 228798
2015-02-11 03:28:02 +00:00
Reid Kleckner 96d011315a Don't promote asynch EH invokes of nounwind functions to calls
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.

Also add some landingpads to invalid LLVM IR test cases that lack them.

Over-the-shoulder reviewed by David Majnemer.

llvm-svn: 228782
2015-02-11 01:23:16 +00:00
Petar Jovanovic d9f52043b1 Fix makeLibCall argument (signed) in SoftenFloatRes_XINT_TO_FP function
The isSigned argument of makeLibCall function was hard-coded to false
(unsigned). This caused zero extension on MIPS64 soft float.
As the result SingleSource/Benchmarks/Stanford/FloatMM test and
SingleSource/UnitTests/2005-07-17-INT-To-FP test failed. 
The solution was to use the proper argument.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D7292

llvm-svn: 228765
2015-02-10 23:30:14 +00:00
Adrian Prantl ca7e470221 Debug Info: Support variables that are described by more than one MMI
table entry. This happens when SROA splits up an alloca and the resulting
allocas cannot be lowered to SSA values because their address is passed
to a function.

Fixes PR22502.

llvm-svn: 228764
2015-02-10 23:18:28 +00:00
Adrian Prantl d49691f779 Fix indentation.
llvm-svn: 228763
2015-02-10 23:18:15 +00:00
Andrew Kaylor 78b53dbcc1 Adding support for llvm.eh.begincatch and llvm.eh.endcatch intrinsics and beginning the documentation of native Windows exception handling.
Differential Revision: http://reviews.llvm.org/D7398

llvm-svn: 228733
2015-02-10 19:52:43 +00:00
Jonas Paulsson a25a3f4fea Two comment typo fixes in lib/CodeGen/SelectionDAG/DAGCombiner.cpp.
llvm-svn: 228700
2015-02-10 15:34:29 +00:00
Jonas Paulsson afa6813816 Bugfix for missed dependency from store to load in buildSchedGraph().
Background: When handling underlying objects for a store, the vector
of previous mem uses, mapped to the same Value, is afterwards cleared
(regardless of ThisMayAlias). This means that during handling of the
next store using the same Value, adjustChainDeps() must be called,
otherwise a dependency might be missed.

For example, three spill/reload (NonAliasing) memory accesses using
the same Value 'a', with different offsets:

    SU(2): store  @a
    SU(1): store  @a, Offset:1
    SU(0): load   @a

In this case we have:

* SU(1) does not need a dep against SU(0). Therefore,SU(0) ends up in
  RejectMemNodes and is removed from the mem-uses list (AliasMemUses
  or NonAliasMemUses), as this list is cleared.

* SU(2) needs a dep against SU(0). Therefore, SU(2) must check
  RejectMemNodes by calling adjustChainDeps().

Previously, for store SUs, adjustChainDeps() was only called if
MayAlias was true, missing the S(2) to S(0) dependency in the case
above. The fix is to always call adjustChainDeps(), regardless of
MayAlias, since this applies both for AliasMemUses and
NonAliasMemUses.

No testcase found for any in-tree target.

llvm-svn: 228686
2015-02-10 13:03:32 +00:00
Chandler Carruth b65d61a2e8 [x86] Fix PR22524: the DAG combiner was incorrectly handling illegal
nodes when folding bitcasts of constants.

We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.

This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so its fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.

(Also as a note, this does not appear to be in 3.6, no backport needed)

llvm-svn: 228656
2015-02-10 02:25:56 +00:00
Adrian Prantl 27bd01f71c Debug info: Use DW_OP_bit_piece instead of DW_OP_piece in the
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.

Part of PR22495.

llvm-svn: 228631
2015-02-09 23:57:15 +00:00
Duncan P. N. Exon Smith b407bb2789 DebugInfo: Remove DW_TAG_constant
Remove handling for DW_TAG_constant.  We started producing it in
r110656, but reverted that in r110876 without dropping the support.
Finish the job.

llvm-svn: 228623
2015-02-09 22:48:04 +00:00
Benjamin Kramer a9591b5880 Move DebugLocs around instead of copying.
llvm-svn: 228491
2015-02-07 12:28:15 +00:00
Bruce Mitchener 7e575ed1ea Add more DWARF 5 language constants.
Differential Revision: http://reviews.llvm.org/D7430

llvm-svn: 228487
2015-02-07 06:35:30 +00:00
Quentin Colombet a8cb36ea43 [LiveIntervalAnalysis] Speed up creation of live ranges for physical registers
by using a segment set.

The patch addresses a compile-time performance regression in the LiveIntervals
analysis pass (see http://llvm.org/bugs/show_bug.cgi?id=18580). This regression
is especially critical when compiling long functions. Our analysis had shown
that the most of time is taken for generation of live intervals for physical
registers. Insertions in the middle of the array of live ranges cause quadratic
algorithmic complexity, which is apparently the main reason for the slow-down. 

Overview of changes:
- The patch introduces an additional std::set<Segment>* member in LiveRange for
  storing segments in the phase of initial creation. The set is used if this
  member is not NULL, otherwise everything works the old way. 
- The set of operations on LiveRange used during initial creation (i.e. used by
  createDeadDefs and extendToUses) have been reimplemented to use the segment
  set if it is available.
- After a live range is created the contents of the set are flushed to the
  segment vector, because the set is not as efficient as the vector for the
  later uses of the live range. After the flushing, the set is deleted and
  cannot be used again.
- The set is only for live ranges computed in
  LiveIntervalAnalysis::computeLiveInRegUnits() and getRegUnit() but not in
  computeVirtRegs(), because I did not bring any performance benefits to
  computeVirtRegs() and for some examples even brought a slow down.

Patch by Vaidas Gasiunas <vaidas.gasiunas@sap.com>

Differential Revision: http://reviews.llvm.org/D6013

llvm-svn: 228421
2015-02-06 18:42:41 +00:00
Matthias Braun 7dc96dea0a LiveInterval: Fix SubRange memory leak.
llvm-svn: 228405
2015-02-06 17:28:47 +00:00
Arnaud A. de Grandmaison 77f62d45ef [PBQP] Fix comment wording. NFC
llvm-svn: 228390
2015-02-06 11:28:16 +00:00
Daniel Jasper 4bb224decb Small cleanup of MachineLICM.cpp
Specifically:
- Calculate the loop pre-header once at the stat of HoistOutOfLoop, so:
  - We don't-DFS walk the MachineDomTree if we aren't going to do anything
  - Don't call getCurPreheader for each Scope
- Don't needlessly use a do-while loop
- Use early exit for Scopes.size() == 0

No functional changes intended.

llvm-svn: 228350
2015-02-05 22:39:46 +00:00
Ahmed Bougacha e892d13d90 [CodeGen] Add hook/combine to form vector extloads, enabled on X86.
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."

That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.

It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up.  If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion.  At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.

This optimization is enabled unconditionally on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that the existing combine that forms extloads is now also
enabled on legal vectors.  This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.

Differential Revision: http://reviews.llvm.org/D6904

llvm-svn: 228325
2015-02-05 18:31:02 +00:00
Rafael Espindola f8d662aa4b Add a FIXME.
Thanks to Eric for the suggestion.

llvm-svn: 228300
2015-02-05 14:57:47 +00:00
Rafael Espindola a092f17580 Don' try to make sections in comdats SHF_MERGE.
Parts of llvm were not expecting it and we wouldn't print
the entity size of the section.

Given what comdats are used for, having SHF_MERGE sections would be
just a small improvement, so just disable it for now.

Fixes pr22463.

llvm-svn: 228196
2015-02-04 21:27:24 +00:00
Matthias Braun 26e7ea6267 MachineCSE: Clear dead-def flag on CSE.
In case CSE reuses a previoulsy unused register the dead-def flag has to
be cleared on the def operand, as exposed by the arm64-cse.ll test.

This fixes PR22439 and the corresponding rdar://19694987

Differential Revision: http://reviews.llvm.org/D7395

llvm-svn: 228178
2015-02-04 19:35:16 +00:00
Michael Kuperstein cd63c5fa73 Fixes a bug in vector load legalization that confused bits and bytes.
Differential Revision: http://reviews.llvm.org/D7400

llvm-svn: 228168
2015-02-04 18:54:01 +00:00
Owen Anderson 21b1788ad0 Remove a gross usage of environment variables in MachineVerifier, replacing it with support for setting the -verify-machineinstrs flag via an environment variable in LIT.
This preserves the handy functionality of force-enabling the MachineVerifier, without the need to embed usage of environment variables in LLVM client applications.

llvm-svn: 228079
2015-02-04 00:02:59 +00:00
Arnaud A. de Grandmaison 10797c5707 [PBQP] Provide more information in the debug prints
Based on a patch by Jonas Paulsson

llvm-svn: 228068
2015-02-03 23:40:24 +00:00
Eric Christopher 36fe028a2a Only access TLOF via the TargetMachine, not TargetLowering.
llvm-svn: 227949
2015-02-03 07:22:52 +00:00
Lang Hames d48bf3f912 [PBQP Regalloc] Pre-spill vregs that have no legal physregs.
The PBQP::RegAlloc::MatrixMetadata class assumes that matrices have at least two
rows/columns (for the spill option plus at least one physreg). This patch
ensures that that invariant is met by pre-spilling vregs that have no physreg
options so that no node (and no corresponding edges) need be added to the PBQP
graph.

This fixes a bug in an out-of-tree target that was identified by Jonas Paulsson.
Thanks for tracking this down Jonas!

llvm-svn: 227942
2015-02-03 06:14:06 +00:00
Rafael Espindola 6ffb1d7e3c Move simple case earlier and use a continue.
llvm-svn: 227841
2015-02-02 19:22:51 +00:00
Adrian Prantl 9013c4b3b1 Debug Info: Relax assertion in isUnsignedDIType() to allow floats to be
described by integer constants. This is a bit ugly, but if the source
language allows arbitrary type casting, the debug info must follow suit.

For example:
  void foo() {
    float a;
    *(int *)&a = 0;
  }
For the curious: SROA replaces the float alloca with an i32 alloca, which
is then optimized away and described via dbg.value(i32 0, ...).

llvm-svn: 227827
2015-02-02 18:31:58 +00:00
Michael Kuperstein 13fbd45263 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

(Re-commit of r227728)

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227752
2015-02-01 16:56:04 +00:00
Michael Kuperstein e86aa9a8a4 Revert r227728 due to bad line endings.
llvm-svn: 227746
2015-02-01 16:15:07 +00:00
Chandler Carruth c956ab6603 [multiversion] Switch the TTI queries from TargetMachine to Subtarget
now that we have a correct and cached subtarget specific to the
function.

Also, finish providing a cached per-function subtarget in the core
LLVMTargetMachine -- that layer hadn't switched over yet.

The only use of the TargetMachine was to re-lookup a subtarget for
a particular function to work around the fact that TTI was immutable.
Now that it is per-function and we haved a cached subtarget, use it.

This still leaves a few interfaces with real warts on them where we were
passing Function objects through the TTI interface. I'll remove these
and clean their usage up in subsequent commits now that this isn't
necessary.

llvm-svn: 227738
2015-02-01 14:22:17 +00:00
Chandler Carruth c340ca839c [multiversion] Remove the cached TargetMachine pointer from the
intermediate TTI implementation template and instead query up to the
derived class for both the TargetMachine and the TargetLowering.

Most of the derived types had a TLI cached already and there is no need
to store a less precisely typed target machine pointer.

This will in turn make it much cleaner to look up the TLI via
a per-function subtarget instead of the generic subtarget, and it will
pave the way toward pulling the subtarget used for unroll preferences
into the same form once we are *always* using the function to look up
the correct subtarget.

llvm-svn: 227737
2015-02-01 14:01:15 +00:00
Chandler Carruth 8b04c0d26a [multiversion] Switch all of the targets over to use the
TargetIRAnalysis access path directly rather than implementing getTTI.

This even removes getTTI from the interface. It's more efficient for
each target to just register a precise callback that creates their
specific TTI.

As part of this, all of the targets which are building their subtargets
individually per-function now build their TTI instance with the function
and thus look up the correct subtarget and cache it. NVPTX, R600, and
XCore currently don't leverage this functionality, but its trivial for
them to add it now.

llvm-svn: 227735
2015-02-01 13:20:00 +00:00
Chandler Carruth 5ec2b1d11a [multiversion] Implement the old pass manager's TTI wrapper pass in
terms of the new pass manager's TargetIRAnalysis.

Yep, this is one of the nicer bits of the new pass manager's design.
Passes can in many cases operate in a vacuum and so we can just nest
things when convenient. This is particularly convenient here as I can
now consolidate all of the TargetMachine logic on this analysis.

The most important change here is that this pushes the function we need
TTI for all the way into the TargetMachine, and re-creates the TTI
object for each function rather than re-using it for each function.
We're now prepared to teach the targets to produce function-specific TTI
objects with specific subtargets cached, etc.

One piece of feedback I'd love here is whether its worth renaming any of
this stuff. None of the names really seem that awesome to me at this
point, but TargetTransformInfoWrapperPass is particularly ... odd.
TargetIRAnalysisWrapper might make more sense. I would want to do that
rename separately anyways, but let me know what you think.

llvm-svn: 227731
2015-02-01 12:26:09 +00:00
Chandler Carruth fdb9c573f7 [multiversion] Thread a function argument through all the callers of the
getTTI method used to get an actual TTI object.

No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.

The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.

llvm-svn: 227730
2015-02-01 12:01:35 +00:00
Michael Kuperstein bd57186c76 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227728
2015-02-01 11:44:44 +00:00
Chandler Carruth 93dcdc47db [PM] Switch the TargetMachine interface from accepting a pass manager
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

llvm-svn: 227685
2015-01-31 11:17:59 +00:00
Chandler Carruth 705b185f90 [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

llvm-svn: 227669
2015-01-31 03:43:40 +00:00
Alexey Samsonov 964f4dbaae Fix memory leak in WinEHPrepare introduced in r227405.
This leak was detected by ASan bootstrap of LLVM.

llvm-svn: 227625
2015-01-30 22:07:05 +00:00
Reid Kleckner 2a2235087f Update comments to use unreachable instead of llvm.trap, as implemented now
llvm-svn: 227502
2015-01-29 22:32:26 +00:00
Rafael Espindola ba31e27f0a Compute the ELF SectionKind from the flags.
Any code creating an MCSectionELF knows ELF and already provides the flags.

SectionKind is an abstraction used by common code that uses a plain
MCSection.

Use the flags to compute the SectionKind. This removes a lot of
guessing and boilerplate from the MCSectionELF construction.

llvm-svn: 227476
2015-01-29 17:33:21 +00:00
Rafael Espindola 093dcc43f7 Use isMergeableConst now that it is sane.
llvm-svn: 227441
2015-01-29 14:23:28 +00:00
Rafael Espindola 33804cac96 Remove MergeableConst.
Only the specific ones (MergeableConst4, MergeableConst8, MergeableConst16) are
handled specially.

llvm-svn: 227440
2015-01-29 14:12:41 +00:00
Benjamin Kramer eb63e4d6f8 EHPrepare: Remove leftover initialization code for DomTrees.
While there modernize some loops. NFC.

llvm-svn: 227436
2015-01-29 13:26:50 +00:00
Rafael Espindola e2d4b2df39 Use enum values. NFC.
llvm-svn: 227435
2015-01-29 13:25:44 +00:00
Rafael Espindola fad3901095 Don't create multiple mergeable sections with -fdata-sections.
ELF has support for sections that can be split into fixed size or
null terminated entities.

Since these sections can be split by the linker, it is not necessary
to split them in codegen.

This reduces the combined .o size in a llvm+clang build from
202,394,570 to 173,819,098 bytes.

The time for linking clang with gold (on a VM, on a laptop) goes
from 2.250089985 to 1.383001792 seconds.

The flip side is the size of rodata in clang goes from 10,926,785
to 10,929,345 bytes.

The increase seems to be because of http://sourceware.org/bugzilla/show_bug.cgi?id=17902.

llvm-svn: 227431
2015-01-29 12:43:28 +00:00
Chandler Carruth a1a622746e Remove an unused private field added r227405 to fix a Clang warning.
llvm-svn: 227415
2015-01-29 02:44:53 +00:00
Reid Kleckner 8fcb0ad8a6 Remove unused variable
llvm-svn: 227408
2015-01-29 00:55:44 +00:00
Reid Kleckner 1185fced3d Add a Windows EH preparation pass that zaps resumes
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.

Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors.  Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception.  Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7216

llvm-svn: 227405
2015-01-29 00:41:44 +00:00
Manuel Jacob 6f508c578b Add nullptr checks for TargetSelectionDAGInfo in SelectionDAG.
TSI is not guaranteed be non-null in SelectionDAG.

llvm-svn: 227397
2015-01-28 23:50:40 +00:00
Philip Reames 23cf2e2f97 Remove gc.root's performCustomLowering
This is a refactoring to restructure the single user of performCustomLowering as a specific lowering pass and remove the custom lowering hook entirely.

Before this change, the LowerIntrinsics pass (note to self: rename!) was essentially acting as a pass manager, but without being structured in terms of passes. Instead, it proxied calls to a set of GCStrategies internally. This adds a lot of conceptual complexity (i.e. GCStrategies are stateful!) for very little benefit. Since there's been interest in keeping the ShadowStackGC working, I extracting it's custom lowering pass into a dedicated pass and just added that to the pass order. It will only run for functions which opt-in to that gc.

I wasn't able to find an easy way to preserve the runtime registration of custom lowering functionality. Given that no user of this exists that I'm aware of, I made the choice to just remove that. If someone really cares, we can look at restoring it via dynamic pass registration in the future.

Note that despite the large diff, none of the lowering code actual changes. I added the framing needed to make it a pass and rename the class, but that's it.

Differential Revision: http://reviews.llvm.org/D7218

llvm-svn: 227351
2015-01-28 19:28:03 +00:00
Rafael Espindola a05b3b73a4 Simplify code. NFC.
llvm-svn: 227333
2015-01-28 17:54:19 +00:00
Hal Finkel 34c94d5caa Correct the AggressiveAntiDepBreaker's handling of subregisters defining super registers
As the AggressiveAntiDepBreaker iterated backward through a scheduling region,
we must leave super registers live through subregister definitions so that all
relevant subregister definitions are renamed together. The problem was that we
were also discarding sub-register use locations as the sub-registers are
redefined. The result is that we'd rename the super register along with some,
but not all, subregister definitions.

  R0_D = {R0_L, R1_L}
  R0_L = {R0_S, R1_S}

  %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
  %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
  %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
  %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S

  Anti:   %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S
   Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L)
   Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use)
  Anti:   %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D)
   Antidep reg: R0_L (real dependency)
   Use Groups: R0_L=g210->g224(last-use) R0_S->g225(last-use) R1_S->g226(last-use)
  Anti:   %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R1_L=g219->g210(via R0_D)
   Antidep reg: R1_L (real dependency)
   Use Groups: R1_L=g210->g229(last-use)
  Anti:   %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
   Def Groups: R0_L=g224->g225(via R0_S)->g210(via R0_D)->g226(via R1_S)->g226(via R0_D)
   Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g226->g230(last-use) R0_L=g226->g231(last-use) R1_S->g232(last-use)
  Anti:   %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
   Def Groups: R1_L=g229->g226(via R0_D)
   Antidep reg: R1_L Use Groups: R1_L=g226->g233(last-use) R0_S=g230
  Anti:   %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
   Def Groups: R0_L=g231->g230(via R0_S)->g226(via R0_D)->g232(via R1_S)->g232(via R0_D)
   Antidep reg: R0_L
   Rename Candidates for Group g232:
    R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D
    R0_L: elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S
   Find Registers: [R12_D: R12_D R12_L R12_S]
   Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(2 refs) R0_S->R12_S(2 refs)
   Use Groups:
  ...

  %R12_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R1_L<def> = LSRLrr %R1_L<kill>, %R12_S, pred:8, pred:%noreg
  %R0_L<def> = LSRLrr %R2_L<kill>, %R12_S, pred:8, pred:%noreg, %R12_L<imp-use>
  %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
  %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R12_D<kill>, %R6_S

With this change, we now produce:

  Anti:   %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S
   Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L)
   Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use)
  Anti:   %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D)
   Antidep reg: R0_L (real dependency)
   Use Groups: R0_L=g210
  Anti:   %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R1_L=g219->g210(via R0_D)
   Antidep reg: R1_L (real dependency)
   Use Groups: R1_L=g210
  Anti:   %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
   Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D)
   Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g210 R0_L=g210
  Anti:   %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
   Def Groups: R1_L=g210->g210(via R0_D)
   Antidep reg: R1_L Use Groups: R1_L=g210 R0_S=g210
  Anti:   %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
   Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D)
   Antidep reg: R0_L
   Rename Candidates for Group g210:
    R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D
    R0_L: elcIntRegs elcIntAIRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R1_L: elcIntRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S
   Find Registers: [R12_D: R12_D R12_L R13_L R12_S]
   Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(7 refs) R1_L->R13_L(5 refs) R0_S->R12_S(2 refs)
   Use Groups:
  ...

  %R12_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R13_L<def> = LSRLrr %R13_L<kill>, %R12_S, pred:8, pred:%noreg
  %R12_L<def> = LSRLrr %R2_L<kill>, %R12_S<kill>, pred:8, pred:%noreg, %R12_L<imp-use,kill>
  %R13_L<def> = ANDLri %R13_L<kill>, 2047, pred:8, pred:%noreg
  %R12_L<def> = ANDLri %R12_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R12_D, %R6_S, %R12_L<imp-def>, %R12_S<imp-def>, %R13_S<imp-def>

As demonstrated by this example, this is also somewhat unfortunate, because
there is actually no need to rename the super register in this case (it is
fully covered by later subregister definitions), but we don't seem to track
enough information here to exploit that either.

Thanks to Daniil Troshkov for reporting the issue. The debug outputs in this
commit message are from Daniil.

llvm-svn: 227311
2015-01-28 14:44:14 +00:00
Chandler Carruth b81dfa6378 [LPM] Stop using the string based preservation API. It is an
abomination.

For starters, this API is incredibly slow. In order to lookup the name
of a pass it must take a memory fence to acquire a pointer to the
managed static pass registry, and then potentially acquire locks while
it consults this registry for information about what passes exist by
that name. This stops the world of LLVMs in your process no matter
how little they cared about the result.

To make this more joyful, you'll note that we are preserving many passes
which *do not exist* any more, or are not even analyses which one might
wish to have be preserved. This means we do all the work only to say
"nope" with no error to the user.

String-based APIs are a *bad idea*. String-based APIs that cannot
produce any meaningful error are an even worse idea. =/

I have a patch that simply removes this API completely, but I'm hesitant
to commit it as I don't really want to perniciously break out-of-tree
users of the old pass manager. I'd rather they just have to migrate to
the new one at some point. If others disagree and would like me to kill
it with fire, just say the word. =]

llvm-svn: 227294
2015-01-28 04:57:56 +00:00
David Blaikie e245228903 Add description to assert
llvm-svn: 227291
2015-01-28 02:43:15 +00:00
David Blaikie fa1a3c7cf5 PR22356: DebugInfo: Handle the size of a member where the type of that member is a typedef (or other sugar) of a declaration.
llvm-svn: 227290
2015-01-28 02:34:53 +00:00
Quentin Colombet 308b171318 Revert r227242 - Merge vector stores into wider vector stores (PR21711).
This commit creates infinite loop in DAG combine for in the LLVM test-suite
for aarch64 with mcpu=cylcone (just having neon may be enough to expose this).

llvm-svn: 227272
2015-01-27 23:58:01 +00:00
Sanjay Patel b1ca4e48d4 remove function names from comments; NFC
llvm-svn: 227256
2015-01-27 22:26:56 +00:00
Sanjay Patel 6b280777b7 fix typos; NFC
llvm-svn: 227253
2015-01-27 22:16:52 +00:00
Sanjay Patel bcf62f2fa2 Merge vector stores into wider vector stores (PR21711)
This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

The 'f3' test case in that report presents a situation where we have two 128-bit
stores extracted from a 256-bit source vector. 

Instead of producing this:

vmovaps %xmm0, (%rdi)
vextractf128    $1, %ymm0, 16(%rdi)

This patch merges the 128-bit stores into a single 256-bit store:

vmovups %ymm0, (%rdi)

Differential Revision: http://reviews.llvm.org/D7208

llvm-svn: 227242
2015-01-27 20:50:27 +00:00
Manuel Jacob 8220addad7 Add a FIXME in SelectionDAGBuilder before an assert that is valid only on X86.
When lowering memcpy, memset or memmove, this assert checks whether the pointer
operands are in an address space < 256 which means "user defined address space"
on X86.  However, this notion of "user defined address space" does not exist
for other targets.

llvm-svn: 227191
2015-01-27 13:14:35 +00:00
Eric Christopher 337262068f Replace some uses of getSubtargetImpl with the cached version
off of the MachineFunction or with the version that takes a
Function reference as an argument.

llvm-svn: 227185
2015-01-27 08:48:42 +00:00
Eric Christopher 7592b0c5e6 Have the PBQP register allocator use the subtarget on the MachineFunction.
(and remove an extraneous private).

llvm-svn: 227181
2015-01-27 08:27:06 +00:00
Eric Christopher ad6bedb43e Fix build failure with pointer vs reference.
NB: Saving files after editing helps.
llvm-svn: 227178
2015-01-27 08:00:42 +00:00
Eric Christopher 2c63549386 Update a few calls to getSubtarget<> to either be getSubtargetImpl
when we didn't need the cast to the base class or the cached version
off of the subtarget.

llvm-svn: 227176
2015-01-27 07:54:39 +00:00
Eric Christopher 3d4276f053 The subtarget is cached on the MachineFunction. Access it directly.
llvm-svn: 227173
2015-01-27 07:31:29 +00:00
Eric Christopher 349d5886e5 MachineRegisterInfo can access TII off of the MachineFunction's
subtarget and so doesn't need the TargetMachine or to access via
getSubtargetImpl. Update all callers.

llvm-svn: 227160
2015-01-27 01:15:16 +00:00
Eric Christopher 4e048eeb2a Migrate AtomicExpandPass and DwarfEHPrepare to using a Function-ized getSubtargetImpl.
llvm-svn: 227159
2015-01-27 01:04:42 +00:00
Eric Christopher fccff37b53 Migrate CodeGenPrepare to use the Function based getSubtarget
code.

llvm-svn: 227157
2015-01-27 01:01:38 +00:00
Eric Christopher 2b214e7ad3 Grab the TargetLowering info from the DAG rather than querying for
a subtarget.

llvm-svn: 227156
2015-01-27 01:01:36 +00:00
Eric Christopher b11a1b7b2c Cache the lookup of TargetLowering in the atomic expand pass.
llvm-svn: 227121
2015-01-26 19:45:40 +00:00
Ahmed Bougacha 9a9e1a59ce [SelectionDAG] Fix assert message copypasta. NFC.
llvm-svn: 227119
2015-01-26 19:31:42 +00:00
Eric Christopher 8b7706517c Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Philip Reames 56a03938f7 Revert GCStrategy ownership changes
This change reverts the interesting parts of 226311 (and 227046).  This change introduced two problems, and I've been convinced that an alternate approach is preferrable anyways.

The bugs were:
- Registery appears to require all users be within the same linkage unit.  After this change, asking for "statepoint-example" in Transform/ would sometimes get you nullptr, whereas asking the same question in CodeGen would return the right GCStrategy.  The correct long term fix is to get rid of the utter hack which is Registry, but I don't have time for that right now.  227046 appears to have been an attempt to fix this, but I don't believe it does so completely.
- GCMetadataPrinter::finishAssembly was being called more than once per GCStrategy.  Each Strategy was being added to the GCModuleInfo multiple times.

Once I get time again, I'm going to split GCModuleInfo into the gc.root specific part and a GCStrategy owning Analysis pass.  I'm probably also going to kill off the Registry.  Once that's done, I'll move the new GCStrategyAnalysis and all built in GCStrategies into Analysis.  (As original suggested by Chandler.)  This will accomplish my original goal of being able to access GCStrategy from Transform/  without adding all of the builtin GCs to IR/.  

llvm-svn: 227109
2015-01-26 18:26:35 +00:00
Adrian Prantl 40cb819c6f Debug info: Fix PR22296 by omitting the DW_AT_location if we lost the
physical register that is described in a DBG_VALUE.

In the testcase the DBG_VALUE describing "p5" becomes unavailable
because the register its address is in is clobbered and we (currently)
aren't smart enough to realize that the value is rematerialized immediately
after the DBG_VALUE and/or is actually a stack slot.

llvm-svn: 227056
2015-01-25 19:04:08 +00:00
Elena Demikhovsky a3232f764e Implemented cost model for masked load/store operations.
llvm-svn: 227035
2015-01-25 08:44:46 +00:00
Saleem Abdulrasool fc07c72674 CodeGen: drive-by formatting clean ups
Minor tweaks to whitespace formatting that I noticed was off.  NFC.

llvm-svn: 227014
2015-01-24 20:19:45 +00:00
Andrea Di Biagio 8381475a75 [DAG] Fix wrong canonicalization performed on shuffle nodes.
This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.

Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducible.

llvm-svn: 227002
2015-01-24 11:54:29 +00:00
Reid Kleckner 3d4638b391 Fix assertion when C++ EH filters are present in functions using SEH
Should fix PR22305.

llvm-svn: 226969
2015-01-23 23:51:25 +00:00
Adrian Prantl 70f2a736db Address more review comments for DIExpression::iterator.
- input_iterator
- define an operator->
- make constructors private were possible

llvm-svn: 226967
2015-01-23 23:40:47 +00:00
Adrian Prantl 4bd0a0c3ed Move the accessor functions from DIExpression::iterator into a wrapper
DIExpression::Operand, so we can write range-based for loops.

Thanks to David Blaikie for the idea.

llvm-svn: 226939
2015-01-23 21:24:41 +00:00
Reid Kleckner 5cc1569c54 Classify functions by EH personality type rather than using the triple
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.

Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6987

llvm-svn: 226920
2015-01-23 18:49:01 +00:00
Adrian Prantl 577feba44b Debug Info / PR22309: Allow union types to be emitted as unsigned constants.
llvm-svn: 226919
2015-01-23 18:01:39 +00:00
Mehdi Amini 5059813c2d DAGCombine: always constant fold FMA when target disable FP exceptions
Summary: When trying to constant fold an FMA in the DAG, getNode()
fails to fold the FMA if an operand is not finite. In this case this
patch allows the constant folding if !TLI->hasFloatingPointExceptions()

Reviewers: resistor

Reviewed By: resistor

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D6912

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 226901
2015-01-23 07:07:20 +00:00
Jan Vesely 6269e3ca2f SelectionDAG: Add KnownBits and SignBits computation for EXTRACT_ELEMENT
v2: use getZExtValue
    add missing break
    codestyle

v3: add few more comments

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226880
2015-01-22 23:42:41 +00:00
Ramkumar Ramachandra 75a4f35b26 Intrinsics: introduce llvm_any_ty aka ValueType Any
Specifically, gc.result benefits from this greatly. Instead of:

gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...

We now have a gc.result.* that can specialize to literally any type.

Differential Revision: http://reviews.llvm.org/D7020

llvm-svn: 226857
2015-01-22 20:14:38 +00:00
Sanjay Patel 37c41c1d2c merge consecutive stores of extracted vector elements (PR21711)
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. 
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.

The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.

This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.

Differential Revision: http://reviews.llvm.org/D6850
 

llvm-svn: 226845
2015-01-22 18:21:26 +00:00
David Blaikie e7d473461e Revert "PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site."
The underlying bug has been fixed in r226736 so there's no need to
workaround this anymore.

This reverts commit r220923.

llvm-svn: 226842
2015-01-22 17:49:59 +00:00
Adrian Prantl 2585a98d38 Rename DIExpressionIterator to DIExpression::iterator.
Addresses review feedback from Duncan.

llvm-svn: 226835
2015-01-22 16:55:20 +00:00
Michael Kuperstein 25e34d11f3 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

Fixed recommit of r226811.

llvm-svn: 226816
2015-01-22 13:07:28 +00:00
Michael Kuperstein ff74032018 Revert r226811, MSVC accepts code sane compilers don't.
llvm-svn: 226814
2015-01-22 12:48:07 +00:00
Michael Kuperstein 84fad3e5c9 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

llvm-svn: 226811
2015-01-22 12:37:23 +00:00
Elena Demikhovsky 150d9f3187 Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.

llvm-svn: 226808
2015-01-22 12:07:59 +00:00
Elena Demikhovsky 94cfbbab33 Fixed a comment
llvm-svn: 226806
2015-01-22 10:01:36 +00:00
Elena Demikhovsky 9c26462a27 Fixed a bug in narrowing store operation.
Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type,
since the size of VT (1 bit) is not equal to its actual store size(8 bits).

Added a test provided by David (dag@cray.com)

llvm-svn: 226805
2015-01-22 09:39:08 +00:00
Reid Kleckner f690f50519 Win64 SEH: Emit the constant 1 for catch-all into xdata
llvm-svn: 226767
2015-01-22 02:27:44 +00:00
Adrian Prantl 531641a0c6 Make DwarfExpression use the new DIExpressionIterator. NFC.
llvm-svn: 226748
2015-01-22 00:00:59 +00:00
Tim Northover 3007ba0ab3 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.

llvm-svn: 226740
2015-01-21 23:17:19 +00:00
Matthias Braun c1988f384c LiveIntervalAnalysis: Mark subregister defs as undef when we determined they are only reading a dead superregister value
This was not necessary before as this case can only be detected when the
liveness analysis is at subregister level.

llvm-svn: 226733
2015-01-21 22:55:13 +00:00
Matthias Braun 311730ac78 LiveIntervalAnalysis: Factor out code to update liveness on vreg def removal
This cleans up code and is more in line with the general philosophy of
modifying LiveIntervals through LiveIntervalAnalysis instead of changing
them directly.

This also fixes a case where SplitEditor::removeBackCopies() would miss
the subregister ranges.

llvm-svn: 226690
2015-01-21 19:02:30 +00:00
Matthias Braun cfb8ad29b5 LiveIntervalAnalysis: Factor out code to update liveness on physreg def removal
This cleans up code and is more in line with the general philosophy of
modifying LiveIntervals through LiveIntervalAnalysis instead of changing
them directly.

llvm-svn: 226687
2015-01-21 18:50:21 +00:00
Matthias Braun 1002baf7b9 LiveIntervalAnalysis: Remove unused pruneValue() variant.
llvm-svn: 226686
2015-01-21 18:45:57 +00:00
Tim Northover cf3d80fedb Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet, but was still on my local copy.

This reverts commit r226663

llvm-svn: 226665
2015-01-21 15:48:52 +00:00
Tim Northover 85cd2791c9 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
llvm-svn: 226663
2015-01-21 15:43:28 +00:00
Daniel Jasper 6b77455f81 Prevent binary-tree deterioration in sparse switch statements.
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.

This is not a complete solution but works around the most pressing
issue.

Review: http://reviews.llvm.org/D7070
llvm-svn: 226600
2015-01-20 19:43:33 +00:00
Daniel Jasper d106b734cf Factor out a splitSwitchCase() function so that it can be reused.
This is in preparation for a fix to llvm.org/PR22262. One of the ideas
here is to first find a good jump table range first and then split
before and after it. Thereby, we don't need to use the
split-based-on-density heuristic at all, which can make the "binary
tree" deteriorate in various cases.

Also some minor cleanups.

No functional changes.

llvm-svn: 226551
2015-01-20 08:57:44 +00:00
Chandler Carruth 10f28f26fd [PM] Replace the Pass argument in MergeBasicBlockIntoOnlyPred with
a DominatorTree argument as that is the analysis that it wants to
update.

This removes the last non-loop utility function in Utils/ which accepts
a raw Pass argument.

llvm-svn: 226537
2015-01-20 01:37:09 +00:00
Adrian Prantl 5883af3faa Remove support for DIVariable's FlagIndirectVariable and expect
frontends to use a DIExpression with a DW_OP_deref instead.

This is not only a much more natural place for this informationl; there
is also a technical reason: The FlagIndirectVariable is used to mark a
variable that is turned into a reference by virtue of the calling
convention; this happens for example to aggregate return values.
The inliner, for example, may actually need to undo this indirection to
correctly represent the value in its new context. This is impossible to
implement because the DIVariable can't be safely modified. We can however
safely construct a new DIExpression on the fly.

llvm-svn: 226476
2015-01-19 17:57:29 +00:00
Rafael Espindola 12ca34f53f Bring r226038 back.
No change in this commit, but clang was changed to also produce trivial comdats when
needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226467
2015-01-19 15:16:06 +00:00
Chandler Carruth 37df2cfbf8 [PM] Remove the Pass argument from all of the critical edge splitting
APIs and replace it and numerous booleans with an option struct.

The critical edge splitting API has a really large surface of flags and
so it seems worth burning a small option struct / builder. This struct
can be constructed with the various preserved analyses and then flags
can be flipped in a builder style.

The various users are now responsible for directly passing along their
analysis information. This should be enough for the critical edge
splitting to work cleanly with the new pass manager as well.

This API is still pretty crufty and could be cleaned up a lot, but I've
focused on this change just threading an option struct rather than
a pass through the API.

llvm-svn: 226456
2015-01-19 12:09:11 +00:00
Michael Kuperstein 54c61edee7 [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local
This fixes PR21792.

Differential Revision: http://reviews.llvm.org/D6823

llvm-svn: 226433
2015-01-19 07:30:47 +00:00
David Blaikie 9459832ebd std::unique_ptrify the MCStreamer argument to createAsmPrinter
llvm-svn: 226414
2015-01-18 20:29:04 +00:00
Mehdi Amini 37f316afaf Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 226360
2015-01-17 01:35:56 +00:00
Matthias Braun 7618b2b23d RegisterCoalescer: Cleanup and improved comment for a subtle detail.
llvm-svn: 226353
2015-01-17 00:33:13 +00:00
Matthias Braun 0eb940aed0 RegisterCoalescer: Cleanup by factoring out a common expression
llvm-svn: 226352
2015-01-17 00:33:11 +00:00
Matthias Braun e2fa081615 RegisterCoalescer: Cleanup comment style
- Consistenly put comments above the function declaration, not the
  definition. To achieve this some duplicate comments got merged and
  some comment parts describing implementation details got moved into their
  functions.
- Consistently use doxygen comments above functions.
- Do not use doxygen comments inside functions.

llvm-svn: 226351
2015-01-17 00:33:09 +00:00
Matthias Braun fc6ef3a270 RegisterCoalescer: Drive-by typo + whitespace fix
llvm-svn: 226350
2015-01-17 00:33:06 +00:00
Philip Reames 287987ca13 Update a comment
Be a bit more explicit about the fact that addrspace(1) is not reserved.

llvm-svn: 226344
2015-01-16 23:21:07 +00:00
Philip Reames 36319538d0 clang-format all the GC related files (NFC)
Nothing interesting here...

llvm-svn: 226342
2015-01-16 23:16:12 +00:00
Philip Reames 2b45395876 Move ownership of GCStrategy objects to LLVMContext
Note: This change ended up being slightly more controversial than expected.  Chandler has tentatively okayed this for the moment, but I may be revisiting this in the near future after we settle some high level questions.

Rather than have the GCStrategy object owned by the GCModuleInfo - which is an immutable analysis pass used mainly by gc.root - have it be owned by the LLVMContext. This simplifies the ownership logic (i.e. can you have two instances of the same strategy at once?), but more importantly, allows us to access the GCStrategy in the middle end optimizer. To this end, I add an accessor through Function which becomes the canonical way to get at a GCStrategy instance.

In the near future, this will allows me to move some of the checks from http://reviews.llvm.org/D6808 into the Verifier itself, and to introduce optimization legality predicates for some of the recent additions to InstCombine. (These will follow as separate changes.)

Differential Revision: http://reviews.llvm.org/D6811

llvm-svn: 226311
2015-01-16 20:07:33 +00:00
Philip Reames 7de640a876 Remove gc.root's findCustomSafePoints mechanism
Searching all of the existing gc.root implementations I'm aware of (all three of them), there was exactly one use of this mechanism, and that was to implement a performance improvement that should have been applied to the default lowering.

Having this function is requiring a dependency on a CodeGen class (MachineFunction), in a class which is otherwise completely independent of CodeGen. I could solve this differently, but given that I see absolutely no value in preserving this mechanism, I going to just get rid of it.

Note: Tis is the first time I'm intentionally breaking previously supported gc.root functionality. Given 3.6 has branched, I believe this is a good time to do this.

Differential Revision: http://reviews.llvm.org/D7004

llvm-svn: 226305
2015-01-16 19:33:28 +00:00
Timur Iskhodzhanov 60b721363c Revert r226242 - Revert Revert Don't create new comdats in CodeGen
This breaks AddressSanitizer (ninja check-asan) on Windows

llvm-svn: 226251
2015-01-16 08:38:45 +00:00
Rafael Espindola 67a79e72f5 Revert "Revert Don't create new comdats in CodeGen"
This reverts commit r226173, adding r226038 back.

No change in this commit, but clang was changed to also produce trivial comdats for
costructors, destructors and vtables when needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226242
2015-01-16 02:22:55 +00:00
Hal Finkel 5ef58eb86d Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers""
Reapply r226071 with fixes. Two fixes:

 1. We need to manually remove the old and create the new 'deaf defs'
    associated with physical register definitions when we move the definition of
    the physical register from the copy point to the point of the original vreg def.

    This problem was picked up by the machinstr verifier, and could trigger a
    verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
    turned on the verifier in the tests.

 2. When moving the def point of the phys reg up, we need to make sure that it
    is neither defined nor read in between the two instructions. We don't, however,
    extend the live ranges of phys reg defs to cover uses, so just checking for
    live-range overlap between the pair interval and the phys reg aliases won't
    pick up reads. As a result, we manually iterate over the range and check for
    reads.

    A test soon to be committed to the PowerPC backend will test this change.

Original commit message:

[RegisterCoalescer] Remove copies to reserved registers

This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

llvm-svn: 226200
2015-01-15 20:32:09 +00:00
Philip Reames 66c9fb0d52 Style cleanup of old gc.root lowering code
Use static functions for helpers rather than static member functions.  a) this changes the linking (minor at best), and b) this makes it obvious no object state is involved.

llvm-svn: 226198
2015-01-15 19:49:25 +00:00
Philip Reames b87144160e clang-format GCStrategy.cpp & GCRootLowering.cpp (NFC)
llvm-svn: 226196
2015-01-15 19:39:17 +00:00
Philip Reames f27f373895 Split GCStrategy.cpp into two files (NFC)
This preparation for an update to http://reviews.llvm.org/D6811.  GCStrategy.cpp will hopefully be moving into IR/, where as the lowering logic needs to stay in CodeGen/

llvm-svn: 226195
2015-01-15 19:29:42 +00:00
Timur Iskhodzhanov f5adf13fac Revert Don't create new comdats in CodeGen
It breaks AddressSanitizer on Windows.

llvm-svn: 226173
2015-01-15 16:14:34 +00:00
Mehdi Amini fa546b29a0 Fix SelectionDAG -view-*-dags filtering
llvm-svn: 226163
2015-01-15 12:03:32 +00:00
Alexander Kornienko 8c0809c7f8 Replace size method call of containers to empty method where appropriate
This patch was generated by a clang tidy checker that is being open sourced.
The documentation of that checker is the following:

/// The emptiness of a container should be checked using the empty method
/// instead of the size method. It is not guaranteed that size is a
/// constant-time function, and it is generally more efficient and also shows
/// clearer intent to use empty. Furthermore some containers may implement the
/// empty method but not implement the size method. Using empty whenever
/// possible makes it easier to switch to another container in the future.

Patch by Gábor Horváth!

llvm-svn: 226161
2015-01-15 11:41:30 +00:00
Chandler Carruth b98f63dbdb [PM] Separate the TargetLibraryInfo object from the immutable pass.
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.

Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.

llvm-svn: 226157
2015-01-15 10:41:28 +00:00
Hal Finkel dd669615dd Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"
Reverting this while I investigate some bad behavior this is causing. As a
possibly-related issue, adding -verify-machineinstrs to one of the test cases
now fails because of this change:

  llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs

*** Bad machine code: No instruction at def index ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[78
4r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
Valno #3 is defined at 624r

*** Bad machine code: Live segment doesn't end at a valid instruction ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[78
4r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
[624r,624d:3)
LLVM ERROR: Found 2 machine code errors.

where 624r corresponds exactly to the interval combining change:

624B    %RSP<def> = COPY %vreg16; GR64:%vreg16
        Considering merging %vreg16 with %RSP
                RHS = %vreg16 [608r,624r:0)  0@608r
                updated: 608B   %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1]
        Success: %vreg16 -> %RSP
        Result = %RSP

llvm-svn: 226086
2015-01-15 03:08:59 +00:00
Chandler Carruth 62d4215baa [PM] Move TargetLibraryInfo into the Analysis library.
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.

This is in preparation for porting this analysis to the new pass
manager.

No functionality changed, and updates inbound for Clang and Polly.

llvm-svn: 226078
2015-01-15 02:16:27 +00:00
NAKAMURA Takumi 95b3880dd0 Win64Exception.cpp: Try to fix crash for x64 EH. "Per" might be null there.
llvm-svn: 226077
2015-01-15 02:15:21 +00:00
Hal Finkel 8299646236 [RegisterCoalescer] Remove copies to reserved registers
This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

llvm-svn: 226071
2015-01-15 01:25:28 +00:00
Ramkumar Ramachandra dba7329ebb [GC] CodeGenPrep transform: simplify offsetable relocate
The transform is somewhat involved, but the basic idea is simple: find
derived pointers that have been offset from the base pointer using gep
and replace the relocate of the derived pointer with a gep to the
relocated base pointer (with the same offset).

llvm-svn: 226060
2015-01-14 23:27:07 +00:00
Reid Kleckner e80a0a7572 Use MMI->getPersonality() instead of MMI->getPersonalities()[MMI->getPersonalityIndex()]
Also nuke the comment about supporting multiple personalities in a
single function, aka PR1414. That's just crazy.

llvm-svn: 226052
2015-01-14 22:47:54 +00:00
Matthias Braun 96a319588a MachineVerifier: Allow undef reads if a matching superreg is defined.
Summary:
Some pseudo instruction expansions break down a wide register use into
multiple uses of smaller sub registers. If the super register was
partially undefined the broken down sub registers may be completely
undefined now leading to MachineVerifier complaints. Unfortunately
liveness information to add the required dead flags is not easily
(cheaply) available when expanding pseudo instructions.

This commit changes the verifier to be quiet if there is an additional
implicit use of a super register. Pseudo instruction expanders can use
this to mark cases where partially defined values get potentially broken
into completely undefined ones.

Differential Revision: http://reviews.llvm.org/D6973

llvm-svn: 226047
2015-01-14 22:25:14 +00:00
Rafael Espindola fad1639a12 Don't create new comdats in CodeGen.
This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226038
2015-01-14 20:55:48 +00:00
Chandler Carruth e3288147f0 [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.
Some benchmarks have shown that this could lead to a potential
performance benefit, and so adding some flags to try to help measure the
difference.

A possible explanation. In diamond-shaped CFGs (A followed by either
B or C both followed by D), putting B and C both in between A and
D leads to the code being less dense than it could be. Always either
B or C have to be skipped increasing the chance of cache misses etc.
Moving either B or C to after D might be beneficial on average.

In the long run, but we should probably do a better job of analyzing the
basic block and branch probabilities to move the correct one of B or
C to after D. But even if we don't use this in the long run, it is
a good baseline for benchmarking.

Original patch authored by Daniel Jasper with test tweaks and a second
flag added by me.

Differential Revision: http://reviews.llvm.org/D6969

llvm-svn: 226034
2015-01-14 20:19:29 +00:00
Reid Kleckner 9b5eaf0d5a Emit the Itanium LSDA for unknown EH personalities on Win64
This fixes lots of generic CodeGen tests that use __gcc_personality_v0.
This suggests that using ExceptionHandling::MSVC was a mistake, and we
should instead classify each function by personality function. This
would, for example, allow us to LTO a binary containing uses of SEH and
Itanium EH.

llvm-svn: 226019
2015-01-14 18:50:10 +00:00
Reid Kleckner b57c1dc0f7 Remove dead code for llvm.eh.selector in the old EH model
llvm-svn: 226018
2015-01-14 18:49:39 +00:00
Chandler Carruth d9903888d9 [cleanup] Re-sort all the #include lines in LLVM using
utils/sort_includes.py.

I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.

llvm-svn: 225974
2015-01-14 11:23:27 +00:00
Mehdi Amini d8976b8ed3 SelectionDAG: add a -filter-view-dags option to llc
This option takes the name of the basic block you want to visualize
with -view-*-dags

Differential Revision: http://reviews.llvm.org/D6948

llvm-svn: 225953
2015-01-14 06:03:18 +00:00
Mehdi Amini 648eff1695 DAG Combiner: Fold SelectCC When Cond is UNDEF
In case folding a node end up with a NaN as operand for the select, 
the folding of the condition of the selectcc node returns "UNDEF".

Differential Revision: http://reviews.llvm.org/D6889

llvm-svn: 225952
2015-01-14 05:45:24 +00:00
Mehdi Amini 7b068f6ba4 Add assertions for out of bound index in ComputeLinearIndex
llvm-svn: 225951
2015-01-14 05:38:48 +00:00
Mehdi Amini 8923cc5470 Fold a loop for array processing in ComputeLinearIndex
When processing an array, every Elt has the same layout, it is
useless to recursively call each ComputeLinearIndex on each element.
Just do it once and multiply by the number of elements.
    
Differential Revision: http://reviews.llvm.org/D6832

llvm-svn: 225949
2015-01-14 05:33:01 +00:00
JF Bastien eeea8970b4 Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit:
http://reviews.llvm.org/D3392

llvm-svn: 225948
2015-01-14 05:24:33 +00:00
Matt Arsenault bd22342322 Implement new way of expanding extloads.
Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.

This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.

This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.

llvm-svn: 225925
2015-01-14 01:35:17 +00:00
JF Bastien dcdd5ad252 Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.

Command line options:
  -noop-insertion // Enable noop insertion.
  -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
  -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.

In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
  -fdiversify

This is the llvm part of the patch.
clang part: D3393

http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)

llvm-svn: 225908
2015-01-14 01:07:26 +00:00
Hal Finkel 665026838b Adjust ScheduleDAGSDNodes::RegDefIter for patchpoints
PATCHPOINT is a strange pseudo-instruction. Depending on how it is used, and
whether or not the AnyReg calling convention is being used, it might or might
not define a value. However, its TableGen definition says that it defines one
value, and so when it doesn't, the code in ScheduleDAGSDNodes::RegDefIter
becomes confused and the code that uses the RegDefIter will try to get the
register class of the MVT::Other type associated with the PATCHPOINT's chain
result (under certain circumstances).

This will be covered by the PPC64 PatchPoint test cases once that support is
re-committed.

llvm-svn: 225907
2015-01-14 01:07:03 +00:00
Reid Kleckner 0a57f65514 CodeGen support for x86_64 SEH catch handlers in LLVM
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.

The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.

Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
  outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH

In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6300

llvm-svn: 225904
2015-01-14 01:05:27 +00:00
Adrian Prantl 7813d9c979 Debug Info: Implement DwarfCompileUnit::addComplexAddress() using
DIEDwarfExpression (and get rid of a bunch of redundant code).

NFC

llvm-svn: 225900
2015-01-14 01:01:30 +00:00
Adrian Prantl ad768c3719 Debug Info: Emitting a register in DwarfExpression may fail. Report the
status in a bool and let the users deal with the error.

NFC.

llvm-svn: 225899
2015-01-14 01:01:28 +00:00
Adrian Prantl 658676c3ea Debug Info: Move DIEDwarfExpression into DwarfExpression.h because it
needs to be accessed from both DwarfCompileUnit.cpp and DwarfUnit.cpp.

NFC.

llvm-svn: 225898
2015-01-14 01:01:22 +00:00
Eric Christopher 6e30cd95cb Migrate ABIName to MCTargetOptions so that it can be shared between
the TargetMachine level and the MC level.

llvm-svn: 225891
2015-01-14 00:50:31 +00:00
Adrian Prantl 8efadbf868 Debug Info: Don't bother emitting DW_AT_frame_base if the function has
no frame register. "Tested" via an assertion triggered by DwarfExpression.

llvm-svn: 225858
2015-01-14 00:15:16 +00:00
Adrian Prantl 1411577ad9 Revert "Debug Info: Bail out of AddMachineRegPiece() if MachineReg is not a"
This reverts commit r225852, it was a bad idea.

MachineReg should always be a physical register. If it isn't this DebugLoc
shouldn't have been created in the first place.

llvm-svn: 225857
2015-01-14 00:15:12 +00:00
Adrian Prantl e8e0bac270 Debug Info: Bail out of AddMachineRegPiece() if MachineReg is not a
physical register. The call to getMinimalPhysRegClass() later on asserts
on this condition.

llvm-svn: 225852
2015-01-13 23:39:15 +00:00
Adrian Prantl 092d9489ed Debug Info: Move the complex expression handling (=the remainder) of
emitDebugLocValue() into DwarfExpression.

Ought to be NFC, but it actually uncovered a bug in the debug-loc-asan.ll
testcase. The testcase checks that the address of variable "y" is stored
at [RSP+16], which also lines up with the comment.
It also check(ed) that the *value* of "y" is stored in RDI before that,
but that is actually incorrect, since RDI is the very value that is
stored in [RSP+16]. Here's the assembler output:

	movb	2147450880(%rcx), %r8b
	#DEBUG_VALUE: bar:y <- RDI
	cmpb	$0, %r8b
	movq	%rax, 32(%rsp)          # 8-byte Spill
	movq	%rsi, 24(%rsp)          # 8-byte Spill
	movq	%rdi, 16(%rsp)          # 8-byte Spill
.Ltmp3:
	#DEBUG_VALUE: bar:y <- [RSP+16]

Fixed the comment to spell out the correct register and the check to
expect an address rather than a value.

Note that the range that is emitted for the RDI location was and is still
wrong, it claims to begin at the function prologue, but really it should
start where RDI is first assigned.

llvm-svn: 225851
2015-01-13 23:39:11 +00:00
Adrian Prantl 0a3bfdbd37 cleanup.
llvm-svn: 225848
2015-01-13 23:11:51 +00:00
Adrian Prantl 172ab66a11 Document, cleanup, and clang-format DwarfExpression.h
llvm-svn: 225847
2015-01-13 23:11:07 +00:00
Adrian Prantl 8995f5c92f Debug Info: Turn DIExpression::getFrameRegister() into an isFrameRegister()
function.

NFC.

llvm-svn: 225846
2015-01-13 23:10:43 +00:00
Matthias Braun f50ab43214 DAGCombiner: simplify by using condition variables; NFC
llvm-svn: 225836
2015-01-13 22:17:46 +00:00
Matt Arsenault bf0db918b2 R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.

llvm-svn: 225828
2015-01-13 20:53:23 +00:00
Hal Finkel c4ee2c5188 [StackMaps] Use CurrentFnSymForSize
When computing the call-site offset, use AP.CurrentFnSymForSize instead of
AP.CurrentFnSym. There should be no change for other targets, but this is
necessary for generating valid expressions for PPC64/ELF.

llvm-svn: 225807
2015-01-13 17:48:07 +00:00
Hal Finkel 0ad96c818c [StackMaps] Mark in CallLoweringInfo when lowering a patchpoint
While, generally speaking, the process of lowering arguments for a patchpoint
is the same as lowering a regular indirect call, on some targets it may not be
exactly the same. Targets may not, for example, want to add additional register
dependencies that apply only to making cross-DSO calls through linker stubs,
may not want to load additional registers out of function descriptors, and may
not want to add additional side-effect-causing instructions that cannot be
removed later with the call itself being generated.

The PowerPC target will use this in a future commit (for all of the reasons
stated above).

llvm-svn: 225806
2015-01-13 17:48:04 +00:00
Hal Finkel df87f9383b [StackMaps] Allow the target to pre-process the live-out mask
Some targets, PowerPC for example, have pseudo-registers (such as that used to
represent the rounding mode), that don't have DWARF register numbers or a
register class. These are used only for internal dependency tracking, and
should not appear in the recorded live-outs. This adds a callback allowing the
target to pre-process the live-out mask in order to remove these kinds of
registers so that the StackMaps code does not complain about them and/or
attempt to include them in the output.

This will be used by the PowerPC target in a future commit.

llvm-svn: 225805
2015-01-13 17:47:59 +00:00
Olivier Sallenave 325096980b Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook.
llvm-svn: 225795
2015-01-13 15:06:36 +00:00
Mehdi Amini 22e59748ef Peephole opt needs optimizeSelect() to keep track of newly created MIs
Peephole optimizer is scanning a basic block forward. At some point it 
needs to answer the question "given a pointer to an MI in the current 
BB, is it located before or after the current instruction".
To perform this, it keeps a set of the MIs already seen during the scan, 
if a MI is not in the set, it is assumed to be after.
It means that newly created MIs have to be inserted in the set as well.

This commit passes the set as an argument to the target-dependent 
optimizeSelect() so that it can properly update the set with the 
(potentially) newly created MIs.

llvm-svn: 225772
2015-01-13 07:07:13 +00:00
Reid Kleckner 3542ace6ef Rename llvm.recoverframeallocation to llvm.framerecover
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.

llvm-svn: 225752
2015-01-13 01:51:34 +00:00
Reid Kleckner e9b8931873 Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.

Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.

These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.

Reviewers: echristo, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D6493

llvm-svn: 225746
2015-01-13 00:48:10 +00:00
Matt Arsenault a982e4f82b Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Also require unsafe FP math for no since there isn't a way to
test for signed zeros.

llvm-svn: 225744
2015-01-13 00:43:00 +00:00
Adrian Prantl 66f2595845 Debug Info: Move support for constants into DwarfExpression.
Move the declaration of DebugLocDwarfExpression into DwarfExpression.h
because it needs to be accessed from AsmPrinterDwarf.cpp and DwarfDebug.cpp

NFC.

llvm-svn: 225734
2015-01-13 00:04:06 +00:00
Adrian Prantl a4c30d6509 Make DwarfExpression store the AsmPrinter instead of the TargetMachine.
NFC.

llvm-svn: 225731
2015-01-12 23:36:56 +00:00
Adrian Prantl 9cffbd8daa remove extra semicolon
llvm-svn: 225730
2015-01-12 23:36:50 +00:00
Reid Kleckner bba20f06de musttail: Only set the inreg flag for fastcall and vectorcall
Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and
stdcall thunks, leaving us with no scratch registers for indirect call
targets.

Fixes PR22052.

llvm-svn: 225729
2015-01-12 23:28:23 +00:00
Adrian Prantl 337e360279 Run clang-format on the parts of AsmPrinterDwarf where it improves the
readability.

llvm-svn: 225726
2015-01-12 23:03:23 +00:00
Adrian Prantl 0fec811d7b Debug Info: Add a virtual destructor to DwarfExpression.
Thanks Chandler for noticing!

llvm-svn: 225724
2015-01-12 22:59:28 +00:00
Adrian Prantl 0d5df0ac1c Untwine this expression. Thanks to David for noticing!
llvm-svn: 225720
2015-01-12 22:39:14 +00:00
Adrian Prantl 0e6ffb9d0d Debug Info: Implement DwarfUnit::addRegisterOpPiece() using DwarfExpression.
NFC.

llvm-svn: 225717
2015-01-12 22:37:16 +00:00
Adrian Prantl 00dbc2a7d3 Debug Info: Implement DwarfUnit::addRegisterOffset using DwarfExpression.
No functional change.

llvm-svn: 225707
2015-01-12 22:19:26 +00:00
Adrian Prantl b16d9ebb0c Debug info: Factor out the creation of DWARF expressions from AsmPrinter
into a new class DwarfExpression that can be shared between AsmPrinter
and DwarfUnit.

This is the first step towards unifying the two entirely redundant
implementations of dwarf expression emission in DwarfUnit and AsmPrinter.

Almost no functional change — Testcases were updated because asm comments
that used to be on two lines now appear on the same line, which is
actually preferable.

llvm-svn: 225706
2015-01-12 22:19:22 +00:00
Matthias Braun f5d931f716 RegisterCoalescer: Turn some impossible conditions into asserts
This is a fixed version of reverted r225500. It fixes the too early
if() continue; of the last patch and adds a comment to the unorthodox
loop.

llvm-svn: 225652
2015-01-12 19:10:17 +00:00
Ahmed Bougacha e03bef7543 [SimplifyLibCalls] Factor out fortified libcall handling.
This lets us remove CGP duplicate.

Differential Revision: http://reviews.llvm.org/D6541

llvm-svn: 225640
2015-01-12 17:22:43 +00:00
Joerg Sonnenberger 8a36a8e5d4 Revert r225500, it leads to infinite loops.
llvm-svn: 225590
2015-01-10 21:49:36 +00:00
Lang Hames 1e923ec122 Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision
introduced.

A test case for the bug was already committed in r225385.

Patch by Rafael Espindola.

llvm-svn: 225534
2015-01-09 18:55:42 +00:00
Matthias Braun 7e87384592 RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness
The code that eliminated additional coalescable copies in
removeCopyByCommutingDef() used MergeValueNumberInto() which internally
may merge A into B or B into A. In this case A and B had different Def
points, so we have to reset ValNo.Def to the intended one after merging.

llvm-svn: 225503
2015-01-09 03:01:31 +00:00
Matthias Braun ea399e59cf RegisterCoalescer: Some cleanup in removeCopyByCommutingDef(), NFC
llvm-svn: 225502
2015-01-09 03:01:28 +00:00
Matthias Braun 55586a2f2d RegisterCoalescer: No need to set kill flags, they are recompute later anyway
llvm-svn: 225501
2015-01-09 03:01:26 +00:00
Matthias Braun 6588b145fc RegisterCoalescer: Turn some impossible conditions into asserts
llvm-svn: 225500
2015-01-09 03:01:23 +00:00
Hal Finkel 0ce7f372e5 [DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA,
we need to extend the incoming operands so that the resulting node will really
be legal. This is currently enabled only for PowerPC, and it happens to work
there regardless, but this should fix the functionality for everyone else
should anyone else wish to use it.

llvm-svn: 225492
2015-01-09 01:29:29 +00:00
Hal Finkel 33ead6f901 Partial fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), there are two things wrong with this code.
First, it adds patterns which elide FP extends when forming FMAs, and that might
not be profitable on all targets (it belongs behind the pre-existing
aggressive-FMA-formation flag). This is fixed by this change.

Second, the resulting nodes might have operands of different types (the
extensions need to be re-added). That will be fixed in the follow-up commit.

llvm-svn: 225485
2015-01-09 00:45:54 +00:00
Hal Finkel 0709f5160f [MachineLICM] A command-line option to hoist even cheap instructions
Add a command-line option to enable hoisting even cheap instructions (in
low-register-pressure situations). This is turned off by default, but has
proved useful for testing purposes.

llvm-svn: 225470
2015-01-08 22:10:48 +00:00
Duncan P. N. Exon Smith e90f1165d8 CodeGen: Use handy new-fangled post-increment, NFC
Drive-by cleanup; I noticed this when reviewing the patch that became
r225466.

llvm-svn: 225468
2015-01-08 21:07:55 +00:00
Duncan P. N. Exon Smith 5914a97af8 CodeGen: Use range-based for loops, NFC
Patch by Ramkumar Ramachandra!

llvm-svn: 225466
2015-01-08 20:44:33 +00:00
Elena Demikhovsky 285fbd551a Masked Load/Store - fixed a bug in type legalization.
llvm-svn: 225441
2015-01-08 12:29:19 +00:00
Michael Kuperstein 698ea3b488 Fix include ordering, NFC.
llvm-svn: 225439
2015-01-08 11:59:43 +00:00
Michael Kuperstein 8c65e31a5a Move SPAdj logic from PEI into the targets (NFC)
PEI tries to keep track of how much starting or ending a call sequence adjusts the stack pointer by, so that it can resolve frame-index references. Currently, it takes a very simplistic view of how SP adjustments are done - both FrameStartOpcode and FrameDestroyOpcode adjust it exactly by the amount written in its first argument.

This view is in fact incorrect for some targets (e.g. due to stack re-alignment, or because it may want to adjust the stack pointer in multiple steps). However, that doesn't cause breakage, because most targets (the only in-tree exception appears to be 32-bit ARM) rely on being able to simplify the call frame pseudo-instructions earlier, so this code is never hit. 

Moving the computation into TargetInstrInfo allows targets to override the way the adjustment is computed if they need to have a non-zero SPAdj.

Differential Revision: http://reviews.llvm.org/D6863

llvm-svn: 225437
2015-01-08 11:04:38 +00:00
Quentin Colombet a799e2e014 [RegAllocGreedy] Introduce a late pass to repair broken hints.
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.


** Context **

Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.

This usually shows up in the assembly with pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B

Whereas we could simply have done:
def B
...
def A
use A
...
use B


** Proposed Solution **

A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evict another variable. Indeed, if no eviction
happens this is pointless to look for recoloring opportunities as it means the
situation was the same as the initial allocation problem where we had to break
the hint.

Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple unlike the last change recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it, otherwise it gives up on that hint
even if a more complex coloring would have worked.

The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) with the target color. If a) is greater or
equal than b), then it is profitable and the recoloring happen.


** Example **

Consider the following example:
BB1:
  a =
  b =
BB2:
  ...
   = b
   = a
Let us assume b gets split:
BB1:
  a =
  b =
BB2:
  c = b
  ...
  d = c
  = d
  = a
Because of how the allocation work, b, c, and d may be assigned different
colors. Now, if a gets evicted to make room for c, assuming b and d were
assigned to something different than a.
We end up with:
BB1:
  a =
  st a, SpillSlot
  b =
BB2:
  c = b
  ...
  d = c
  = d
  e = ld SpillSlot
  = e
This is likely that we can assign the same register for b, c, and d,
getting rid of 2 copies.


** Performances **

Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are a few regressions too that
comes from the (in)accuracy of the block frequency estimate.

<rdar://problem/18312047>

llvm-svn: 225422
2015-01-08 01:16:39 +00:00
Ahmed Bougacha 2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Matthias Braun 9d7bc0874c RegisterCoalescer: Do not remove IMPLICIT_DEFS if they are required for subranges.
The register coalescer used to remove implicit_defs when they are
covered by the main range anyway. With subreg liveness tracking we can't
do that anymore in places where the IMPLICIT_DEF is required as begin of
a subregister liverange.

llvm-svn: 225416
2015-01-08 00:21:23 +00:00
Matthias Braun d55e6ddacf RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases.
I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a
subregister index in SrcReg/DstReg, but they are actually subregister
indices of the coalesced register that get you back to SrcReg/DstReg
when applied.

Fixed the bug, improved comments and simplified code accordingly.

Testcase by Tom Stellard!

llvm-svn: 225415
2015-01-07 23:58:38 +00:00
Matthias Braun 4fe686af00 LiveInterval: Implement feedback by Quentin Colombet.
llvm-svn: 225413
2015-01-07 23:35:11 +00:00
Adrian Prantl d88af278b9 Update a comment.
llvm-svn: 225399
2015-01-07 21:35:13 +00:00
Ahmed Bougacha 67dd2d25a3 [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.

llvm-svn: 225392
2015-01-07 21:27:10 +00:00
Olivier Sallenave 0451532996 More FMA folding opportunities.
llvm-svn: 225380
2015-01-07 20:54:17 +00:00
Adrian Prantl 3dd48c6fde Debug info: Allow aggregate types to be described by constants.
llvm-svn: 225378
2015-01-07 20:48:58 +00:00
Olivier Sallenave e64ad7cedd Test commit
llvm-svn: 225368
2015-01-07 19:45:17 +00:00
Philip Reames 352fb93773 Add a missing file from 225365
llvm-svn: 225366
2015-01-07 19:13:28 +00:00
Philip Reames 4ac17a3026 Introduce an example statepoint GC strategy
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)

Most of the validation logic added here as proof of concept will soon move in to the Verifier.  That move is dependent on http://reviews.llvm.org/D6811

There was discussion in the review thread about addrspace(1) being reserved for something.  I'm going to follow up on a seperate llvmdev thread.  If needed, I'll update all the code at once.

Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.

Differential Revision: http://reviews.llvm.org/D6808

llvm-svn: 225365
2015-01-07 19:07:50 +00:00
Jonas Paulsson fcf0cba88c New method SDep::isNormalMemoryOrBarrier() in ScheduleDAGInstrs.cpp.
Used to iterate over previously added memory dependencies in
adjustChainDeps() and iterateChainSucc().

SDep::isCtrl() was previously used in these places, that also gave
anti and output edges. The code may be worse if these are followed,
because MisNeedChainEdge() will conservatively return true since a
non-memory instruction has no memory operands, and a false chain dep
will be added. It is also unnecessary since all memory accesses of
interest will be reached by memory dependencies, and there is a budget
limit for the number of edges traversed.

This problem was found on an out-of-tree target with enabled alias
analysis. No test case for an in-tree target has been found.

Reviewed by Hal Finkel.

llvm-svn: 225351
2015-01-07 13:38:29 +00:00
Jonas Paulsson bf408bbe38 Fix typos in comment and option help texts.
For -enable-aa-sched-mi and -use-tbaa-in-sched-mi.

llvm-svn: 225350
2015-01-07 13:20:57 +00:00
Lang Hames 66f755f84f Revert r224935 "Refactor duplicated code. No intended functionality change."
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.

llvm-svn: 225311
2015-01-06 23:04:36 +00:00
Mehdi Amini f3721bf619 SelectionDAGBuilder: move constant initialization out of loop
No semantic change intended.

Reviewers: resistor

Differential Revision: http://reviews.llvm.org/D6834

llvm-svn: 225278
2015-01-06 18:20:04 +00:00
Andrea Di Biagio f807a6f297 [CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz.
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.

The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow grap modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.

With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.

Added new test cases to CodeGen/X86/cttz-ctlz.ll

Differential Revision: http://reviews.llvm.org/D6853

llvm-svn: 225274
2015-01-06 17:41:18 +00:00
Frederic Riss e541e0b327 Make DIE.h a public CodeGen header.
dsymutil would like to use all the AsmPrinter/MCStreamer infrastructure
to stream out the DWARF. In order to do so, it will reuse the DIE object
and so this header needs to be public.

The interface exposed here has some corners that cannot be used without a
DwarfDebug object, but clients that want to stream Dwarf can just avoid
these.

Differential Revision: http://reviews.llvm.org/D6695

llvm-svn: 225208
2015-01-05 21:29:41 +00:00
Craig Topper d3c02f177a Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Hal Finkel 5772566ed6 [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.

To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.

llvm-svn: 225117
2015-01-03 17:58:24 +00:00
Alexey Samsonov 553185ee4b Revert "merge consecutive stores of extracted vector elements"
This reverts commit r224611. This change causes crashes
in X86 DAG->DAG Instruction Selection.

llvm-svn: 225031
2014-12-31 00:40:28 +00:00
David Blaikie aeaa5bf55e DebugInfo: Omit is_stmt from line table entries on the same line.
GCC does this for non-zero discriminators and since GCC doesn't produce
column info, that was the only place it comes up there. For LLVM, since
we can emit discriminators and/or column info, it makes more sense to
invert the condition and just test for changes in line number.

This should resolve at least some of the GDB 7.5 test suite failures
created by recent Clang changes that increase the location fidelity
(which, since Clang defaults to including column info on Linux by
default created a bunch of cases that confused GDB).

In theory we could do this better/differently by grouping actual source
statements together in a similar manner to the way lexical scopes are
handled but given that GDB isn't really in a position to consume that (&
users are probably somewhat used to different lines being different
'statements') this seems the safest and cheapest change. (I'm concerned
that doing this 'right' would bloat the debugloc data even further -
something Duncan's working hard to address)

llvm-svn: 225011
2014-12-30 22:47:13 +00:00
Peter Collingbourne 7ef497b1f5 x86_64: Fix calls to __morestack under the large code model.
Under the large code model, we cannot assume that __morestack lives within
2^31 bytes of the call site, so we cannot use pc-relative addressing. We
cannot perform the call via a temporary register, as the rax register may
be used to store the static chain, and all other suitable registers may be
either callee-save or used for parameter passing. We cannot use the stack
at this point either because __morestack manipulates the stack directly.

To avoid these issues, perform an indirect call via a read-only memory
location containing the address.

This solution is not perfect, as it assumes that the .rodata section
is laid out within 2^31 bytes of each function body, but this seems to
be sufficient for JIT.

Differential Revision: http://reviews.llvm.org/D6787

llvm-svn: 225003
2014-12-30 20:05:19 +00:00
Michael Kuperstein c43b063358 [COFF] Don't try to add quotes to already quoted linker directives
If a linker directive is already quoted, don't try to quote it again, otherwise it creates a mess.
This pops up in places like:
#pragma comment(linker,"\"/foo bar'\"")

Differential Revision: http://reviews.llvm.org/D6792

llvm-svn: 224998
2014-12-30 19:23:48 +00:00
Rafael Espindola bed67f3adc Refactor duplicated code.
No intended functionality change.

llvm-svn: 224935
2014-12-29 15:18:31 +00:00
Andrea Di Biagio 22ee3f63b9 [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.

Example:
\code
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\code

In this example, basic block %then.bb is taken if value %val is not zero.
Also, the phi node in %end.bb would propagate the size-of in bits of %val
only if %val is equal to zero.

With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.

Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.

Differential Revision: http://reviews.llvm.org/D6728

llvm-svn: 224899
2014-12-28 11:07:35 +00:00
Elena Demikhovsky 87700a734d Scalarizer for masked load and store intrinsics.
Masked vector intrinsics are a part of common LLVM IR, but they are really supported on AVX2 and AVX-512 targets. I added a code that translates masked intrinsic for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks.

http://reviews.llvm.org/D6436

llvm-svn: 224897
2014-12-28 08:54:45 +00:00
Timur Iskhodzhanov b6fa52f274 Band-aid fix for PR22032: don't emit DWARF debug info if AddressSanitizer is enabled on Windows
llvm-svn: 224860
2014-12-26 17:00:51 +00:00
David Majnemer 25b383ac66 Silence GCC's -Wparentheses warning
No functionality change intended.

llvm-svn: 224833
2014-12-25 10:03:23 +00:00
Elena Demikhovsky fb81b93e17 Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.

llvm-svn: 224829
2014-12-25 07:49:20 +00:00
David Majnemer 2913eca4e2 CodeGen: Error on redefinitions instead of asserting
It's possible to have a prior definition of a symbol in module asm.
Raise an error instead of crashing.

llvm-svn: 224828
2014-12-24 23:06:55 +00:00
David Majnemer 8e92dfee20 CodeGen: Allow aliases to be overridden by variables
llvm-svn: 224827
2014-12-24 22:44:29 +00:00
David Majnemer 58cb80c940 MC: Label definitions are permitted after .set directives
.set directives may be overridden by other .set directives as well as
label definitions.

This fixes PR22019.

llvm-svn: 224811
2014-12-24 10:27:50 +00:00
Matthias Braun 51ca510094 LiveInterval: Remove accidentally committed debug code.
llvm-svn: 224807
2014-12-24 02:35:07 +00:00
Matthias Braun dbcca0dbb4 LiveInterval: Introduce createMainRangeFromSubranges().
This function constructs the main liverange by merging all subranges if
subregister liveness tracking is available. This should be slightly
faster to compute instead of performing the liveness calculation again
for the main range. More importantly it avoids cases where the main
liverange would cover positions where no subrange was live. These cases
happened for partial definitions where the actual defined part was dead
and only the undefined parts used later.

The register coalescing requires that every part covered by the main
live range has at least one subrange live.

I also expect this function to become usefull later for places where the
subranges are modified in a way that it is hard to correctly fix the
main liverange in the machine scheduler, we can simply reconstruct it
from subranges then.

llvm-svn: 224806
2014-12-24 02:11:51 +00:00
Matthias Braun 7030dda8d5 RegisterCoalescer: With subrange liveness there may be no RedefVNI for unused lanes.
llvm-svn: 224805
2014-12-24 02:11:48 +00:00
Matthias Braun 36768c684f LiveRangeEdit: Check for completely empy subranges after removing ValNos.
Completely empty subranges are not allowed and must be removed when
subreg liveness is enabled.

llvm-svn: 224804
2014-12-24 02:11:46 +00:00
Matthias Braun f603c88d13 LiveIntervalAnalysis: Fix performance bug that I introduced in r224663.
Without a reference the code did not remember when moving the iterators
of the subranges/registerunit ranges forward and instead would scan from
the beginning again at the next position.

llvm-svn: 224803
2014-12-24 02:11:43 +00:00
Adrian Prantl 3026a54aa2 Debug Info: In symmetry to DW_TAG_pointer_type, do not emit the byte size
of a DW_TAG_ptr_to_member_type.
This restores the behavior from before r224780-r224781.

llvm-svn: 224799
2014-12-24 01:17:51 +00:00
Mehdi Amini d38920891e Always assert in DAGCombine and not only when -debug is enabled
Right now in DAG Combine check the validity of the returned type 
only when -debug is given on the command line. However usually 
the test cases in the validation does not use -debug. 
An Assert build should always check this.

llvm-svn: 224779
2014-12-23 18:59:02 +00:00
Michael Kuperstein f4536ea6e8 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

llvm-svn: 224759
2014-12-23 08:59:45 +00:00
Reid Kleckner ce0093344f Make musttail more robust for vector types on x86
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions.  Now musttail uses a completely separate lowering.

Hopefully this can be used as the basis for non-x86 perfect forwarding.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6156

llvm-svn: 224745
2014-12-22 23:58:37 +00:00
Quentin Colombet 84f89ccd45 [CodeGenPrepare] Handle properly the promotion of operands when this does not
generate instructions.

Fixes PR21978.
Related to <rdar://problem/18310086>

llvm-svn: 224717
2014-12-22 18:11:52 +00:00
Rafael Espindola 366e5c1bf1 The leak detector is dead, long live asan and valgrind.
In resent times asan and valgrind have found way more memory management bugs
in llvm than the special purpose leak detector.

llvm-svn: 224703
2014-12-22 13:00:36 +00:00
Saleem Abdulrasool 90c224a143 CodeGen: minor style tweaks to SSP
Clean up some style related things in the StackProtector CodeGen.  NFC.

llvm-svn: 224693
2014-12-21 21:52:38 +00:00
Matt Arsenault 22b4c256e1 Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.

llvm-svn: 224691
2014-12-21 16:48:42 +00:00
Saleem Abdulrasool 57b5fe57e3 CodeGen: constify and use range loop for SSP
Use range-based for loop and constify the iterators.  NFC.

llvm-svn: 224683
2014-12-20 21:37:51 +00:00
Matthias Braun 714c494ca1 LiveIntervalAnalysis: No kill flags for partially undefined uses.
We must not add kill flags when reading a vreg with some undefined
subregisters, if subreg liveness tracking is enabled.  This is because
the register allocator may reuse these undefined subregisters for other
values which are not killed.

llvm-svn: 224664
2014-12-20 01:54:50 +00:00
Matthias Braun 7f8dece1d7 LiveIntervalAnalysis: cleanup addKills(), NFC
- Use more const modifiers
- Use references for things that can't be nullptr
- Improve some variable names

llvm-svn: 224663
2014-12-20 01:54:48 +00:00
Reid Kleckner f2acbbaf22 EH: Sink computation of local PadMap variable into function that uses it
No functionality change.

llvm-svn: 224635
2014-12-19 22:30:08 +00:00
Reid Kleckner 93acac6cfc Add the ExceptionHandling::MSVC enumeration
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.

None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass.  No functionality change for other targets.

llvm-svn: 224625
2014-12-19 22:19:48 +00:00
Sanjay Patel 0428a5786e merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698

llvm-svn: 224611
2014-12-19 20:23:41 +00:00
Matthias Braun aeb50b3805 RegisterCoalescer: rewrite eliminateUndefCopy().
This also fixes problems with undef copies of subregisters. I can't
attach a testcase for that as none of the targets in trunk has
subregister liveness tracking enabled.

llvm-svn: 224560
2014-12-19 01:39:46 +00:00
Adrian Prantl cf44e7870b Explain why LLVM is emitting a DW_AT_containing_type inside of a class.
llvm-svn: 224555
2014-12-19 00:01:20 +00:00
Matthias Braun 15abf3743c LiveIntervalAnalysis: Cleanup computeDeadValues
- This also fixes a bug introduced in r223880 where values were not
  correctly marked as Dead anymore.
- Cleanup computeDeadValues(): split up SubRange code variant, simplify
  arguments.

llvm-svn: 224538
2014-12-18 19:58:52 +00:00
Eric Christopher 661f2d1ca1 Add a new string member to the TargetOptions struct for the name
of the abi we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.

Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.

llvm-svn: 224492
2014-12-18 02:20:58 +00:00
Matthias Braun 0a410f6243 RegisterCoalescer: Fix stripCopies() picking up main range instead of subregister range
This fixes a problem where stripCopies() would switch to values in the
main liverange when it crossed a copy instruction. However when joining
subranges we need to stay in the respective subregister ranges.

llvm-svn: 224461
2014-12-17 21:25:20 +00:00
Matthias Braun 8142efa8ea ExecutionDepsFix: Correctly handle wide registers.
The ExecutionDepsFix previously mapped each register to 1 or zero
registers of the register class it was called with and therefore
simulating liveness for.  This was problematic for cases involving wider
registers like Q0 on ARM where ExecutionDepsFix gets invoked for the Dxx
registers. In these cases the wide register would get mapped to the last
matching D register, while it should have been all matching D registers.
This commit changes the AliasMap to use a SmallVector to map registers
to potentially multiple destination regclass registers. This is required
to avoid regressions with subregister liveness tracking enabled.

llvm-svn: 224447
2014-12-17 19:13:47 +00:00
Michael Kuperstein 047b1a0400 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

llvm-svn: 224429
2014-12-17 12:32:17 +00:00
Toma Tabacu a23f13c3b0 [mips] Set GCC-compatible MIPS asssembler options before inline asm blocks.
Summary:
When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives,
while GCC uses the default options if an assembly-level function contains inline assembly code.

This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example).

This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6637

llvm-svn: 224425
2014-12-17 10:56:16 +00:00
Matthias Braun f4a72cd06e RegisterCoalescer: Sprinkle some const modifiers.
llvm-svn: 224409
2014-12-17 02:18:13 +00:00
Quentin Colombet fc2201e922 [CodeGenPrepare] Reapply r224351 with a fix for the assertion failure:
The type promotion helper does not support vector type, so when make
such it does not kick in in such cases.

Original commit message:
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.

This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %es      # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224402
2014-12-17 01:36:17 +00:00
David Blaikie 8b979f01c6 PR21875: codegen for non-type template parameters of nullptr_t type
llvm-svn: 224399
2014-12-17 00:43:22 +00:00
Reid Kleckner 04b69f89aa Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion."
This reverts commit r224351. It causes assertion failures when building
ICU.

llvm-svn: 224397
2014-12-17 00:29:23 +00:00
Hans Wennborg 224cb82a39 SelectionDAG switch lowering: use 'unsigned' to count destination popularity
SwitchInst::getNumCases() returns unsinged, so using uint64_t to count cases
seems unnecessary.

Also fix a missing CHECK in the test case.

llvm-svn: 224393
2014-12-16 23:41:59 +00:00
Sanjay Patel 7129c10cae merge consecutive loads that are offset from a base address
SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address. 

This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.

The codegen change in the new test case improves from:

vmovsd	32(%rdi), %xmm0
vmovsd	48(%rdi), %xmm1 
vmovhpd	56(%rdi), %xmm1, %xmm1
vmovhpd	40(%rdi), %xmm0, %xmm0
vinsertf128	$1, %xmm1, %ymm0, %ymm0

To:

vmovups	32(%rdi), %ymm0

An existing test case is also improved from:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovsd	24(%rdi), %xmm2
vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd	8(%rdi), %xmm1, %xmm3

To:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovhpd	24(%rdi), %xmm0, %xmm0
vmovhpd	8(%rdi), %xmm1, %xmm1

This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).

Differential Revision: http://reviews.llvm.org/D6642

llvm-svn: 224379
2014-12-16 21:57:18 +00:00
Matt Arsenault dd3b77d64c Move lowerConstant to AsmPrinter
This was a static function before, and NVPTX duplicated it
because it wasn't exposed.

llvm-svn: 224354
2014-12-16 19:16:14 +00:00
Quentin Colombet d5e57b731f [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %es      # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224351
2014-12-16 19:09:03 +00:00
Aaron Ballman 0d6a010c13 Fixing -Wsign-compare warnings; NFC.
llvm-svn: 224337
2014-12-16 14:04:11 +00:00
Matthias Braun 1aed6ffa35 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224313
2014-12-16 04:03:38 +00:00
Adrian Prantl b9fa945d51 ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224294
2014-12-16 00:20:49 +00:00
Matthias Braun c3a72c2e5f Revert "LiveRangeCalc: Rewrite subrange calculation"
Revert until I find out why non-subreg enabled targets break.

This reverts commit 6097277eefb9c5fb35a7f493c783ee1fd1b9d6a7.

llvm-svn: 224278
2014-12-15 21:36:35 +00:00
Matthias Braun 0352201e15 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224272
2014-12-15 21:16:21 +00:00
Matthias Braun 42fab34ffb LiveRangeCalc: use more range based for loops; NFC
llvm-svn: 224263
2014-12-15 19:40:46 +00:00
Michael Ilseman addddc441f Silence more static analyzer warnings.
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.

llvm-svn: 224255
2014-12-15 18:48:43 +00:00
Akira Hatanaka 7ba78302b5 Rename argument strings of codegen passes to avoid collisions with command line
options.

This commit changes the command line arguments (PassInfo::PassArgument) of two
passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with
command line options that have the same argument strings.

This bug manifests when the PassList construct (defined in opt.cpp) is used
in a tool that links with codegen passes. To reproduce the bug, paste the
following lines into llc.cpp and run llc.

#include "llvm/IR/LegacyPassNameParser.h"
static llvm:🆑:list<const llvm::PassInfo*, bool, llvm::PassNameParser>
PassList(llvm:🆑:desc("Optimizations available:"));

rdar://problem/19212448

llvm-svn: 224186
2014-12-13 04:52:04 +00:00
Andrea Di Biagio d65fd9facd Reapply "[MachineScheduler] Fix for PR21807: minor code difference building with/without -g."
This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'.
That test was failing on some buildbots because it was x86 specific but it was
missing a target triple.
Added an explicit triple to test misched-code-difference-with-debug.ll.

llvm-svn: 224126
2014-12-12 15:09:58 +00:00
Andrea Di Biagio 5634a54efc Revert: [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
Test 'misched-code-difference-with-debug.ll' was failing on some buildbots.

llvm-svn: 224121
2014-12-12 13:34:03 +00:00
Andrea Di Biagio 01236e3eca [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
This patch fixes the issue reported as PR21807. There was a minor difference
in the generated code depending on the -g flag.

The cause was that with -g the machine scheduler used a different
scheduling strategy. This decision was based on the number of instructions
in a schedule region and included debug instructions in that count.

This patch fixes the issue in MISched and provides a test.

Patch by Russell Gallop!

llvm-svn: 224118
2014-12-12 12:41:22 +00:00
Ekaterina Romanova 90ff20d8f5 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. 
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. 
Added DW_OP_stack_value to the stack. 

Marked incorrect-variable-debugloc1.ll to xfail for PowerPC64, while the the failure (PR21881) 
is being investigated. 

llvm-svn: 224098
2014-12-12 05:11:47 +00:00
Philip Reames 60de8b29f7 Comment and minor code cleanup for GCStrategy (NFC)
Updating comments to reflect the current state of the world after my recent changes to ownership structure and generally better describe what a GCStrategy is and how it works.

llvm-svn: 224086
2014-12-12 00:49:03 +00:00
Matt Arsenault 810cb62962 Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Duncan P. N. Exon Smith d6f8e4b03c CodeGen: Stop using LeakDetector for MachineInstr
Since `MachineInstr` is required to have a trivial destructor, it cannot
remove itself from `LeakDetection`.  Remove the calls.

As it happens, this requirement is because `MachineFunction` allocates
all `MachineInstr`s in a custom allocator; when the `MachineFunction` is
destroyed they're dropped of the edge.  There's no benefit to detecting
leaks.

llvm-svn: 224061
2014-12-11 21:51:37 +00:00
Matthias Braun 7e37a5f523 [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.

llvm-svn: 224059
2014-12-11 21:26:47 +00:00
Rafael Espindola 01c73610d0 This reverts commit r224043 and r224042.
check-llvm was failing.

llvm-svn: 224045
2014-12-11 20:03:57 +00:00
Matthias Braun a7c82a9f1d [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

llvm-svn: 224042
2014-12-11 19:42:05 +00:00
Matthias Braun a4e932db16 [CodeGen] Let MachineVerifierPass own its banner string
llvm-svn: 224041
2014-12-11 19:41:51 +00:00
Patrik Hagglund cb06a36c9a Bugfix in InlineSpiller::traceSiblingValue().
Properly determine whether or not a phi was added by splitting.
Check against the current VNInfo of OrigLI instead of against the
OrigVNI argument.

Patch provided by Jonas Paulsson. Reviewed by Quentin Colombet.

llvm-svn: 224009
2014-12-11 10:40:17 +00:00
Ekaterina Romanova 75fd123967 Reverting commit 223981, because the test that I added (incorrect-variable-debugloc1.ll) failed for llvm-ppc64.
The test is failing for llvm-ppc64 because for this platform the location list is not being generated at all (most likely because of the bug in PPC code optimization or generation). I will file a bug agains PPC compiler, but meanwhile, until PPC bug is fixed, I will have to revert my change.  

llvm-svn: 224000
2014-12-11 06:22:35 +00:00
Philip Reames 1e30897497 GCStrategy should not own GCFunctionInfo
This change moves the ownership and access of GCFunctionInfo (the object which describes the safepoints associated with a safepoint under GCRoot) to GCModuleInfo. Previously, this was owned by GCStrategy which was in turned owned by GCModuleInfo. This made GCStrategy module specific which is 'surprising' given it's name and other purposes.

There's a few more changes needed, but we're getting towards the point we can reuse GCStrategy for gc.statepoint as well.

p.s. The style of this code ends up being a mess. I was trying to move code around without otherwise changing much. Once I get the ownership structure rearranged, I will go through and fixup spacing, naming, comments etc.

Differential Revision: http://reviews.llvm.org/D6587

llvm-svn: 223994
2014-12-11 01:47:23 +00:00
Matthias Braun 09afa1ea74 LiveInterval: Use range based for loops for subregister ranges.
llvm-svn: 223991
2014-12-11 00:59:06 +00:00
Ekaterina Romanova ceeaba7932 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address.
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value.

Added DW_OP_stack_value to the stack.

-This line, and those below, will be ignored--

M    lib/CodeGen/AsmPrinter/DwarfDebug.cpp
A    test/DebugInfo/incorrect-variable-debugloc1.ll

llvm-svn: 223981
2014-12-10 23:19:56 +00:00
Matthias Braun 96761959d4 LiveInterval: Use more range based for loops for value numbers and segments.
llvm-svn: 223978
2014-12-10 23:07:54 +00:00
Aaron Ballman e5a2a0c9a8 Silencing a -Wsequence-point warning, and the resulting undefined behavior. NFC.
llvm-svn: 223926
2014-12-10 14:14:54 +00:00
Matthias Braun 96d7732b08 MachineVerifier: Allow physreg use if just a subreg is defined.
We can't mark partially undefined registers, so we have to allow reading
a register in the machine verifier if just parts of a register are
defined.

llvm-svn: 223896
2014-12-10 01:13:13 +00:00
Matthias Braun 21554d9b30 MachineVerifier: Allow LiveInterval segments to end at a partial write.
In the subregister liveness tracking case we do not create implicit
reads on partial register writes anymore, still we need to produce a new
SSA value for partial writes so the live segment has to end.

llvm-svn: 223895
2014-12-10 01:13:11 +00:00
Matthias Braun 279f83645c VirtRegMap: Improve block live-in info if subregister liveness is available.
llvm-svn: 223894
2014-12-10 01:13:08 +00:00
Matthias Braun d70caaf5a5 VirtRegMap: No implicit defs/uses for super registers with subreg liveness tracking.
Adding the implicit defs/uses to the superregisters is semantically questionable
but was not dangerous before as the register allocator never assigned the same
register to two overlapping LiveIntervals even when the actually live
subregisters do not overlap. With subregister liveness tracking enabled this
does actually happen and leads to subsequent bugs if we don't stop adding
the superregister defs/uses.

llvm-svn: 223892
2014-12-10 01:13:04 +00:00
Matthias Braun 587e27415d LiveRegMatrix: Respect subregister liveness when allocating registers.
llvm-svn: 223891
2014-12-10 01:13:01 +00:00
Matthias Braun a0f0c1f013 LiveIntervalUnion: Allow specification of liverange when unifying/extracting.
This allows it to add subregister ranges into the union.

llvm-svn: 223890
2014-12-10 01:12:59 +00:00
Matthias Braun 14f764c872 RegisterCoalescer: Preserve subregister liveranges.
llvm-svn: 223888
2014-12-10 01:12:52 +00:00
Matthias Braun 2079aa9140 LiveInterval: Add removeEmptySubRanges().
llvm-svn: 223887
2014-12-10 01:12:40 +00:00
Matthias Braun 8970d847c4 LiveIntervalAnalysis: Add subregister aware variants pruneValue().
llvm-svn: 223886
2014-12-10 01:12:36 +00:00
Matthias Braun e3d3b88cb9 Add a flag to enable/disable subregister liveness.
llvm-svn: 223884
2014-12-10 01:12:30 +00:00
Matthias Braun e5f861b781 LiveIntervalAnalysis: Adapt repairIntervalsInRange() to subregister liveness.
llvm-svn: 223883
2014-12-10 01:12:26 +00:00
Matthias Braun fe896c703c LiveRangeEdit: Adapt eliminateDeadDef() to subregister liveness.
llvm-svn: 223882
2014-12-10 01:12:23 +00:00
Matthias Braun 7044d69e87 LiveIntervalAnalysis: Adapt handleMove() to subregister ranges.
llvm-svn: 223881
2014-12-10 01:12:20 +00:00
Matthias Braun 20e1f38a41 LiveIntervalAnalysis: Update SubRanges in shrinkToUses().
llvm-svn: 223880
2014-12-10 01:12:18 +00:00
Matthias Braun 2f66232bde LiveIntervalAnalysis: Compute subregister ranges.
llvm-svn: 223878
2014-12-10 01:12:12 +00:00
Matthias Braun 3f1d8fdd33 LiveInterval: Add support to track liveness of subregisters.
This code adds the required data structures. Algorithms to compute it follow.

llvm-svn: 223877
2014-12-10 01:12:10 +00:00
Matthias Braun e62c207092 LiveInterval: Add a 'covers' operation to LiveRange.
llvm-svn: 223876
2014-12-10 01:12:06 +00:00
Philip Reames de226055ca Remove the Module pointer from GCStrategy and GCMetadataPrinter
In the current implementation, GCStrategy is a part of the ownership structure for the gc metadata which describes a Module. It also contains a reference to the module in question. As a result, GCStrategy instances are essentially Module specific.

I plan to transition away from this design. Instead, a GCStrategy will be owned by the LLVMContext. It will be a lightweight policy object which contains no information about the Modules or Functions involved, but can be easily reached given a Function.

The first step in this transition is to remove the direct Module reference from GCStrategy. This also requires removing the single user of this reference, the GCMetadataPrinter hierarchy. In theory, this will allow the lifetime of the printers to be scoped to the LLVMContext as well, but in practice, I'm not actually changing that. (Yet?)

An alternate design would have been to move the direct Module reference into the GCMetadataPrinter and change the keying of the owning maps to explicitly key off both GCStrategy and Module. I'm open to doing it that way instead, but didn't see much value in preserving the per Module association for GCMetadataPrinters.

The next change in this sequence will be to start unwinding the intertwined ownership between GCStrategy, GCModuleInfo, and GCFunctionInfo.

Differential Revision: http://reviews.llvm.org/D6566

llvm-svn: 223859
2014-12-09 23:57:54 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Juergen Ributzka 8bda738221 [CGP] Rewrite pattern match for splitBranchCondition to work with Values instead.
Rewrite the pattern match code to work also with Values instead with
Instructions only. Also remove the no longer need matcher (m_Instruction).

llvm-svn: 223797
2014-12-09 17:50:10 +00:00
Juergen Ributzka 194350a936 Revert "Move function to obtain branch weights into the BranchInst class. NFC."
This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare.

llvm-svn: 223795
2014-12-09 17:32:12 +00:00
Juergen Ributzka c1bbcbbd32 [CodeGenPrepare] Split branch conditions into multiple conditional branches.
This optimization transforms code like:
bb1:
  %0 = icmp ne i32 %a, 0
  %1 = icmp ne i32 %b, 0
  %or.cond = or i1 %0, %1
  br i1 %or.cond, label %TrueBB, label %FalseBB

into a multiple branch instructions like:

bb1:
  %0 = icmp ne i32 %a, 0
  br i1 %0, label %TrueBB, label %bb2
bb2:
  %1 = icmp ne i32 %b, 0
  br i1 %1, label %TrueBB, label %FalseBB

This optimization is already performed by SelectionDAG, but not by FastISel.
FastISel cannot perform this optimization, because it cannot generate new
MachineBasicBlocks.

Performing this optimization at CodeGenPrepare time makes it available to both -
SelectionDAG and FastISel - and the implementation in SelectiuonDAG could be
removed. There are currenty a few differences in codegen for X86 and PPC, so
this commmit only enables it for FastISel.

Reviewed by Jim Grosbach

This fixes rdar://problem/19034919.

llvm-svn: 223786
2014-12-09 16:36:13 +00:00
Owen Anderson 558012a3fc Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64.
llvm-svn: 223760
2014-12-09 06:50:39 +00:00
Hal Finkel c8cf2b88bc Handle early-clobber registers in the aggressive anti-dep breaker
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but is available to all targets), did not handle early-clobber MI
operands (at all). When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.

Fixes PR21452.

llvm-svn: 223727
2014-12-09 01:00:59 +00:00
Tom Stellard 3e01d47d98 MISched: Fix moving stores across barriers
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as a predecessor to the current BarrierChain.

llvm-svn: 223717
2014-12-08 23:36:48 +00:00
Justin Bogner 61ba2e3996 InstrProf: An intrinsic and lowering for instrumentation based profiling
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.

The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang, and with the followup in clang that rips
that code out to use these new intrinsics this ends up being NFC.

Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.

Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.

llvm-svn: 223672
2014-12-08 18:02:35 +00:00
Hans Wennborg 08de833c1c SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.

Differential Revision: http://reviews.llvm.org/D6510

llvm-svn: 223566
2014-12-06 01:28:50 +00:00
Eric Christopher d1fb7e4590 These two calls were grabbing the same register info. Unify them.
llvm-svn: 223502
2014-12-05 19:23:55 +00:00
Ahmed Bougacha 55e3c2d9cf [CodeGenPrepare] Use variables for reused values. NFC.
llvm-svn: 223491
2014-12-05 18:04:40 +00:00
Hal Finkel 66d7791176 Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for phys deps"
Reverting this because, while it fixes the problem in the reduced test case, it
does not fix the problem in the full test case from the bug report.

llvm-svn: 223442
2014-12-05 02:07:35 +00:00
Hal Finkel d013d99fe0 Consider subregs when calling MI::registerDefIsDead for phys deps
The scheduling dependency graph is built bottom-up within each scheduling
region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti
dependencies, based on physical registers, to the SUs for instructions
based on those that come before them.

In the test case, we start before post-RA scheduling with a block that looks
like this:

...
	INLINEASM <...
andc $0,$0,$2
stdcx. $0,0,$3
bne- 1b
> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>>
	...
	%X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def>
	...
	%R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill>

where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a
subregister of %CR0. However, for post-RA scheduling, no dependency was added
to prevent the INLINEASM from being scheduled in between the ANDIo8 and the
ISEL (which communicate via the %CR0GT register).

In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd
iterate over all of its aliases (which include %CC itself and also %CR0), and
look for previously-encountered defs of those registers. We'd find the ANDIo8,
but decide not to add a dependency between the INLINEASM and the ANDIo8 because
both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is
dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a
subregister of %CR0, and thus a dependency still must exist.

To fix this problem, when calling registerDefIsDead on the SU with the def, we
also check all subregisters for possible non-dead defs, and add the dependency
if any are found.

Fixes PR21742.

llvm-svn: 223440
2014-12-05 01:57:22 +00:00
Adrian Prantl ab255fcd09 Cleanup: Calls to getDwarfRegNum() may actually fail, if there is
no DWARF register number mapping, or if the register was a virtual
register that was never materialized. Previously, we would just emit a
bogus location, after this patch we don't emit a location at all by
doing an early exit.

After my bugfix in r223401 today, this doesn't actually happen on any
target that I tested this with, but it's still preferable to make the
possibility of a failure explicit.

llvm-svn: 223428
2014-12-05 01:02:46 +00:00
Adrian Prantl da7e03f1bf Simplify implementation and testcase of r223401 based on feedback from dblaikie.
llvm-svn: 223405
2014-12-04 22:58:41 +00:00
Adrian Prantl a3ae0b3b5b Debug info: If the RegisterCoalescer::reMaterializeTrivialDef() is
eliminating all uses of a vreg, update any DBG_VALUE describing that vreg
to point to the rematerialized register instead.

llvm-svn: 223401
2014-12-04 22:29:04 +00:00
Patrik Hagglund d06de4b954 Use DomTree in MachineSink to sink over diamonds.
According to a previous FIXME comment we now not only look at MBB
successors, but also handle code sinking past them:

  x = computation
  if () {} else {}
  use x

The instruction could be sunk over the whole diamond for the
if/then/else (or loop, etc), allowing it to be sunk into other blocks
after that.

Modified test added in r204522, due to one spill less present.

Minor fixes in comments.

Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.

llvm-svn: 223350
2014-12-04 10:36:42 +00:00
Simon Pilgrim be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Elena Demikhovsky f1de34b84d Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Matt Arsenault 4e27343eec Allow target to specify prefix for labels
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.

llvm-svn: 223323
2014-12-04 00:06:57 +00:00
Quentin Colombet 079aba733a [RegAllocFast] Handle implicit definitions conservatively.
Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e.. like dead definitions. Therefore, their uses
had to immediately follow their definitions, otherwise the related register may
be reused to allocate a virtual register.

This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked at such), we block an otherwise available register. This is
however conservatively correct and makes the fast register allocator much more
robust in particular regarding the scheduling of the instructions.

Fixes PR21700.

llvm-svn: 223317
2014-12-03 23:38:08 +00:00
Peter Collingbourne 51d2de7b9e Prologue support
Patch by Ben Gamari!

This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute.  There are a two primary usecases
that these attributes aim to serve,

  1. Function prologue sigils

  2. Function hot-patching: Enable the user to insert `nop` operations
     at the beginning of the function which can later be safely replaced
     with a call to some instrumentation facility

  3. Runtime metadata: Allow a compiler to insert data for use by the
     runtime during execution. GHC is one example of a compiler that
     needs this functionality for its tables-next-to-code functionality.

Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.

Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.

The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.

The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.

References
----------

This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html

Test Plan: testsuite

Differential Revision: http://reviews.llvm.org/D6454

llvm-svn: 223189
2014-12-03 02:08:38 +00:00
Hal Finkel bbdee93638 [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

llvm-svn: 223161
2014-12-02 22:01:00 +00:00
Philip Reames 72fbe7a6f0 Restructure some assertion checking based on post commit feedback by Aaron and Tom.
llvm-svn: 223150
2014-12-02 21:01:48 +00:00
Philip Reames f814a511da Appease a build bot complaining about an unused variable that's used in an assertion.
llvm-svn: 223142
2014-12-02 19:28:57 +00:00
Philip Reames 1a1bdb22bf [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.  

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.  

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principal, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.  

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
Ahmed Bougacha 54b7d334c7 [MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction.
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
  subs  ... %NZCV<imp-def>        <- CSMI
  csinc ... %NZCV<imp-use,kill>   <- this kill flag isn't valid anymore
  subs  ... %NZCV<imp-def>        <- MI, to be eliminated
  csinc ... %NZCV<imp-use,kill>
Since we eliminated MI, and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because it's lifetime was extended.

Also, add an exhaustive testcase for the motivating example.

Reviewed by: Juergen Ributzka <juergen@apple.com>

llvm-svn: 223133
2014-12-02 18:09:51 +00:00
Philip Reames 0365f1a376 [Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend
This is the second patch in a small series.  This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints.  It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code.  I will be submitting the SelectionDAG parts within the next 24-48 hours.  Since those pieces are by far the most complicated, I wanted to minimize the size of that patch.  That patch will include the tests which exercise the functionality in this patch.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.

The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots.  The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes.  The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT.  (Doing so would break relocation semantics for collectors which wish to relocate roots.)

The implementation of STATEPOINT is closely modeled on PATCHPOINT.  Eventually, much of the code in this patch will be removed.  The long term plan is to merge the functionality provided by statepoints and patchpoints.  Merging their implementations in the backend is likely to be a good starting point.

Reviewed by: atrick, ributzka

llvm-svn: 223085
2014-12-01 22:52:56 +00:00
Ahmed Bougacha fb6eeb74c5 [MachineVerifier] Accept a MBB with a single landing pad successor.
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.

llvm-svn: 223059
2014-12-01 18:43:53 +00:00
Hans Wennborg 5bef5b522b Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg 1571336fb2 SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

llvm-svn: 223049
2014-12-01 17:08:32 +00:00
Akira Hatanaka b9991a2656 [stack protector] Set edge weights for newly created basic blocks.
This commit fixes a bug in stack protector pass where edge weights were not set
when new basic blocks were added to lists of successor basic blocks.

Differential Revision: http://reviews.llvm.org/D5766

llvm-svn: 222987
2014-12-01 04:27:03 +00:00
Hans Wennborg 6dfb041ffc Switch lowering: reformat some for loops etc. NFC
llvm-svn: 222962
2014-11-29 21:24:12 +00:00
Hans Wennborg 6c42d1a5de Switch lowering: Fix broken 'Figure out which block is next' code
This doesn't seem to have worked in a long time, but other optimizations
would clean it up.

llvm-svn: 222961
2014-11-29 21:17:05 +00:00
Simon Pilgrim 2bfd9129f4 Target triple OS detection tidyup. NFC
Use Triple::isOS*() helpers where possible.

llvm-svn: 222960
2014-11-29 19:18:21 +00:00
Duncan P. N. Exon Smith 9bc81fbe92 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Elena Demikhovsky 6de72e5b0c Converted back to Unix format (after my last commit 222632)
llvm-svn: 222636
2014-11-23 15:21:53 +00:00
Elena Demikhovsky 9e5089a938 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Manman Ren f0a582bada Debug Info: revert r222195, r222210 and r222239.
This is no longer needed after David's fix at r222377 + r222485.
rdar://18958417

llvm-svn: 222563
2014-11-21 19:55:23 +00:00
Manman Ren c98ec0e70a [Objective-C] Support a new special module flag that will be put into the
objc_imageinfo struct.

rdar://17954668

llvm-svn: 222558
2014-11-21 19:24:55 +00:00
Sanjay Patel eb4a4d5aeb Don't repeat class/function/variable names in comments. NFC.
llvm-svn: 222555
2014-11-21 18:58:38 +00:00
Sanjay Patel b06441aded Less space; NFC
llvm-svn: 222546
2014-11-21 18:05:59 +00:00
Andrea Di Biagio 0225b5bf6f [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Andrea Di Biagio 26e8f4d166 [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.

llvm-svn: 222522
2014-11-21 11:33:07 +00:00
Hao Liu 44e5d7a131 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Matthias Braun 745ea43687 RegisterCoalescer: Improve debug messages
- Show "Considering..." message after flipping so you actually see the final
  destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
  a list of which register got merged with which.

llvm-svn: 222382
2014-11-19 19:46:17 +00:00
Matthias Braun d2f4c77800 Add a print and verify pass after the RegisterCoalescer
llvm-svn: 222381
2014-11-19 19:46:15 +00:00
Matthias Braun 47760d9667 MachineVerifier: Report register for bad liveranges
llvm-svn: 222380
2014-11-19 19:46:13 +00:00
Matthias Braun 9f87d75060 Introduce register dump helper
llvm-svn: 222379
2014-11-19 19:46:11 +00:00
Simon Pilgrim 3ac3b251a9 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
David Blaikie 5106ce7897 Remove StringMap::GetOrCreateValue in favor of StringMap::insert
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)

Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.

llvm-svn: 222319
2014-11-19 05:49:42 +00:00
Owen Anderson b5a259935c Fix an incorrect chain operand when expanding INSERT_VECTOR operations through the stack.
Patch by Daniil Troshkov!

llvm-svn: 222254
2014-11-18 20:50:19 +00:00
Frederic Riss fdccfc1e19 Allow DwarfCompileUnit::constructImportedEntityDIE to instanciate a GlobalVariable DIE.
Usually global variables are in a retain list and instanciated before
any call to constructImportedEntityDIE is made. This isn't true for
forward declarations though.
The testcase for this change is generated by a clang patched to emit
such forward declarations (patch at http://reviews.llvm.org/D6173
which will land soon). The updated testcase tests more than just
global variables, it now tests every type of 'using' clause we
support.

llvm-svn: 222217
2014-11-18 02:46:11 +00:00
Manman Ren 554865da5b Debug Info: In DIBuilder, the context field of a global variable is updated to
use DIScopeRef.

A paired commit at clang will follow to show cases where we will use an
identifer for the context of a global variable.

rdar://18958417

llvm-svn: 222195
2014-11-18 00:29:08 +00:00
Oliver Stannard d29db9b949 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Craig Topper f98c606479 Add missing semicolon from r222118.
llvm-svn: 222119
2014-11-17 05:58:26 +00:00
Craig Topper cf0444ba2a Move register class name strings to a single array in MCRegisterInfo to reduce static table size and number of relocation entries.
Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table.

llvm-svn: 222118
2014-11-17 05:50:14 +00:00
Craig Topper 6438fc3d05 Replace a couple asserts with static_asserts.
llvm-svn: 222114
2014-11-17 00:26:50 +00:00
Craig Topper 7f416c8acb Convert some EVTs to MVTs where only a SimpleValueType is needed.
llvm-svn: 222109
2014-11-16 21:17:18 +00:00
Andrea Di Biagio e13a0b81f4 [DAG] Improved target independent vector shuffle folding logic.
This patch teaches the DAGCombiner how to combine shuffles according to rules:
   shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

llvm-svn: 222090
2014-11-15 22:56:25 +00:00
Reid Kleckner c2291f3905 Rename EH related stuff to be more precise
Summary:
The current "WinEH" exception handling type is more about Itanium-style
LSDA tables layered on top of the Windows native unwind info format
instead of .eh_frame tables or EHABI unwind info. Use the name
"ItaniumWinEH" to better reflect the hybrid nature of the design.

Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions,
since the LSDA is part of the Itanium C++ ABI document, and not the
DWARF standard.

Reviewers: echristo

Subscribers: llvm-commits, compnerd

Differential Revision: http://reviews.llvm.org/D6279

llvm-svn: 222062
2014-11-14 23:31:07 +00:00
Reid Kleckner 283bc2ed28 Allow the use of functions as typeinfo in landingpad clauses
This is one step towards supporting SEH filter functions in LLVM.

llvm-svn: 221954
2014-11-14 00:35:50 +00:00
Reid Kleckner 971c3ea67b Use nullptr instead of NULL for variadic sentinels
Windows defines NULL to 0, which when used as an argument to a variadic
function, is not a null pointer constant. As a result, Clang's
-Wsentinel fires on this code. Using '0' would be wrong on most 64-bit
platforms, but both MSVC and Clang make it work on Windows. Sidestep the
issue with nullptr.

llvm-svn: 221940
2014-11-13 22:55:19 +00:00
Aditya Nandakumar 3053155652 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Aditya Nandakumar a27193297f This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively
llvm-svn: 221878
2014-11-13 09:26:31 +00:00
Frederic Riss 0f7abef2cf Add an assert and a test that verify r221709's fix.
llvm-svn: 221854
2014-11-13 03:20:23 +00:00
Quentin Colombet f5485bb008 [CodeGenPrepare] Handle zero extensions in the TypePromotionHelper.
Prior to this patch the TypePromotionHelper was promoting only sign extensions.
Supporting zero extensions changes:
- How constants are extended.
- How sign extensions, zero extensions, and truncate are composed together.
- How the type of the extended operation is recorded. Now we need to know the
  kind of the extension as well as its type.

Each change is fairly small, unlike the diff.
Most of the diff are comments/variable renaming to say "extension" instead of
"sign extension".

The performance improvements on the test suite are within the noise.

Related to <rdar://problem/18310086>.

llvm-svn: 221851
2014-11-13 01:44:51 +00:00
Frederic Riss 3a6b354b3e Fix emission of Dwarf accelerator table when there are multiple CUs.
The DIE offset in the accel tables is an offset relative to the start
of the debug_info section, but we were encoding the offset to the
start of the containing CU.

llvm-svn: 221837
2014-11-12 23:48:14 +00:00
Ahmed Bougacha 026600d967 [CodeGenPrepare] Replace other uses of EVT::getEVT with TL::getValueType.
r221820 fixed a problem (PR21548) where an iPTR was used in TLI legality checks,
which isn't valid and resulted in a failed assertion.
The solution was to lower pointer types into the correct target's VT, by
using TL::getValueType instead of EVT::getEVT.

This commit changes 3 other uses of EVT::getEVT, but without any tests:
- One of these non-lowered EVTs is passed to allowsMisalignedMemoryAccesses,
which goes into target's TL implementation and doesn't cause any problem (yet.)
- Two others are passed to TLI.isOperationLegalOrCustom:
  - one only looks at extensions, so doesn't concern pointers.
  - one only looks at binary operators, so also isn't a problem.

The latter might some day be exposed to pointers and cause the same assert as
the original PR, because there's a comment hinting at also supporting cast ops.

For consistency, update all of them and be done with it.

llvm-svn: 221827
2014-11-12 23:05:03 +00:00
Ahmed Bougacha 0788d49a40 [CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead.
Fixes PR21548.  Related to PR20474.

llvm-svn: 221820
2014-11-12 22:16:55 +00:00
Timur Iskhodzhanov 0e76a16200 Temporary fix for PR21528 - use mangled C++ function names in COFF debug info to un-break ASan on Windows
llvm-svn: 221813
2014-11-12 20:21:20 +00:00
Timur Iskhodzhanov a11b32b7e5 [COFF] Make it clearer that the symbols subsection holds function display name rather than just name
llvm-svn: 221812
2014-11-12 20:10:09 +00:00
Duncan P. N. Exon Smith de36e8040f Revert "IR: MDNode => Value"
Instead, we're going to separate metadata from the Value hierarchy.  See
PR21532.

This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.

llvm-svn: 221711
2014-11-11 21:30:22 +00:00
Tom Roeder 6312f4a422 Fix build break: remove unused variable in FCFI.
llvm-svn: 221710
2014-11-11 21:26:33 +00:00
Frederic Riss 8ad4f498fb Totally forget deallocated SDNodes in SDDbgInfo.
What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transfered,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893

Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues won't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.

llvm-svn: 221709
2014-11-11 21:21:08 +00:00
Tom Roeder eb7a303d1b Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
2014-11-11 21:08:02 +00:00
Oliver Stannard 8c2c67e63c LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Saleem Abdulrasool d2c5d7f6da Transforms: address some late comments
We already use the llvm namespace.  Remove the unnecessary prefix.  Use the
StringRef::equals method to compare with C strings rather than instantiating
std::strings.

Addresses late review comments from David Majnemer.

llvm-svn: 221564
2014-11-08 00:00:50 +00:00
Saleem Abdulrasool 5898e09057 Transform: add SymbolRewriter pass
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.

It provides a clean, simple, elegant solution for symbol inter-positioning. This
technique is often used, such as in the various sanitizers and performance
analysis.

The control of this is via a custom YAML syntax map file that indicates source
to destination mapping, so as to avoid having the compiler to know the exact
details of the source to destination transformations.

llvm-svn: 221548
2014-11-07 21:32:08 +00:00
Lang Hames cdd9077f3a [RegAlloc] Kill off the trivial spiller - nobody is using it any more.
llvm-svn: 221474
2014-11-06 19:12:38 +00:00
Rafael Espindola 26cfbea738 Compute the correct jump table entries on 32 bit windows.
On 32 bit windows we use label differences and .set does not suppress
rolocations, a combination that was not used before r220256.

This fixes PR21497.

llvm-svn: 221456
2014-11-06 14:39:49 +00:00
Rafael Espindola 83f0ea8982 Add three other sections when L symbols are allowed.
llvm-svn: 221436
2014-11-06 05:01:21 +00:00
Rafael Espindola bf77ed6826 Allow L symbols in no_dead_strip sections.
If a section cannot be dead stripped, it is safe to use L symbols, since
the linker will keep all of it in the end.

llvm-svn: 221431
2014-11-06 02:42:03 +00:00
Duncan P. N. Exon Smith c5754a65e6 IR: MDNode => Value: NamedMDNode::getOperator()
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`.  To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change.  This
is part of PR21433.

Note that there's a follow-up patch to clang for the API change.

llvm-svn: 221375
2014-11-05 18:16:03 +00:00
Andrea Di Biagio ce46b97b48 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Craig Topper 12f0d9ef2c Improve logic that decides if its profitable to commute when some of the virtual registers involved have uses/defs chains connecting them to physical register. Fix up the tests that this change improves.
llvm-svn: 221336
2014-11-05 06:43:02 +00:00
David Blaikie 3a443c29b9 Provide gmlt-like inline scope information in the skeleton CU to facilitate symbolication without needing the .dwo files
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.

A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.

The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.

But for now, these sizes seem small enough to go ahead with this.

Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.

llvm-svn: 221306
2014-11-04 22:12:25 +00:00
David Blaikie 9bfd7a9f43 Move cross-unit DIE caching to the DwarfFile level, so it doesn't interfere with fission-gmlt data and produce skeleton<>full unit cross referencing.
llvm-svn: 221305
2014-11-04 22:12:18 +00:00
Arnaud A. de Grandmaison a11cab3120 [PBQP] Callee saved regs should have a higher cost than scratch regs
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.

Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.

llvm-svn: 221293
2014-11-04 20:51:29 +00:00
Arnaud A. de Grandmaison 829dd81377 [PBQP] Tweak spill costs and coalescing benefits
This patch improves how the different costs (register, interference, spill
and coalescing) relates together. The assumption is now that:
 - coalescing (or any other "side effect" of reg alloc) is negative, and
   instead of being derived from a spill cost, they use the block
   frequency info.
 - spill costs are in the [MinSpillCost:+inf( range
 - register or interference costs are in [0.0:MinSpillCost( or +inf

The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about
when settings costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than SpillMinCost
(unless this has some sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relatively
to each others, but could grow above MinSpillCost if the pipeline is
deep enough.

The current heuristic is tuned to depend rather on the number of uses of
a live interval rather than a density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
ineficient, so making the cost proportionnal to the number of use is a
clear win.

llvm-svn: 221292
2014-11-04 20:51:24 +00:00
David Majnemer b925715c56 CodeGen: Enable DWARF emission for MS ABI targets
This is experimental, just barely enough to get things to not
immediately combust.

A note for those who are curious:
Only lld can successfully link the object files, other linkers truncate
the section names making the debug sections illegible to debuggers.

Even with this in mind, we believe we are having trouble with SECREL
relocations.

llvm-svn: 221245
2014-11-04 08:03:31 +00:00
Sanjoy Das e839965faa The patchpoint lowering logic would crash with live constants equal to
the tombstone or empty keys of a DenseMap<int64_t, T>.  This patch
fixes the issue (and adds a tests case).

llvm-svn: 221214
2014-11-04 00:59:21 +00:00
Sanjoy Das 429c9cae09 Change logic in StackMaps::recordStackMapOpers to use the isInt<32>
predicate instead of bitwise operations.

This is not a functional change.

llvm-svn: 221209
2014-11-04 00:06:57 +00:00
David Blaikie 5b02a19f90 Use common range handling for the CU's ranges
This generalizes the range handling for ranges in both the skeleton and
full unit, laying the foundation for the addition of more ranges (rather
than just the CU's special case) in the skeleton CU with fission+gmlt.

llvm-svn: 221202
2014-11-03 23:10:59 +00:00
David Blaikie 542616d47c Push the CURangeList down into the skeleton CU (where available) rather than the full CU
So that it may be shared between skeleton/full compile unit, for CU
ranges and other ranges to be added for fission+gmlt.

(at some point we might want some kind of object shared between the
skeleton and full compile units for all those things we only want one of
in that scope, rather than having the full unit always look through to
the skeleton... - alternatively, we might be able to have the skeleton
pointer (or another, separate pointer) point to the skeleton or to the
unit itself in non-fission, so we don't have to special case its
absence)

llvm-svn: 221186
2014-11-03 21:52:56 +00:00
David Blaikie ce343492ee Add DwarfCompileUnit::BaseAddress to track the base address used by relative addressing in debug_ranges and debug_loc
This is one of a few steps to generalize range handling to include the
CU range (thus the CU's range list will be moved into the range list
list, losing track of the base address in the process), which means
generalizing ranges from both the skeleton and full unit under fission.

And... then I can used that generalized support for ranges in
fission+gmlt where there'll be a bunch more ranges in the skeleton.

llvm-svn: 221182
2014-11-03 21:15:30 +00:00
Paul Robinson ad06e430ce Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
David Blaikie 077ad48447 Cleanup some unused or trivial functions in DwarfCompileUnit
llvm-svn: 221164
2014-11-03 17:10:38 +00:00
David Blaikie bc532b44a0 Sink DwarfUnit::CURanges into DwarfCompileUnit
llvm-svn: 221161
2014-11-03 16:40:43 +00:00
Oliver Stannard cf6bfb1dd0 Revert r221150, as it broke sanitizer tests
llvm-svn: 221151
2014-11-03 12:19:03 +00:00
Oliver Stannard 652ec6ee89 Emit .eh_frame with relocations to functions, rather than sections
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):

  A symbol table entry with STB_LOCAL binding that is defined relative to one
  of a group's sections, and that is contained in a symbol table section that is
  not part of the group, must be discarded if the group members are discarded.
  References to this symbol table entry from outside the group are not allowed.

This means that we need to use the function symbol for the relocation, not a
temporary symbol.

There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.

llvm-svn: 221150
2014-11-03 12:02:51 +00:00
David Blaikie 89a26f012a Sink range list handling down from DwarfUnit into its only use, in DwarfCompileUnit.
llvm-svn: 221123
2014-11-03 02:41:49 +00:00
David Blaikie 4aa49b2054 Formatting
llvm-svn: 221095
2014-11-02 08:52:37 +00:00
David Blaikie cafd962d97 Add DwarfUnit::isDwoUnit and use it to generalize string creation
Currently we only need to emit skeleton strings into the CU header and
we do this by explicitly calling "addLocalString". With gmlt-in-fission,
we'll be emitting a bunch of other strings from other codepaths where
it's not statically known that these strings will be local or not.

Introduce a virtual function to indicate whether this unit is a DWO unit
or not (I'm not sure if we have a good term for this, the
opposite/alternative to 'skeleton' unit) and use that to generalize the
string emission logic so that strings can be correctly emitted in both
the skeleton and dwo unit when in split dwarf mode.

And to demonstrate that this works, switch the existing special callers
of addLocalString in the skeleton builder to addString - and they still
work. Yay.

llvm-svn: 221094
2014-11-02 08:51:37 +00:00
David Blaikie 279c451c0b Remove the last mention of LineTablesOnly from DwarfUnit, sinking it into DwarfCompileUnit
This is a useful distinction/invariant/delination to make because
LineTablesOnly mode is never relevant to type units, so it's clear that
we're not doing weird line-tables-only-with-types by making this API
choice.

It also lays the foundations nicely for adding gmlt-like data to fission
skeleton CUs while limiting the effects to CUs and not TUs.

llvm-svn: 221093
2014-11-02 08:18:06 +00:00
David Blaikie 3363a57c8e Sink DwarfUnit::applySubprogramAttributesToDefinition into DwarfCompileUnit
llvm-svn: 221092
2014-11-02 08:09:09 +00:00
David Blaikie 978020807a Sink DwarfUnit::addExpr into DwarfCompileUnit
llvm-svn: 221090
2014-11-02 07:11:55 +00:00
David Blaikie 8c485b5d74 Fix the build from the last commit
llvm-svn: 221089
2014-11-02 07:08:12 +00:00
David Blaikie 02a6333ba7 Sink DwarfUnit::applyVariableAttributes into DwarfCompileUnit
llvm-svn: 221088
2014-11-02 07:06:51 +00:00
David Blaikie 4bc0881ac7 Sink DwarfUnit::addLocationList down into DwarfCompileUnit
llvm-svn: 221087
2014-11-02 07:03:19 +00:00
David Blaikie 77895fb276 Sink DwarfUnit::addComplexAddress down into DwarfCompileUnit
llvm-svn: 221086
2014-11-02 06:58:44 +00:00
David Blaikie f7435ee6ce Push DwarfUnit::addAddress down into DwarfCompileUnit
llvm-svn: 221085
2014-11-02 06:46:40 +00:00
David Blaikie 7d48be2b7b Sink DwarfUnit::addVariableAddress into DwarfCompileUnit since type units don't have variables
llvm-svn: 221084
2014-11-02 06:37:23 +00:00
David Blaikie 192b45c1ef DebugInfo: Sink accelerator table lists down (GlobalNames/Types) into DwarfCompileUnit
llvm-svn: 221083
2014-11-02 06:16:39 +00:00
David Blaikie 98cf172175 Add DwarfUnit::addGlobalType to match DwarfUnit::addGlobalName
(these will shortly become virtual, with a null implementation in
DwarfUnit (since type units don't have accelerator tables in the current
schema) and the current implementation down in DwarfCompileUnit, moving
the actual maps there too)

llvm-svn: 221082
2014-11-02 06:06:14 +00:00
David Blaikie 871c2d9d63 DebugInfo: Refactor index type DIE initialization by rolling it into the accessor
llvm-svn: 221080
2014-11-02 03:09:13 +00:00
David Blaikie ce47366150 Be sure to initialize DwarfCompileUnit::LabelBegin now that it may be skipped in initSection
llvm-svn: 221079
2014-11-02 02:40:26 +00:00