Commit Graph

2322 Commits

Author SHA1 Message Date
Chad Rosier f7cd8ea71f [AArch64] This check is specific to merging instructions. NFC.
llvm-svn: 260283
2016-02-09 21:20:12 +00:00
Geoff Berry 173b14db7c [AArch64] AArch64LoadStoreOptimizer: fix bug in pre-inc check iterator
Summary:
Fix case where a pre-inc/dec load/store would not be formed if the
add/sub that forms the inc/dec part of the operation was the first
instruction in the block being examined.

Reviewers: mcrosier, jmolloy, t.p.northover, junbuml

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16785

llvm-svn: 260275
2016-02-09 20:47:21 +00:00
Chad Rosier cc5d61f98e [AArch64] Bail even earlier if the instructions modifieds the base register. NFC.
llvm-svn: 260274
2016-02-09 20:44:41 +00:00
Chad Rosier 1c44c598dd [AArch64] Simplify. NFC.
llvm-svn: 260273
2016-02-09 20:27:45 +00:00
Chad Rosier 87e3341ff6 [AArch64] Add an assert to ensure we don't scale an offset that can't be scaled.
llvm-svn: 260272
2016-02-09 20:18:07 +00:00
Chad Rosier 3f8b09da3f [AArch64] Add a FIXME about invalid KILL markers after the ld/st opt pass.
llvm-svn: 260264
2016-02-09 19:42:19 +00:00
Chad Rosier c46ef8876b [AArch64] Remove redundant calls and clang format. NFC.
llvm-svn: 260260
2016-02-09 19:33:42 +00:00
Chad Rosier 11eedc98af [AArch64] Hoist now common logic. NFC.
llvm-svn: 260257
2016-02-09 19:17:18 +00:00
Chad Rosier d7363db659 [AArch64] Rename variable to make it clear we're merging here, not pairing.
llvm-svn: 260256
2016-02-09 19:09:22 +00:00
Chad Rosier b5933d7bde [AArch64] Separage the codegen logic for widening vs. pairing. NFC.
llvm-svn: 260249
2016-02-09 19:02:12 +00:00
Chad Rosier 24c46ad50f [AArch64] Cleanup to simplify logic when widening vs. pairing loads/stores. NFC.
The logic to pair instructions and merge narrow instructions has become cloogy
and error prone.  This patch beings to unravel these two similar, but distinct
optimizations.

llvm-svn: 260242
2016-02-09 18:10:20 +00:00
Chad Rosier 5c6a66ce34 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 260228
2016-02-09 15:59:57 +00:00
Chad Rosier 4f28e50dc8 [AArch64] Remove stale comment.
llvm-svn: 260226
2016-02-09 15:51:33 +00:00
Tim Northover e316f76222 AArch64: match correct order in subtraction pattern.
The accumulator in multiply-and-subtract instructions is actually subtracted
*from* so these patterns were computing the wrong value.

llvm-svn: 260131
2016-02-08 19:33:18 +00:00
Evandro Menezes d761ca2308 [AArch64] Add the scheduling model for Exynos-M1
Summary:
Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A).


Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover

Subscribers: aemerson, rengolin, MatzeB

Differential Revision: http://reviews.llvm.org/D16644

llvm-svn: 259958
2016-02-06 00:01:41 +00:00
Jun Bum Lim 1de2d44dcf [AArch64] Refactoring aarch64-ldst-opt. NCF.
Remove narrow load / store instructions from getMatchingPairOpcode(),
and add getMatchingWideOpcode().

llvm-svn: 259914
2016-02-05 20:02:03 +00:00
Renato Golin 6274e5222d Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."
This reverts commit r259812 as it broke AArch64 self-hosting.

llvm-svn: 259881
2016-02-05 12:14:30 +00:00
Chad Rosier 35706ad6bb [AArch64] Bound the number of instructions we scan when searching for updates.
This only impacts the creation of pre-/post-index instructions.  The bound was
set high enough such that it did not change code generation for SPEC200X.

llvm-svn: 259828
2016-02-04 21:26:02 +00:00
Chad Rosier 05f8020cdf [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769 and r259790.  The tramp3d failure was caused
by an incorrect refactoring in the patch.  Specifically, we weren't always
properly clearing the SExtIdx flag.

llvm-svn: 259812
2016-02-04 18:59:49 +00:00
Silviu Baranga 33b3bd17dd [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:

(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:

(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.

Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.

Patch by Chris Diamand!

llvm-svn: 259800
2016-02-04 16:47:09 +00:00
Chad Rosier 18896c0f5e Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r259790. tramp3d-v4 is still having problems.

llvm-svn: 259795
2016-02-04 16:01:40 +00:00
Chad Rosier feec2aeb0f [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure.  I'm unable to reproduce the issue at this time.

llvm-svn: 259790
2016-02-04 14:42:55 +00:00
Chad Rosier 1142f3cf90 [AArch64] Add a FIXME comment.
llvm-svn: 259515
2016-02-02 15:22:55 +00:00
Chad Rosier bba881ef3d [AArch64] Allocate the modified and used regs only once per function.
llvm-svn: 259510
2016-02-02 15:02:30 +00:00
Chad Rosier dbdb1d6eaf Move comments a bit closer to associated code. NFC.
llvm-svn: 259411
2016-02-01 21:38:31 +00:00
Chad Rosier 064261da16 Remove extra semicolon. NFC.
llvm-svn: 259402
2016-02-01 20:54:36 +00:00
Balaram Makam 92431703d7 AArch64: Implement missed conditional compare sequences.
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.

Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB

Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16291

llvm-svn: 259387
2016-02-01 19:13:07 +00:00
Geoff Berry 29d4a695f4 [AArch64] Simplify prolog/epilog callee save/restore. NFC.
Summary:
Factor out common code for callee-save register pair calculation.  This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Depends on D16732

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16734

llvm-svn: 259384
2016-02-01 19:07:06 +00:00
Geoff Berry 04bf91a8c1 [AArch64] Simplify callee-save register save/restore. NFC.
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.

This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16732

llvm-svn: 259365
2016-02-01 16:29:19 +00:00
Ahmed Bougacha 53010a0d5b [AArch64] Fix i64 nontemporal high-half extraction.
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.

When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.

llvm-svn: 259134
2016-01-29 01:08:41 +00:00
Chad Rosier 3ada75f7e8 [AArch64] Set MMOs on pre- and post-index instructions.
Without the MMOs the MI scheduler is unable to reason about the dependencies of
these instructions.

llvm-svn: 259052
2016-01-28 15:38:24 +00:00
Benjamin Kramer f9172fd4ac Rename TargetSelectionDAGInfo into SelectionDAGTargetInfo and move it to CodeGen/
It's a SelectionDAG thing, not a Target thing.

llvm-svn: 258939
2016-01-27 16:32:26 +00:00
Benjamin Kramer b3e8a6d2b8 Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs.
llvm-svn: 258917
2016-01-27 10:01:28 +00:00
Chris Bieneman e49730d4ba Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
Benjamin Kramer f57c1977c1 Reflect the MC/MCDisassembler split on the include/ level.
No functional change, just moving code around.

llvm-svn: 258818
2016-01-26 16:44:37 +00:00
Junmo Park 3ca3e192d0 Silence a -Wparentheses warning; NFC.
llvm-svn: 258676
2016-01-25 10:17:17 +00:00
Aaron Ballman add830b5d1 Silence a -Wparentheses warning; NFC.
llvm-svn: 258626
2016-01-23 15:42:21 +00:00
Matthias Braun 327bca776c Inline variable into assert
Seems like some compilers still give unused variable warnings for
bool var = ...;
(void)var;
so I have to inline the variable.

llvm-svn: 258619
2016-01-23 06:49:29 +00:00
NAKAMURA Takumi 9974fa9c8c AArch64ISelLowering.cpp: Fix a warning. [-Wunused-variable]
llvm-svn: 258618
2016-01-23 06:34:59 +00:00
Matthias Braun fdef49b183 AArch64ISel: Fix ccmp code selection matching deep expressions.
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.

Also rename some variable to better reflect their usage.

llvm-svn: 258605
2016-01-23 04:05:22 +00:00
Matthias Braun 985bdf9084 AArch64ISelLowering: Reduce maximum recursion depth of isConjunctionDisjunctionTree()
This function will exhibit exponential runtime (2**n) so we should
rather use a lower limit.

llvm-svn: 258604
2016-01-23 04:05:18 +00:00
Matthias Braun fd13c14669 Fix wrong indentation
llvm-svn: 258603
2016-01-23 04:05:16 +00:00
Ahmed Bougacha 78d6efdb93 [AArch64] Simplify emitConditionalCompare calls. NFC.
Now that both callsites are identical, we can simplify the
prototype and make it easier to reason about the 2-CC case.

llvm-svn: 258534
2016-01-22 19:43:57 +00:00
Ahmed Bougacha 99209b90a4 [AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs.
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.

Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
  %cc1 = fcmp one float %a, %b
  %cc2 = fcmp olt float %c, %d
  %and = and i1 %cc1, %cc2
  %r = select i1 %and, i32 %e, i32 %f
  ret i32 %r
}

Assuming (%a < %b) and (%c < %d); we used to do:
  fcmp  s0, s1            # nzcv <- 1000
  orr   w8, wzr, #0x1     # w8 <- 1
  csel  w9, w8, wzr, mi   # w9 <- 1
  csel  w8, w8, w9, gt    # w8 <- 1
  fcmp  s2, s3            # nzcv <- 1000
  cset   w9, mi           # w9 <- 1
  tst    w8, w9           # (w8 & w9) == 1, so: nzcv <- 0000
  csel  w0, w0, w1, ne    # w0 <- w0

We now do:
  fcmp  s2, s3            # nzcv <- 1000
  fccmp s0, s1, #0, mi    #  mi, so: nzcv <- 1000
  fccmp s0, s1, #8, le    # !le, so: nzcv <- 1000
  csel  w0, w0, w1, pl    # !pl, so: w0 <- w1

In other words, we transformed:
  (c < d) &&  ((a < b) || (a > b))
into:
  (c < d) &&   (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
  (c < d) && !((a u>= b) && (a u<= b))

Note that this problem doesn't occur in the test-suite.

changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
  In general we can create code for arbitrary "... (and (and A B) C)"
  sequences.  We can also implement some "or" expressions, because
  "(or A B)" is equivalent to "not (and (not A) (not B))" and we can
  implement some  negation operations. [...] However there is no way
  to negate the result of a partial sequence.

Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
    == ((a olt b) || (a ogt b))
    == ((a ord b) && (a une b))
- (a ueq b)
    == ((a uno b) || (a oeq b))
    == ((a ule b) && (a uge b))

Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
  (a || b)
  = !(!a && !(b))
  = !(!a && !(b1 || b2))  <- changeFPCCToAArch64CC(b, b1, b2)
  = !(!a && !b1 && !b2)

However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:

  (a || b)
  = !(!a && (!b))
  = !(!a && (b1 && b2))   <- changeFPCCToANDAArch64CC(!b, b1, b2)
  = !(!a && b1 && b2)

This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.

llvm-svn: 258533
2016-01-22 19:43:54 +00:00
Ahmed Bougacha 6345b9ecfa [AArch64] Assert that CCMP isel didn't fail inconsistently.
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, make sure that inconsistencies don't happen with
an assert.

llvm-svn: 258532
2016-01-22 19:43:43 +00:00
Pirama Arumuga Nainar 71e9a2a4c4 Do not lower VSETCC if operand is an f16 vector
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type.  This patch skips
lowering of VSETCC if the operand is an f16 vector.

v4 and v8 tests included.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15361

llvm-svn: 258471
2016-01-22 01:16:57 +00:00
Keith Walker 8c44bf1b89 Write AArch64 big endian data fixup entries as BE.
There was support for writing the AArch64 big endian data fixup entries in
the .eh_frame section in BE.    This is changed to write all such fixup
entries in BE with no restriction on the section.  This is similar to
the existing support for fixup entries for ARM.

A test is added to check the length field in the .debug_line section as
this is an example of where such a fixup occurs.

Differential Revision: http://reviews.llvm.org/D16064

llvm-svn: 258320
2016-01-20 15:59:14 +00:00
Oliver Stannard f7696f8267 [AArch64] Fix two bugs in the .inst directive
The AArch64 .inst directive was implemented using EmitIntValue, which resulted
in both $x and $d (code and data) mapping symbols being emitted at the same
address. This fixes it to only emit the $x mapping symbol.

EmitIntValue also emits the value in big-endian order when targeting big-endian
systems, but instructions are always emitted in little-endian order for
AArch64.

Differential Revision: http://reviews.llvm.org/D16349

llvm-svn: 258308
2016-01-20 12:54:31 +00:00
Eduard Burtescu 23c4d83aa3 [NFC] Replace several manual GEP loops with gep_type_iterator.
Reviewers: dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16335

llvm-svn: 258262
2016-01-20 00:26:52 +00:00
Chad Rosier 5c72966ea3 [AArch64] Remove a bunch of useless FIXME comments.
llvm-svn: 258193
2016-01-19 21:47:24 +00:00
Chad Rosier b11c82d3e2 [AArch64] Remove more dead code after r258093.
llvm-svn: 258191
2016-01-19 21:27:05 +00:00
Eduard Burtescu 19eb03106d [opaque pointer types] [NFC] GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.

GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.

Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16275

llvm-svn: 258145
2016-01-19 17:28:00 +00:00
Chad Rosier 401a4ab8d8 Typo.
llvm-svn: 258137
2016-01-19 16:50:45 +00:00
Chad Rosier 234bf6fe5c [AArch64] Remove unused arguments. NFC.
AFAICT, these have been unused since the initial backend import.

llvm-svn: 258093
2016-01-18 21:56:40 +00:00
Manuel Jacob 5f6eaac611 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Manman Ren 4632e8e625 CXX_FAST_TLS calling convention: fix issue on AArch64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257929
2016-01-15 20:13:28 +00:00
Weiming Zhao 038393bba0 Fix AArch64ConditionOptimizer
Summary:
This pass may modify the Cmp operands. However, the flag reg may be used by both the branch and CSEL.
Modifying CMP will have side effect on CSEL.

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16147

llvm-svn: 257844
2016-01-15 00:06:58 +00:00
Rui Ueyama da00f2fdf4 Update to use new name alignTo().
llvm-svn: 257804
2016-01-14 21:06:47 +00:00
Ahmed Bougacha dfc77357a0 [AArch64] Don't assume extractelt constant index when matching shuffle.
llvm-svn: 257735
2016-01-14 02:12:30 +00:00
Rafael Espindola 8340f94df1 Convert a few assert failures into proper errors.
Fixes PR25944.

llvm-svn: 257697
2016-01-13 22:56:57 +00:00
Haicheng Wu 08b9462540 [AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path
Allow fadd/fmul to be reassociated in aarch64.

llvm-svn: 257024
2016-01-07 04:01:02 +00:00
Philip Reames c86ed0055d Extract helper function to merge MemoryOperand lists [NFC]
In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned we actually had two copies and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstBuilder.

I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface.

Differential Revision: http://reviews.llvm.org/D15757

llvm-svn: 256909
2016-01-06 04:39:03 +00:00
Junmo Park 3a40237c03 Delete trailing whitespace; NFC
llvm-svn: 256908
2016-01-06 03:53:36 +00:00
Junmo Park 3ec882feed Delete trailing whitespace; NFC
llvm-svn: 256906
2016-01-06 03:41:30 +00:00
MinSeong Kim a7385ebf78 [AArch64] Add support for Samsung Exynos-M1
Adds core tuning support for new Samsung Exynos-M1 core (ARMv8-A).

Differential Revision: http://reviews.llvm.org/D15663

llvm-svn: 256828
2016-01-05 12:51:59 +00:00
Junmo Park 3b8c715b2f Remove extra whitespace. NFC.
llvm-svn: 256820
2016-01-05 09:36:47 +00:00
Geoff Berry 9e934b0cc2 [AArch64] Optimize some simple TBZ/TBNZ cases.
Summary:
Add some AArch64 dag combines to optimize some simple TBZ/TBNZ cases:

 (tbz (and x, m), b) -> (tbz x, b)
 (tbz (shl x, c), b) -> (tbz x, b-c)
 (tbz (shr x, c), b) -> (tbz x, b+c)
 (tbz (xor x, -1), b) -> (tbnz x, b)

Reviewers: jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15702

llvm-svn: 256765
2016-01-04 18:55:47 +00:00
Craig Topper daf2e3ff7a Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC
llvm-svn: 256427
2015-12-25 22:10:01 +00:00
Jun Bum Lim 6755c3bc5f [AArch64] Promote loads from stored
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.

Original commit message:

This change promotes load instructions which directly read from stored by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256249
2015-12-22 16:36:16 +00:00
Matthew Simpson 11c4de6054 [AArch64] Add additional extract-extend patterns for smov
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.

This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.

llvm-svn: 256176
2015-12-21 18:31:25 +00:00
Jun Bum Lim 4bb171c8da Revert "[AArch64] Promote loads from stores"
This reverts commit r256004 due to a failure in cortex-a53.

llvm-svn: 256160
2015-12-21 15:36:49 +00:00
Chad Rosier d016574df8 [AArch64] Enable PostRAScheduler for AArch64 generic build.
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.

http://reviews.llvm.org/D15652

llvm-svn: 256158
2015-12-21 14:43:45 +00:00
Jun Bum Lim 3509d64c24 [AArch64] Promote loads from stores
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256004
2015-12-18 18:08:30 +00:00
Matthew Simpson 13dddb0799 Revert "[AArch64] Add DAG combine for extract extend pattern"
This reverts commit r255895. The patch breaks internal tests. Reverting until a
fix is ready.

llvm-svn: 255928
2015-12-17 21:29:47 +00:00
Rafael Espindola 9e1cae510f Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build"
This reverts commit r255896. It broke the tests.

llvm-svn: 255899
2015-12-17 15:12:26 +00:00
MinSeong Kim d05e9fd194 [AArch64] Enable PostRAScheduler for AArch64 generic build
This patch enables PostRAScheduler specifically for AArch64 generic build,
which is beneficial from the performance perspective.
Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed.
Also benchmarks from LLVM test-suite did not regress.

Differential Revision: http://reviews.llvm.org/D15557

llvm-svn: 255896
2015-12-17 14:51:22 +00:00
Matthew Simpson 4355e404d5 [AArch64] Add DAG combine for extract extend pattern
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.

Differential Revision: http://reviews.llvm.org/D15515

llvm-svn: 255895
2015-12-17 14:30:55 +00:00
Matthias Braun 454192917b AArch64: Simplify emitEpilogue() and related code; NFC
This is in preparation to an upcoming patch.

llvm-svn: 255872
2015-12-17 03:18:47 +00:00
Ahmed Bougacha 66834ec6e1 [AArch64] Simplify some TRI/TII getters. NFC.
We don't need static_casts when we use the right Subtarget.

llvm-svn: 255836
2015-12-16 22:54:06 +00:00
Manman Ren cbe4f9417d CXX_FAST_TLS calling convention: performance improvement for AArch64.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

The target independent portion was committed as r255353.
rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15341

llvm-svn: 255821
2015-12-16 21:04:19 +00:00
Geoff Berry 8f5acb1bd1 Remove dead function AArch64TargetLowering::getFunctionAlignment. NFC.
Reviewers: t.p.northover, jmolloy, mcrosier

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15458

llvm-svn: 255509
2015-12-14 17:01:10 +00:00
Cong Hou c106989fd5 Normalize MBB's successors' probabilities in several locations.
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).


Differential revision: http://reviews.llvm.org/D15259

llvm-svn: 255455
2015-12-13 09:26:17 +00:00
Hal Finkel cd8664c3c2 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Matthias Braun 60d69e2865 CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.

This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).

Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.

This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033

Differential Revision: http://reviews.llvm.org/D15320

llvm-svn: 255362
2015-12-11 19:42:09 +00:00
Matt Arsenault fbd9bbfda3 Start replacing vector_extract/vector_insert with extractelt/insertelt
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359
2015-12-11 19:20:16 +00:00
Pirama Arumuga Nainar 1317d5f311 Fix fptosi, fptoui from f16 vectors to i8, i16 vectors
Summary:
Convert f16 vectors to corresponding f32 vectors before doing the
conversion to int.

Add tests for v4f16, v8f16.

Reviewers: ab, jmolloy

Subscribers: llvm-commits, srhines

Differential Revision: http://reviews.llvm.org/D14936

llvm-svn: 255263
2015-12-10 17:16:49 +00:00
Oliver Stannard 86f729296a [AArch64] Fix FP16 vector instructions that should only accept low registers
llvm-svn: 255113
2015-12-09 14:32:11 +00:00
Ahmed Bougacha 97564c3a1b [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

llvm-svn: 255089
2015-12-09 01:19:50 +00:00
Pirama Arumuga Nainar e6ccd7b66a Define selection for v4f16, v8f16 scalar_to_vector
Summary:
This fixes failure when trying to select
    insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.

The accompanying v4 and v8 tests fail instruction selection without this
patch.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15322

llvm-svn: 255072
2015-12-08 23:07:06 +00:00
Oliver Stannard e4c3d21ea6 [AArch64] Add ARMv8.2-A FP16 vector instructions
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.

The ".2h" vector type specifier is now legal (for the scalar pairwise
reduction instructions), so some unrelated tests have been modified as
different error messages are emitted. This is not a problem as the
invalid operands are still caught.

llvm-svn: 255010
2015-12-08 12:16:10 +00:00
Manman Ren cb8470b4b5 [CXX TLS calling convention] Add support for AArch64.
rdar://9001553

llvm-svn: 254978
2015-12-08 00:14:38 +00:00
Craig Topper e5e035a3a8 Replace uint16_t with the MCPhysReg typedef in many places. A lot of physical register arrays already use this typedef.
llvm-svn: 254843
2015-12-05 07:13:35 +00:00
Philip Reames 7c6692de16 [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)
When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access.  Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing.  Note that the actual implementation was always bailing if the load or store wasn't simple.  

Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared.  In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above.  These are normal loads and what most of the optimizer works with.

llvm-svn: 254805
2015-12-05 00:18:33 +00:00
Chad Rosier f3491496dc [AArch64] Expand vector SDIVREM/UDIVREM operations.
http://reviews.llvm.org/D15214
Patch by Ana Pazos <apazos@codeaurora.org>!

llvm-svn: 254773
2015-12-04 21:38:44 +00:00
Matthias Braun 0d4505c067 AArch64FastISel: Use cbz/cbnz to branch on i1
In the case of a conditional branch without a preceding cmp we used to emit
a "and; cmp; b.eq/b.ne" sequence, use tbz/tbnz instead.

Differential Revision: http://reviews.llvm.org/D15122

llvm-svn: 254621
2015-12-03 17:19:58 +00:00
Christof Douma 8b5dc2c94e [AArch64]: Add support for Cortex-A35
Adds support for the new Cortex-A35 ARMv8-A core.

llvm-svn: 254503
2015-12-02 11:53:44 +00:00
Tim Northover f3be9d5c0b AArch64: fix 128-bit shifts
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).

The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.

rdar://22491037

llvm-svn: 254475
2015-12-02 00:33:54 +00:00
Weiming Zhao 56ab51870c [AArch64] Fix a corner case in BitFeild select
Summary:
When not useful bits, BitWidth becomes 0 and APInt will not be happy.

See https://llvm.org/bugs/show_bug.cgi?id=25571

We can just mark the operand as IMPLICIT_DEF is none bits of it is used.

Reviewers: t.p.northover, jmolloy

Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14803

llvm-svn: 254440
2015-12-01 19:17:49 +00:00
Oliver Stannard a34e47066e [AArch64] Add ARMv8.2-A Statistical Profiling Extension
The Statistical Profiling Extension is an optional extension to
ARMv8.2-A. Since it is an optional extension, I have added the
FeatureSPE subtarget feature to control it. The assembler-visible parts
of this extension are the new "psb csync" instruction, which is
equivalent to "hint #17", and a number of system registers.

Differential Revision: http://reviews.llvm.org/D15021

llvm-svn: 254401
2015-12-01 10:48:51 +00:00
Oliver Stannard b25914e03f [AArch64] Add ARMv8.2-A FP16 scalar instructions
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Most of these instructions are the same as the 32- and 64-bit versions,
but with the type field (bits 23-22) set to 0b11. Previously the top bit
of the size field was always 0, so the instruction classes only provided
a 1-bit size field, which I have widened to 2 bits.

Differential Revision: http://reviews.llvm.org/D15014

llvm-svn: 254198
2015-11-27 13:04:48 +00:00
Oliver Stannard 64c167db7a [AArch64] Add ARMv8.2-A new AT instruction variants
ARMv8.2-A adds new variants of the "at" (address translate) system
instruction, which take the PSTATE.PAN bit (added in ARMv8.1-A). These
are a required part of ARMv8.2-A, so no additional subtarget features
are required.

Differential Revision: http://reviews.llvm.org/D15018

llvm-svn: 254159
2015-11-26 15:34:44 +00:00
Oliver Stannard 911ea20f07 [AArch64] Add ARMv8.2-A UAO PSTATE bit
ARMv8.2-A adds a new PSTATE bit, PSTATE.UAO, which allows the LDTR/STTR
instructions to behave the same as LDR/STR with respect to execute-only
pages at higher privilege levels. New variants of the MSR/MRS
instructions are added to allow reading and writing this bit. It is a
required part of ARMv8.2-A, so no additional subtarget features are
required.

Differential Revision: http://reviews.llvm.org/D15020

llvm-svn: 254157
2015-11-26 15:32:30 +00:00
Oliver Stannard 1a81cc9f43 [AArch64] Add ARMv8.2-A persistent memory instruction
ARMv8.2-A adds the "dc cvap" instruction, which is a system instruction
that cleans caches to the point of persistence (for systems that have
persistent memory). It is a required part of ARMv8.2-A, so no additional
subtarget features are required.

Differential Revision: http://reviews.llvm.org/D15016

llvm-svn: 254156
2015-11-26 15:28:47 +00:00
Oliver Stannard 48b43741d0 [AArch64] Add ARMv8.2-A ID_A64MMFR2_EL1 register
ARMv8.2-A adds a new ID register, ID_A64MMFR2_EL1, which behaves in the
same way as ID_A64MMFR0_EL1 and ID_A64MMFR1_EL1. It is a required part
of ARMv8.2-A, so no additional subtarget features are required.

Differential Revision: http://reviews.llvm.org/D15017

llvm-svn: 254155
2015-11-26 15:26:10 +00:00
Oliver Stannard 7cc0c4e675 [AArch64] Add subtarget features for ARMv8.2-A
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature. There
is also one large, optional feature, which adds 16-bit floating point
versions of all existing floating-point instructions (VFP and SIMD),
this is represented by the FeatureFullFP16 subtarget feature.

Differential Revision: http://reviews.llvm.org/D15013

llvm-svn: 254154
2015-11-26 15:23:32 +00:00
Artyom Skrobov 314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Cong Hou 1938f2eb98 Let SelectionDAG start to use probability-based interface to add successors.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.


Differential revision: http://reviews.llvm.org/D14361

llvm-svn: 253965
2015-11-24 08:51:23 +00:00
Jun Bum Lim 80ec0d3f5a [AArch64]Merge narrow zero stores to a wider store
This change merges adjacent zero stores into a wider single store.
For example :
  strh wzr, [x0]
  strh wzr, [x0, #2]
becomes
  str wzr, [x0]

This will fix PR25410.

llvm-svn: 253711
2015-11-20 21:14:07 +00:00
Jun Bum Lim c12c2790e1 [AArch64] Refactoring aarch64-ldst-opt. NCF.
Summary :
 * Rename isSmallTypeLdMerge() to isNarrowLoad().
 * Rename NumSmallTypeMerged to NumNarrowTypePromoted.
 * Use Subtarget defined as a member variable.

llvm-svn: 253587
2015-11-19 18:41:27 +00:00
Jun Bum Lim 4c35ccac91 [AArch64]Extend merging narrow loads into a wider load
This change extends r251438 to handle more narrow load promotions
including byte type, unscaled, and signed. For example, this change will
convert :
  ldursh w1, [x0, #-2]
  ldurh  w2, [x0, #-4]
into
  ldur  w2, [x0, #-4]
  asr   w1, w2, #16
  and   w2, w2, #0xffff

llvm-svn: 253577
2015-11-19 17:21:41 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Quentin Colombet f6645cce91 [AArch64] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14360

rdar://problem/20820748

llvm-svn: 253520
2015-11-18 23:12:20 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Matthew Simpson 343af07aa9 [Aarch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign and zero extensions. The
costs were determined by counting the number of shift instructions generated
without context for each new extension.

Differential Revision: http://reviews.llvm.org/D14730

llvm-svn: 253482
2015-11-18 18:03:06 +00:00
Ahmed Bougacha 88ddeae8bd [AArch64] Promote f16 SELECT_CC CC operands when op is legal.
SELECT_CC has the nasty property of having operands with unrelated
types. So if you do something like:

  f32 = select_cc f16, f16, f32, f32, cc

You'd only look for the action for <select_cc, f32>, but never f16.
If the types are all legal, but the op isn't (as for f16 on AArch64,
or for f128 on x86_64/AArch64?), then you get into trouble.
For f128, we have softenSetCCOperands to handle this case.

Similarly, for f16, we can directly promote the CC operands.

llvm-svn: 253344
2015-11-17 16:45:40 +00:00
Oliver Stannard 9be59af3ab [Assembler] Make fatal assembler errors non-fatal
Currently, if the assembler encounters an error after parsing (such as an
out-of-range fixup), it reports this as a fatal error, and so stops after the
first error. However, for most of these there is an obvious way to recover
after emitting the error, such as emitting the fixup with a value of zero. This
means that we can report on all of the errors in a file, not just the first
one. MCContext::reportError records the fact that an error was encountered, so
we won't actually emit an object file with the incorrect contents.

Differential Revision: http://reviews.llvm.org/D14717

llvm-svn: 253328
2015-11-17 10:00:43 +00:00
Oliver Stannard 9327a7575b [ARM,AArch64] Store source location of asm constant pool entries
Storing the source location of the expression that created a constant pool
entry allows us to emit better error messages if we later discover that the
expression cannot be represented by a relocation.

Differential Revision: http://reviews.llvm.org/D14646

llvm-svn: 253220
2015-11-16 16:25:47 +00:00
Oliver Stannard 09be060606 [ARM,AArch64] Store source location for values in assembly files
The MCValue class can store a SMLoc to allow better error messages to be
emitted if an error is detected after parsing. The ARM and AArch64 assembly
parsers were not setting this, so error messages did not have source
information.

Differential Revision: http://reviews.llvm.org/D14645

llvm-svn: 253219
2015-11-16 16:22:47 +00:00
Oliver Stannard db9081bf89 [AArch64] ldr= pseudo-instruction silently ignored if register invalid
The AArch64 assembler was silently ignoring instructions like this:
  ldr foo, =bar

AArch64AsmParser::parseOperand was returning true as the parse failed, but was
not calling AArch64AsmParser::Error to report this to the user, so the
instruction was ignored without printing an error message.

Differential Revision: http://reviews.llvm.org/D14651

llvm-svn: 253193
2015-11-16 10:25:19 +00:00
Akira Hatanaka b11ef0897c Reduce the size of MCRelaxableFragment.
MCRelaxableFragment previously kept a copy of MCSubtargetInfo and
MCInst to enable re-encoding the MCInst later during relaxation. A copy
of MCSubtargetInfo (instead of a reference or pointer) was needed
because the feature bits could be modified by the parser.

This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment
with a constant reference to MCSubtargetInfo. The copies of
MCSubtargetInfo are kept in MCContext, and the target parsers are now
responsible for asking MCContext to provide a copy whenever the feature
bits of MCSubtargetInfo have to be toggled.
 
With this patch, I saw a 4% reduction in peak memory usage when I
compiled verify-uselistorder.lto.bc using llc.

rdar://problem/21736951

Differential Revision: http://reviews.llvm.org/D14346

llvm-svn: 253127
2015-11-14 06:35:56 +00:00
Akira Hatanaka bd9fc28444 [MCTargetAsmParser] Move the member varialbes that reference
MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a
member function getSTI.

This is done in preparation for making changes to shrink the size of
MCRelaxableFragment. (see http://reviews.llvm.org/D14346).

llvm-svn: 253124
2015-11-14 05:20:05 +00:00
Justin Bogner fff708db92 AArch64: Default AArch64Subtarget::ReserveX18 to true on darwin
Darwin reserves x18, so it's never ABI compliant to generate code that
uses it. Set the default value based on the OS part of the triple
rather than forcing front-ends to set the +reserve-x18 target feature
in order to build correct code for Darwin.

This will make r243310 redundant, so I'll revert that shortly.

llvm-svn: 253102
2015-11-13 23:05:46 +00:00
Ahmed Bougacha 4a85643907 [MC] Use LShr for constant evaluation of ">>" on non-arm64 darwin.
Follow-up to r235963: this matches other assemblers and is less
unexpected (e.g. PR23227).

llvm-svn: 252681
2015-11-11 00:51:36 +00:00
Sanjay Patel 241c31fb64 [AArch64] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
AArch64 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any AArch64
implementation.

The net result of allowing this speculation for the regression tests in this
patch is that we get this code:

ctlz:
  clz  w0, w0
  ret

cttz:
  rbit  w8, w0
  clz  w0, w8
  ret

Instead of:

ctlz:
  cbz  w0, .LBB0_2
  clz  w0, w0
  ret
.LBB0_2:
  orr  w0, wzr, #0x20
  ret

cttz:
  cbz  w0, .LBB1_2
  rbit  w8, w0
  clz  w0, w8
  ret
.LBB1_2:
  orr  w0, wzr, #0x20
  ret

See D14469 for the larger motivation.

Differential Revision: http://reviews.llvm.org/D14505

llvm-svn: 252625
2015-11-10 18:11:37 +00:00
Oliver Stannard d414c99b9c [AArch64] Fix halfword load merging for big-endian targets
For big-endian targets, when we merge two halfword loads into a word load, the
order of the halfwords in the loaded value is reversed compared to
little-endian, so the load-store optimiser needs to swap the destination
registers.

This does not affect merging of two word loads, as we use ldp, which treats the
memory as two separate 32-bit words.

llvm-svn: 252597
2015-11-10 11:04:18 +00:00
Tim Northover 339c83e27f AArch64: add experimental support for address tagging.
AArch64 has the ability to use the top 8-bits of an "address" for extra
information, with the memory subsystem automatically masking them off for loads
and stores. When that's happening, we can sometimes skip masks on memory
operations in the compiler.

However, this requires the host OS and support stack to preserve those bits so
it can't be enabled everywhere. In principle iOS 8.0 and above do take the
required precautions and but we'll put it under a flag for now.

llvm-svn: 252573
2015-11-10 00:44:23 +00:00
Sanjay Patel 776e59b0fe don't repeat function names in comments; NFC
llvm-svn: 252502
2015-11-09 19:18:26 +00:00
Charlie Turner 90dafb1b6d [AArch64] Add UABDL patterns for log2 shuffle.
Summary:
This matches the sum-of-absdiff patterns emitted by the vectoriser using log2 shuffles.

Relies on D14207 to be able to match the `extract_subvector(..., 0)`

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14208

llvm-svn: 252465
2015-11-09 13:10:52 +00:00
Charlie Turner 7b7b06f737 [AArch64] Handle extract_subvector(..., 0) in ISel.
Summary:
Lowering this pattern early to an `EXTRACT_SUBREG` was making it impossible to match larger patterns in tblgen that use `extract_subvector(..., 0)` as part of the their input pattern.

It seems like there will exist somewhere a better way of specifying this pattern over all relevant register value types, but I didn't manage to find it.

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14207

llvm-svn: 252464
2015-11-09 12:45:11 +00:00
Colin LeMahieu 8a0453e23a [AsmParser] Backends can parameterize ASM tokenization.
llvm-svn: 252439
2015-11-09 00:31:07 +00:00
Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
Ahmed Bougacha cf49b523a0 [AArch64][FastISel] Don't even try to select vector icmps.
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fallback to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).

Give up, let SDAG agree with itself on a vNi1 legalization strategy.

llvm-svn: 252364
2015-11-06 23:16:53 +00:00
Jun Bum Lim 22fe15ee86 [AArch64]Enable the narrow ld promotion only on profitable microarchitectures
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only in cortex-a57 on which performance benefits were verified.

llvm-svn: 252316
2015-11-06 16:27:47 +00:00
Tim Northover 775aaeb765 Remove windows line endings introduced by r252177. NFC.
llvm-svn: 252217
2015-11-05 21:54:58 +00:00
Sanjay Patel 387e66e79f replace MachineCombinerPattern namespace and enum with enum class; NFCI
Also, remove an enum hack where enum values were used as indexes into an array.

We may want to make this a real class to allow pattern-based queries/customization (D13417).

llvm-svn: 252196
2015-11-05 19:34:57 +00:00
Oleg Ranevskyy 057c5a6b2b [DebugInfo] Fix ARM/AArch64 prologue_end position. Related to D11268.
Summary:
This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it.

D11268 is quite old and has merge conflicts against the current trunk.
This request 
 - rebases D11268 onto the new trunk;
 - resolves the merge conflicts;
 - fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct.

Reviewers: echristo, rengolin, kubabrecka

Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl

Differential Revision: http://reviews.llvm.org/D14338

llvm-svn: 252177
2015-11-05 17:50:17 +00:00
Craig Topper 4b27576001 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.

llvm-svn: 251490
2015-10-28 04:02:12 +00:00
Jun Bum Lim c9879ecfbc [AArch64]Merge halfword loads into a 32-bit load
This recommits r250719, which caused a failure in SPEC2000.gcc
because of the incorrect insert point for the new wider load.

Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #ffff

llvm-svn: 251438
2015-10-27 19:16:03 +00:00
Cong Hou 07eeb8001e Create a new interface addSuccessorWithoutWeight(MBB*) in MBB to add successors when optimization is disabled.
When optimization is disabled, edge weights that are stored in MBB won't be used so that we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But that the weight list is empty doesn't mean disabled optimization (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights.

We should discourage using default weights when adding successors, because it is very easy for users to forget update the correct edge weights instead of using default ones (one exception is that the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimizations is disabled.

In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this as for many uses of addSuccessor() whether optimization is disabled or not is not checked. But it can guarantee that if optimization is enabled, then the weight list always has the same size of the successor list.

Differential revision: http://reviews.llvm.org/D13963

llvm-svn: 251429
2015-10-27 17:59:36 +00:00
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
Craig Topper ee0c859788 Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.

While there make use of ArrayRef and std::find_if.

llvm-svn: 251382
2015-10-27 04:14:24 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Craig Topper 7bf52c9d26 Use MVT::SimpleValueType instead of MVT in template parameter. NFC
llvm-svn: 251217
2015-10-25 00:27:14 +00:00
Craig Topper 272d6a57bb Call the version of ConvertCostTableLookup that takes a statically sized array rather than pointer and size. NFC
llvm-svn: 251196
2015-10-24 18:40:22 +00:00
James Molloy 5b18b4ce96 Revert "[AArch64]Merge halfword loads into a 32-bit load"
This reverts commit r250719. This introduced a codegen fault in SPEC2000.gcc, when compiled for Cortex-A53.

llvm-svn: 251108
2015-10-23 10:41:38 +00:00
Matthias Braun d276de6db1 AArch64: Disable the latency heuristic
It turned out not to improve any of our benchmarks but occasionally led
to increased register pressure and spilling.

Only enabling for the Cyclone CPU as the results on the cortex CPUs
give mixed results.

Differential Revision: http://reviews.llvm.org/D13708

llvm-svn: 251038
2015-10-22 18:07:38 +00:00
Craig Topper 8fe40e0ed5 Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. This removes the need to pass a hardcoded size in many places. NFC
llvm-svn: 251032
2015-10-22 17:05:00 +00:00
Jun Bum Lim d3548303ec [AArch64]Merge halfword loads into a 32-bit load
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #ffff

llvm-svn: 250719
2015-10-19 18:34:53 +00:00
Craig Topper 2626094fa1 Make a bunch of static arrays const.
llvm-svn: 250642
2015-10-18 05:15:34 +00:00
Charlie Turner 434d4599d4 [AArch64] Implement vector splitting on UADDV.
Summary: Fixes PR25056.

Reviewers: mcrosier, junbuml, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13466

llvm-svn: 250520
2015-10-16 15:38:25 +00:00
Evgeniy Stepanov 9addbc9fc1 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov 142947e9f0 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
Duncan P. N. Exon Smith d3b9df02b3 AArch64: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250216
2015-10-13 20:02:15 +00:00
Akira Hatanaka 5a4e4f8d8a [AArch64] Check the size of the vector before accessing its elements.
This fixes an assert in AArch64AsmParser::MatchAndEmitInstruction.

rdar://problem/23081753

llvm-svn: 250207
2015-10-13 18:55:34 +00:00
Duncan P. N. Exon Smith 769e1a972d AArch64: Make getNextNode() cleanup in r249764 more clear
After r249764, if you didn't see the full context, it looked like
`std::next(I)` would get the same result as
`++MachineBasicBlock::iterator(I)`.  However, `I` is a `MachineInstr*`
(not a `MachineBasicBlock::iterator`).

Use the `getIterator()` helper I added later (r249782) to make this code
more clear.

llvm-svn: 249852
2015-10-09 16:54:54 +00:00
Jun Bum Lim 0aace13d18 Improve ISel across lane float min/max reduction
In vectorized float min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :

  svn0 = vector_shuffle t0, undef<2,3,u,u>
  fmin = fminnum t0,svn0
  svn1 = vector_shuffle fmin, undef<1,u,u,u>
  cc = setcc fmin, svn1, ole
  n0 = extract_vector_elt cc, #0
  n1 = extract_vector_elt fmin, #0
  n2 = extract_vector_elt fmin, #1
  result = select n0, n1,n2
into :
  result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

llvm-svn: 249834
2015-10-09 14:11:25 +00:00
Duncan P. N. Exon Smith d389165c14 AArch64: Stop using MachineInstr::getNextNode()
Stop using `getNextNode()` to get an insertion point (at least, in this
one place).  Instead, use iterator logic directly.

The `getNextNode()` interface isn't actually supposed to work for
creating iterators; it's supposed to return `nullptr` (not a real
iterator) if this is the last node.  It's currently broken and will
"happen" to work, but if we ever fix the function, we'll get some
strange failures in places like this.

llvm-svn: 249764
2015-10-08 22:43:26 +00:00
Evgeniy Stepanov 5fe279e727 Add Triple::isAndroid().
This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().

llvm-svn: 249750
2015-10-08 21:21:24 +00:00
Chad Rosier 7c6ac2b8f9 [AArch64] Fold a floating-point divide by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249579
2015-10-07 17:51:37 +00:00
Chad Rosier fa30c9b436 [AArch64] Fold a floating-point multiply by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249576
2015-10-07 17:39:18 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
Alexandros Lamprineas 1bab191f25 [MC layer][AArch64] llvm-mc accepts 4-bit immediate values for
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.

Differential Revision: http://reviews.llvm.org/D13011

llvm-svn: 249313
2015-10-05 13:42:31 +00:00
Rafael Espindola e3a20f57d9 Fix pr24486.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.

With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.

In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.

This patch is a joint work by Maxim Ostapenko andy myself.

llvm-svn: 249303
2015-10-05 12:07:05 +00:00
Chad Rosier 1f385618c0 [ARM] Typo. NFC.
llvm-svn: 249153
2015-10-02 16:42:59 +00:00
Chad Rosier f11d040f01 [AArch64] Deprecate a command-line option used for testing.
Support for pairing unscaled loads and stores has been enabled since the
original ARM64 port.  This feature is no longer experimental, AFAICT.

llvm-svn: 249049
2015-10-01 18:17:12 +00:00
Chad Rosier b7c5b91068 [AArch64] Hoist commonly failing check. NFC.
llvm-svn: 249011
2015-10-01 13:43:05 +00:00
Chad Rosier 0b15e7c618 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 249008
2015-10-01 13:33:31 +00:00
Chad Rosier 7a83d770ae [AArch64] Update comment to reflect reality.
llvm-svn: 249007
2015-10-01 13:09:44 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Chad Rosier 4f04e2ec87 [AArch64] Use helper function to improve readability. NFC.
llvm-svn: 248914
2015-09-30 16:50:41 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Chad Rosier 32d4d37e61 [AArch64] Scale offsets by the size of the memory operation. NFC.
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored.  This change gets
us one step closer to forming LDPSW instructions.  This change also enables
pre- and post-indexing for halfword and byte loads and stores.

llvm-svn: 248804
2015-09-29 16:07:32 +00:00
Chad Rosier a4d3217e81 [AArch64] Remove some redundant cases. NFC.
llvm-svn: 248800
2015-09-29 14:57:10 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Chad Rosier 1bbd7fb38e [AArch64] Add support for generating pre- and post-index load/store pairs.
llvm-svn: 248593
2015-09-25 17:48:17 +00:00
Chad Rosier b02f5a5a1f [AArch64] Improve the readability of the ld/st optimization pass. NFC.
In this context, MI is an add/sub instruction not a loads/store.

llvm-svn: 248540
2015-09-24 21:27:49 +00:00
Chad Rosier 7cd472b719 [AArch64] The paired post-increment store instruction has an output register.
The pre- and post-increment version update the base register, but the post-
version was defined incorrectly.  There is no test case as we don't currently
generate these instructions, but I plan on changing that in the near future.

llvm-svn: 248528
2015-09-24 19:21:42 +00:00
Chad Rosier 2dfd35499e [AArch64] Refactor pre- and post-index merge fuctions into a single function. NFC.
llvm-svn: 248377
2015-09-23 13:51:44 +00:00
Ahmed Bougacha 07a844d758 [AArch64] Emit clrex in the expanded cmpxchg fail block.
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248291
2015-09-22 17:21:44 +00:00
Stephen Canon 8216d88511 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior.  This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
Chad Rosier 03a47305ec [Machine Combiner] Refactor machine reassociation code to be target-independent.
No functional change intended.
Patch by Haicheng Wu <haicheng@codeaurora.org>!

http://reviews.llvm.org/D12887
PR24522

llvm-svn: 248164
2015-09-21 15:09:11 +00:00
Craig Topper 3c76c523e1 Cleanup places that passed SMLoc by const reference to pass it by value instead. NFC
llvm-svn: 248135
2015-09-20 23:35:59 +00:00
Geoff Berry 43ec15e57e [AArch64] Improved bitfield instruction selection.
Summary:
For bitfield insert OR matching, check both operands for larger pattern
first before checking for smaller pattern.

Add pattern for unsigned bitfield insert-in-zero done with SHL+AND.

Resolves PR21631.

Reviewers: jmolloy, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits, mcrosier

Differential Revision: http://reviews.llvm.org/D12908

llvm-svn: 248006
2015-09-18 17:11:53 +00:00
Chad Rosier d90e2ebdf6 [AArch64] Reorder cases to improve readability. NFC.
llvm-svn: 247989
2015-09-18 14:15:19 +00:00
Chad Rosier 84a0afdeff [AArch64] Remove some redundant cases. NFC.
llvm-svn: 247988
2015-09-18 14:13:18 +00:00
Chad Rosier 6c1f0933ac Typos. NFC.
llvm-svn: 247884
2015-09-17 13:10:27 +00:00
Eric Christopher a4e5d3cf8e constify the Function parameter to the TTI creation callback and
propagate to all callers/users/etc.

llvm-svn: 247864
2015-09-16 23:38:13 +00:00
Daniel Sanders 50f17235dd Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Eric has replied and has demanded the patch be reverted.

llvm-svn: 247702
2015-09-15 16:17:27 +00:00
Daniel Sanders 153010c52d Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247692
2015-09-15 14:08:28 +00:00
Daniel Sanders c40de48041 Revert r247684 - Replace Triple with a new TargetTuple ...
LLDB needs to be updated in the same commit.

llvm-svn: 247686
2015-09-15 13:46:21 +00:00
Daniel Sanders 18d4b0dab7 Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247683
2015-09-15 13:17:40 +00:00
Jun Bum Lim 34b9bd0435 Improve ISel using across lane min/max reduction
In vectorized integer min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :
  %svn0 = vector_shuffle %0, undef<2,3,u,u>
  %smax0 = smax %0, svn0
  %svn3 = vector_shuffle %smax0, undef<1,u,u,u>
  %sc = setcc %smax0, %svn3, gt
  %n0 = extract_vector_elt %sc, #0
  %n1 = extract_vector_elt %smax0, #0
  %n2 = extract_vector_elt $smax0, #1
  %result = select %n0, %n1, n2
becomes :
  %1 = smaxv %0
  %result = extract_vector_elt %1, 0

This change extends r246790.

llvm-svn: 247575
2015-09-14 16:19:52 +00:00
Ahmed Bougacha 5246867384 [CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit.
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).

Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)

Differential Revision: http://reviews.llvm.org/D12557

llvm-svn: 247429
2015-09-11 17:08:28 +00:00
Ahmed Bougacha 9d677131c4 [CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind.
This lets us generalize its usage to the other atomic instructions.

llvm-svn: 247428
2015-09-11 17:08:17 +00:00
Steven Wu e3b1f2b765 Fix an undefined behavior introduces in r247234
llvm-svn: 247296
2015-09-10 16:32:28 +00:00
Chandler Carruth e4405e949f [ADT] Switch a bunch of places in LLVM that were doing single-character
splits to actually use the single character split routine which does
less work, and in a debug build is *substantially* faster.

llvm-svn: 247245
2015-09-10 06:12:31 +00:00
Ahmed Bougacha 05541459fa [AArch64] Match FI+offset in STNP addressing mode.
First, we need to teach isFrameOffsetLegal about STNP.
It already knew about the STP/LDP variants, but those were probably
never exercised, because it's only the load/store optimizer that
generates STP/LDP, and the only user of the method is frame lowering,
which runs earlier.
The STP/LDP cases were wrong: they didn't take into account the fact
that they return two results, not one, so the immediate offset will be
the 4th operand, not the 3rd.

Follow-up to r247234.

llvm-svn: 247236
2015-09-10 01:54:43 +00:00
Ahmed Bougacha c0ac38d584 [AArch64] Match base+offset in STNP addressing mode.
Followup to r247231.

llvm-svn: 247234
2015-09-10 01:48:29 +00:00
Ahmed Bougacha b8886b517d [AArch64] Support selecting STNP.
We could go through the load/store optimizer and match STNP where
we would have matched a nontemporal-annotated STP, but that's not
reliable enough, as an opportunistic optimization.
Insetad, we can guarantee emitting STNP, by matching them at ISel.
Since there are no single-input nontemporal stores, we have to
resort to some high-bits-extracting trickery to generate an STNP
from a plain store.

Also, we need to support another, LDP/STP-specific addressing mode,
base + signed scaled 7-bit immediate offset.
For now, only match the base. Let's make it smart separately.

Part of PR24086.

llvm-svn: 247231
2015-09-10 01:42:28 +00:00
Silviu Baranga a3e27edb5d [CostModel][AArch64] Remove amortization factor for some of the vector select instructions
Summary:
We are not scalarizing the wide selects in codegen for i16 and i32 and
therefore we can remove the amortization factor. We still have issues
with i64 vectors in codegen though.

Reviewers: mcrosier

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12724

llvm-svn: 247156
2015-09-09 15:35:02 +00:00
Jun Bum Lim 4d3c5986f2 Remove white space (test commit)
llvm-svn: 247021
2015-09-08 16:11:22 +00:00
Chad Rosier 6c36eff1d6 [AArch64] Improve ISel using across lane addition reduction.
In vectorized add reduction code, the final "reduce" step is sub-optimal.
This change wll combine :

ext  v1.16b, v0.16b, v0.16b, #8
add  v0.4s, v1.4s, v0.4s
dup  v1.4s, v0.s[1]
add  v0.4s, v1.4s, v0.4s

into

addv s0, v0.4s

PR21371
http://reviews.llvm.org/D12325
Patch by Jun Bum Lim <junbuml@codeaurora.org>!

llvm-svn: 246790
2015-09-03 18:13:57 +00:00
Chad Rosier 08ef462d15 Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r246769.

This appears to have broken Multisource/Benchmarks/tramp3d-v4.

llvm-svn: 246782
2015-09-03 16:41:28 +00:00
Chad Rosier 491a1bd998 [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

llvm-svn: 246769
2015-09-03 14:41:37 +00:00
Chad Rosier 5f668e170a [AArch64] Reuse MayLoad. NFC.
llvm-svn: 246767
2015-09-03 14:19:43 +00:00
Ahmed Bougacha 63fae0e58b [AArch64] More consistently separate asm opc and operands with '\t'.
Somehow missed these in r246686.

llvm-svn: 246687
2015-09-02 18:52:54 +00:00
Ahmed Bougacha cca07716f5 [AArch64] Consistently separate asm opc and operands with '\t'.
Some of the instructions use ' ', which drives OCD-me nuts.
Let's put an end to this.

NFC-ish: hopefully nobody cares about whitespace.
llvm-svn: 246686
2015-09-02 18:38:36 +00:00
Ahmed Bougacha b0ff6437cb [AArch64] Lower READCYCLECOUNTER using MRS PMCCTNR_EL0.
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.

Differential Revision: http://reviews.llvm.org/D12425

llvm-svn: 246555
2015-09-01 16:23:45 +00:00
Silviu Baranga 755ec0e027 [AArch64] Turn on by default interleaved access vectorization
Summary:
This change turns on by default interleaved access vectorization
for AArch64.

We also clean up some tests which were spedifically enabling this
behaviour.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12149

llvm-svn: 246542
2015-09-01 11:26:46 +00:00
Quentin Colombet a80b9c824e [AArch64][CollectLOH] Remove an invalid assertion and add a test case exposing it.
rdar://problem/22491525

llvm-svn: 246472
2015-08-31 19:02:00 +00:00
Matthias Braun 0acbd08f3c AArch64: Fix loads to lower NEON vector lanes using GPR registers
The ISelLowering code turned insertion turned the element for the
lowest lane of a BUILD_VECTOR into an INSERT_SUBREG, this prohibited
the patterns for SCALAR_TO_VECTOR(Load) to match later. Restrict this
to cases without a load argument.

Reported in rdar://22223823

Differential Revision: http://reviews.llvm.org/D12467

llvm-svn: 246462
2015-08-31 18:25:15 +00:00
Hal Finkel 982e8d48f8 [MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags
Make the arrays 'static const' instead of just 'static'. Post-commit review
comment from Roman Divacky on IRC. NFC.

llvm-svn: 246376
2015-08-30 08:07:29 +00:00
Quentin Colombet fa4ecb4b9a [AArch64][CollectLOH] Fix a regression that prevented us to detect chains of
more than 2 instructions.

I introduced this regression a while back and did not noticed it because I
somehow forgot to push the initial test cases for the pass!

Fix that as well!

llvm-svn: 246239
2015-08-27 23:47:10 +00:00
Chad Rosier 9f4709b261 [AArch64] Remove a use-after-free when collecting stats.
The call to mergePairedInsns() deletes MI, so the later use by isUnscaledLdSt()
is referencing freed memory.

llvm-svn: 246033
2015-08-26 13:39:48 +00:00
Silviu Baranga db1ddb32ce [AArch64] Unify the integer min/max vector selection patterns with the intrinsic ones
Summary:
This change lowers the aarch64 integer vector min/max intrinsic nodes to
generic min/max nodes and replaces the intrinsic selection patterns with
the generic ones.

There should already be testing in place for this, so no further tests
were added.

Reviewers: jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12276

llvm-svn: 246030
2015-08-26 11:11:14 +00:00
Matthias Braun 17af607796 FastISel: Factor out common code; NFC intended
This should be no functional change but for the record: For three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch is addedd to the successor/predecessor
lists.

llvm-svn: 245997
2015-08-26 01:38:00 +00:00
Mehdi Amini a758398833 Add missing break in AArch64DAGToDAGISel::Select() switch case
Reported by coverity.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245800
2015-08-23 00:42:57 +00:00
Matthias Braun 46e5639806 AArch64: Fix cmp;ccmp ordering
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.

This fixes http://llvm.org/PR24459

llvm-svn: 245641
2015-08-20 23:33:34 +00:00
Matthias Braun 266204b7dc AArch64: Do not create CCMP on multiple users.
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.

llvm-svn: 245640
2015-08-20 23:33:31 +00:00
Juergen Ributzka b12248e9cd [AArch64][FastISel] Don't fold shifts with UB.
We are already falling back to SelectionDAG when encountering an shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.

This fixes rdar://problem/22345295.

llvm-svn: 245499
2015-08-19 20:52:55 +00:00
Ahmed Bougacha 9e00ec6195 [AArch64] Improve short-form diags on long-form Match_InvalidOperand.
Since r244955, we try to use the short-form ErrorInfo when both
tries failed, and the long-form match failed on a suffix operand.
However, this means we sometimes mix ErrorInfo and MatchResult
(one manifestation of this being PR24498). Instead, restore both.

llvm-svn: 245469
2015-08-19 17:40:19 +00:00
Renato Golin eb552e83e0 Revert "[AArch64] Simplify/refactor code to ease code review. NFC."
This reverts commit r245443, as it broke AArch64 test-suite tramp3d
with an assert "Reg && "Null register has no regunits".

llvm-svn: 245455
2015-08-19 16:29:53 +00:00
Chad Rosier 494abf1ad8 [AArch64] Simplify/refactor code to ease code review. NFC.
llvm-svn: 245443
2015-08-19 14:34:54 +00:00
Alex Lorenz f3630113cd MIR Serialization: Serialize the operand's bit mask target flags.
This commit adds support for bit mask target flag serialization to the MIR
printer and the MIR parser. It also adds support for the machine operand's
target flag serialization to the AArch64 target.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 245383
2015-08-18 22:52:15 +00:00
Chad Rosier 3dd0e942b6 [AArch64] Simplify the logic for computing in bounds offset. NFC.
llvm-svn: 245307
2015-08-18 16:20:03 +00:00
Silviu Baranga b322aa6f53 [CostModel][AArch64] Increase cost of vector insert element and add missing cast costs
Summary:
Increase the estimated costs for insert/extract element operations on
AArch64. This is motivated by results from benchmarking interleaved
accesses.

Add missing costs for zext/sext/trunc instructions and some integer to
floating point conversions. These costs were previously calculated
by scalarizing these operation and were affected by the cost increase of
the insert/extract element operations.

Reviewers: rengolin

Subscribers: mcrosier, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11939

llvm-svn: 245226
2015-08-17 16:05:09 +00:00
James Molloy 88edc8243d Remove hand-rolled matching for fmin and fmax.
SDAGBuilder now does this all for us.

llvm-svn: 245198
2015-08-17 07:13:20 +00:00
James Y Knight 5567bafe93 Remove redundant TargetFrameLowering::getFrameIndexOffset virtual
function.

This was the same as getFrameIndexReference, but without the FrameReg
output.

Differential Revision: http://reviews.llvm.org/D12042

llvm-svn: 245148
2015-08-15 02:32:35 +00:00
Ahmed Bougacha cd35787217 [AArch64] Fix FMLS scalar-indexed-from-2s-after-neg patterns.
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.

llvm-svn: 245107
2015-08-14 22:06:05 +00:00
Rafael Espindola dbaf0498a9 Revert "Centralize the information about which object format we are using."
This reverts commit r245047.

It was failing on the darwin bots. The problem was that when running

./bin/llc -march=msp430

llc gets to

  if (TheTriple.getTriple().empty())
    TheTriple.setTriple(sys::getDefaultTargetTriple());

Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.

That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.

llvm-svn: 245062
2015-08-14 15:48:41 +00:00
Rafael Espindola 90eb70c8a7 Centralize the information about which object format we are using.
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.

It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.

llvm-svn: 245047
2015-08-14 13:31:17 +00:00
James Molloy 63be198712 [AArch64] FMINNAN/FMAXNAN on f16 is not legal.
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.

I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.

llvm-svn: 245035
2015-08-14 09:08:50 +00:00
Ahmed Bougacha 80e4ac802a [AArch64] Provide "too few operands" diags on short-form NEON also.
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.

Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.

llvm-svn: 244955
2015-08-13 21:09:13 +00:00
Ahmed Bougacha 2a97b1bcf8 [AArch64] Also custom-lowering mismatched vector/f16 FCOPYSIGN.
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.

Follow-up to r243924, r243926, and r244858.

llvm-svn: 244860
2015-08-13 01:13:56 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
James Molloy b7b2a1e9b4 [AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic.
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:

  - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
  - f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!

llvm-svn: 244595
2015-08-11 12:06:37 +00:00
James Molloy edf38f0cb0 [AArch64] Replace the custom AArch64ISD::FMIN/MAX nodes with ISD::FMINNAN/MAXNAN
NFCI. This just removes custom ISDNodes that are no longer needed.

llvm-svn: 244594
2015-08-11 12:06:33 +00:00
Chad Rosier c56a9132d0 [AArch64] Convert a conditional check that will always be true to an assert. NFC.
llvm-svn: 244479
2015-08-10 18:42:45 +00:00
Chad Rosier caed6db51e Typo. Move comment closer to relevant code. NFC.
llvm-svn: 244465
2015-08-10 17:17:19 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Quentin Colombet 7d8c74ff3f [AArch64][LoadStoreOptimizer] Turn a test into an assert. NFC.
At this point the given Opc must be valid, otherwise we should
not look for a matching pair to form paired load or store.

Thanks to Chad to point out this piece of code!

llvm-svn: 244366
2015-08-07 22:40:51 +00:00
Juergen Ributzka f09c7a3d0f [AArch64][FastISel] Always use AND before checking the branch flag.
When we are not emitting the condition for the branch, because the condition is
in another BB or SDAG did the selection for us, then we have to mask the flag in
the register with AND.

This is required when the condition comes from a truncate, because SDAG only
truncates down to a legal size of i32.

This fixes rdar://problem/22161062.

llvm-svn: 244291
2015-08-06 22:44:15 +00:00
Juergen Ributzka 9f54dbe7a1 Revert "[AArch64][FastISel] Add more truncation tests." and "[AArch64][FastISel] Always use an AND instruction when truncating to non-legal types."
This reverts commit r243198 and 243304.

Turns out this wasn't the correct fix for this problem. It works only within
FastISel, but fails when the truncate is selected by SDAG.

llvm-svn: 244287
2015-08-06 22:13:48 +00:00
Nico Rieck 78199518c4 Rename inst_range() to instructions() for consistency. NFC
llvm-svn: 244248
2015-08-06 19:10:45 +00:00
Chad Rosier 22eb71056d [AArch64] Use a static function and other minor cleanup for readability. NFC.
llvm-svn: 244233
2015-08-06 17:37:18 +00:00
Chad Rosier f77e909f0a [AArch64] Improve the readability of the ld/st optimization pass. NFC.
llvm-svn: 244222
2015-08-06 15:50:12 +00:00
Chandler Carruth 93205eb966 [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'
rather than 'unsigned' for their costs.

For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely percieved as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).

All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.

This passes all tests, and is also UBSan clean.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D11741

llvm-svn: 244080
2015-08-05 18:08:10 +00:00
Pete Cooper 3ae0ee5453 Move BB succ_iterator to be inside TerminatorInst. NFC.
To get the successors of a BB we currently do successors(BB) which
ultimately walks the successors of the BB's terminator.

This moves the iterator to TerminatorInst as thats what we're actually
using to do the iteration, and adds a member function to TerminatorInst
to allow us to iterate directly over successors given an instruction.

For example, we can now do

  for (auto *Succ : BI->successors())

instead of

  for (unsigned i = 0, e = BI->getNumSuccessors(); i != e; ++i)

Reviewed by Tobias Grosser.

llvm-svn: 244074
2015-08-05 17:43:01 +00:00
Chad Rosier 69e3eb3c79 [AArch64] Register AArch64DeadRegisterDefinition pass with LLVM pass manager.
llvm-svn: 244067
2015-08-05 17:35:34 +00:00
Chad Rosier 1c81432eb6 [AArch64] Register (existing) AArch64BranchRelaxation pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

llvm-svn: 244060
2015-08-05 16:12:10 +00:00
Chad Rosier 0c6c5fc303 [AArch64] Make the naming of the Address Type Promotion pass consistent.
llvm-svn: 244057
2015-08-05 15:32:23 +00:00
Chad Rosier 794b9b2fdd [AArch64] Register (existing) AArch64AdvSIMDScalar pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

IIRC, this pass is off by default, but it's still helpful when debugging.

llvm-svn: 244056
2015-08-05 15:18:58 +00:00
Chad Rosier 084b78632e Make this less error prone by using a #define. NFC.
llvm-svn: 244048
2015-08-05 14:48:44 +00:00
Chad Rosier 9378c16ac8 [AArch64] Register (existing) AArch64ExpandPseudo pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

llvm-svn: 244046
2015-08-05 14:22:53 +00:00
Chad Rosier 96530b3a43 [AArch64] Register (existing) AArch64LoadStoreOpt pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

This is the AArch64 version of r243052.

llvm-svn: 244041
2015-08-05 13:44:51 +00:00
Chad Rosier 43f5c84cfc Update comment. NFC.
llvm-svn: 244038
2015-08-05 12:40:13 +00:00
Sanjay Patel 924879ad2c wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).

Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.

This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.

Differential Revision: http://reviews.llvm.org/D11734

llvm-svn: 243994
2015-08-04 15:49:57 +00:00
Ahmed Bougacha 81fda188f9 [AArch64] Rename FP formats to be more consistent. NFC.
Some are named "FP", others "SD", others still "FP*SD".
Rename all this to just use "FP", which, except for conversions
(which don't use this format naming scheme), implies "SD" anyway.

llvm-svn: 243936
2015-08-04 01:38:08 +00:00
Ahmed Bougacha e0e12db8c8 [AArch64] Add isel support for f16 indexed LD/ST.
llvm-svn: 243935
2015-08-04 01:29:38 +00:00
Ahmed Bougacha e8ea9ac32b [AArch64][v8.1a] The "pan" sysreg isn't MSR-specific. NFCI.
It's already in SysRegMappings, no need to also have it in MSRMappings:
the latter is only used if we didn't find a match in the former.

llvm-svn: 243933
2015-08-04 00:55:11 +00:00
Ahmed Bougacha 0cbe2efcd6 [AArch64] Remove unnecessary "break". NFC.
llvm-svn: 243931
2015-08-04 00:49:08 +00:00
Ahmed Bougacha 239d635d3d [AArch64] Use SDValue bool operator. NFC.
llvm-svn: 243930
2015-08-04 00:48:02 +00:00
Ahmed Bougacha b0ae36f0d1 [AArch64] Vector FCOPYSIGN supports Custom-lowering: mark it as such.
There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and
is actually already vector-aware; let's use it instead of scalarizing!

The only interesting change is that for v2f32, we previously always used
use v4i32 as the integer vector type.
Use v2i32 instead, and mark FCOPYSIGN as Custom.

llvm-svn: 243926
2015-08-04 00:42:34 +00:00
Pete Cooper 7be8f8f018 Convert some AArch64 code to foreach loops. NFC.
Also converted a cast<> to dyn_cast while i was working on the same
line of code.

llvm-svn: 243894
2015-08-03 19:04:32 +00:00
Craig Topper e3dcce9700 De-constify pointers to Type since they can't be modified. NFC
This was already done in most places a while ago. This just fixes the ones that crept in over time.

llvm-svn: 243842
2015-08-01 22:20:21 +00:00
Geoff Berry 8a7ef3b2ee [AArch64] Favor extended reg patterns for sub
Summary:
Favor the extended reg patterns over the shifted reg patterns that match
only the operand shift and not the full sign/zero extend and shift.

Reviewers: jmolloy, t.p.northover

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11569

llvm-svn: 243753
2015-07-31 15:55:54 +00:00
Nick Lewycky c3890d2969 Fix typo "fuction" noticed in comments in AssumptionCache.h, and also all the other files that have the same typo. All comments, no functionality change! (Merely a "fuctionality" change.)
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.

llvm-svn: 243585
2015-07-29 22:32:47 +00:00
Tim Northover 2a9d801fd5 AArch64: use 32-bit MOV rather than UBFX to truncate registers.
It's potentially more efficient on Cyclone, and from the optimization guides &
schedulers looks like it has no effect on Cortex-A53 or A57. In general you'd
expect a MOV to be about the most efficient instruction with its semantics,
even though the official "UXTW" alias is really a UBFX.

llvm-svn: 243576
2015-07-29 21:34:32 +00:00
Tim Northover cf739b8c3d AArch64: use AddressingModes.h accessors for compare shifts
No functional change because "lsl #12" is actually encoded as 12, but one less
bug if someone ever decides to change that for the giggles.

llvm-svn: 243536
2015-07-29 16:39:56 +00:00
Akira Hatanaka f53b0403f8 [AArch64] Define subtarget feature strict-align.
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -aarch64-strict-align to decide whether strict alignment should be
forced.

rdar://problem/21529937

llvm-svn: 243516
2015-07-29 14:17:26 +00:00
Sanjay Patel 1dd15598cf fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold
This fix was suggested as part of D11345 and is part of fixing PR24141.

With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.

There is no NFC-intended other than that.

Differential Revision: http://reviews.llvm.org/D11531

llvm-svn: 243498
2015-07-28 23:05:48 +00:00
Tim Northover 17ae83a25f AArch64: be careful of large immediates when optimising cmps.
llvm-svn: 243492
2015-07-28 22:42:32 +00:00
Chih-Hung Hsieh 1e859582d6 Implement target independent TLS compatible with glibc's emutls.c.
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.

clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model,
  which will be used for old targets like Android that do not
  support ELF TLS models.

Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.

Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.

TODO: Add proper DIE for emulated TLS variables.
      Added new unit tests with emulated TLS.

Differential Revision: http://reviews.llvm.org/D10522

llvm-svn: 243438
2015-07-28 16:24:05 +00:00
Geoff Berry c573bf7a5f [AArch64] Match float round and convert to int instructions.
Summary:
Add patterns for doing floating point round with various rounding modes
followed by conversion to int as a single FCVT* instruction.

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D11424

llvm-svn: 243422
2015-07-28 15:24:10 +00:00
Adhemerval Zanella 7bc3319d84 Implement __builtin_thread_pointer
This path add the aarch64 lowering of __builtin_thread_pointer.  It uses
the already implemented AArch64ISD::THREAD_POINTER used in TLS generation.

llvm-svn: 243412
2015-07-28 13:03:31 +00:00
Colin LeMahieu fe2c8b8015 [llvm-mc] Pushing plumbing through for --fatal-warnings flag.
llvm-svn: 243334
2015-07-27 21:56:53 +00:00
Akira Hatanaka 2541e0241c [AArch64] Remove check for Darwin that was needed to decide if x18 should
be reserved.

The decision to reserve x18 is going to be made solely by the front-end,
so it isn't necessary to check if the OS is Darwin in the backend.

llvm-svn: 243308
2015-07-27 19:18:47 +00:00
Silviu Baranga 7581d22512 [ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.

This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.

No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11524

llvm-svn: 243270
2015-07-27 14:39:34 +00:00
Juergen Ributzka 6364985b58 [AArch64][FastISel] Always use an AND instruction when truncating to non-legal types.
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.

This commit fixes this and adds the missing i32 truncate tests.

This fixes rdar://problem/21990703.

llvm-svn: 243198
2015-07-25 02:16:53 +00:00
Akira Hatanaka 0d4c9ea6e0 [AArch64] Define subtarget feature "reserve-x18", which is used to decide
whether register x18 should be reserved.

This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.

Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11463

llvm-svn: 243186
2015-07-25 00:18:31 +00:00
Pete Cooper 7679afda82 Use make_range(rbegin(), rend()) to allow foreach loops. NFC.
Instead of the pattern

for (auto I = x.rbegin(), E = x.end(); I != E; ++I)

we can use make_range to construct the reverse range and iterate using
that instead.

llvm-svn: 243163
2015-07-24 21:13:43 +00:00
Luke Cheeseman b5c627aba8 When lowering vector shifts a check is performed to see if the value to shift by
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.

Differential Revision: http://reviews.llvm.org/D11408

llvm-svn: 243100
2015-07-24 09:31:48 +00:00
Lawrence Hu 687097a0a9 test commit, only added one space
llvm-svn: 243070
2015-07-23 23:55:28 +00:00
Weiming Zhao b33a5557f4 This patch eanble register coalescing to coalesce the following:
%vreg2<def> = MOVi32imm 1; GPR32:%vreg2
  %W1<def> = COPY %vreg2; GPR32:%vreg2
into:
  %W1<def> = MOVi32imm 1
Patched by Lawrence Hu (lawrence@codeaurora.org)

llvm-svn: 243033
2015-07-23 19:24:53 +00:00
Chad Rosier 1bf48a6a69 Simplify switch as all cases other than default return true. NFC.
llvm-svn: 242922
2015-07-22 18:41:57 +00:00
Chad Rosier fe5399fe88 Follow up to r242810. NFC.
llvm-svn: 242812
2015-07-21 17:47:56 +00:00
Chad Rosier 96a18a96cc [AArch64] Simplify the passing of arguments. NFC.
This is setup for future work planned for the AArch64 Load/Store Opt pass.

llvm-svn: 242810
2015-07-21 17:42:04 +00:00
Matthias Braun c8b67e656b AArch64: Restrict macroop fusion heuristics to cyclone.
Even though this is just some hinting for the scheduler it doesn't make
sense to do that unless you know the target can perform the fusion.

llvm-svn: 242732
2015-07-20 23:11:42 +00:00
JF Bastien e4d22d59d1 Targets: commonize some stack realignment code
This patch does the following:
* Fix FIXME on `needsStackRealignment`: it is now shared between multiple targets, implemented in `TargetRegisterInfo`, and isn't `virtual` anymore. This will break out-of-tree targets, silently if they used `virtual` and with a build error if they used `override`.
* Factor out `canRealignStack` as a `virtual` function on `TargetRegisterInfo`, by default only looks for the `no-realign-stack` function attribute.

Multiple targets duplicated the same `needsStackRealignment` code:
 - Aarch64.
 - ARM.
 - Mips almost: had extra `DEBUG` diagnostic, which the default implementation now has.
 - PowerPC.
 - WebAssembly.
 - x86 almost: has an extra `-force-align-stack` option, which the default implementation now has.

The default implementation of `needsStackRealignment` used to just return `false`. My current patch changes the behavior by simply using the above shared behavior. This affects:
 - AMDGPU
 - BPF
 - CppBackend
 - MSP430
 - NVPTX
 - Sparc
 - SystemZ
 - XCore
 - Out-of-tree targets
This is a breaking change! `make check` passes.

The only implementation of the `virtual` function (besides the slight different in x86) was Hexagon (which did `MF.getFrameInfo()->getMaxAlignment() > 8`), and potentially some out-of-tree targets. Hexagon now uses the default implementation.

`needsStackRealignment` was being overwritten in `<Target>GenRegisterInfo.inc`, to return `false` as the default also did. That was odd and is now gone.

Reviewers: sunfish

Subscribers: aemerson, llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11160

llvm-svn: 242727
2015-07-20 22:51:32 +00:00
Matthias Braun e536f4f681 AArch64: Add aditional Cyclone macroop fusion opportunities
Related to rdar://19205407

Differential Revision: http://reviews.llvm.org/D10746

llvm-svn: 242724
2015-07-20 22:34:47 +00:00
Geoff Berry e41c2df0ef Fix comment typo (test commit). NFC
llvm-svn: 242719
2015-07-20 22:03:52 +00:00
Chad Rosier 3da0ea7f5d [AArch64] Change EON pattern to match more often.
Phabricator: http://reviews.llvm.org/D11359
Patch by Geoff Berry <gberry@codeaurora.org>

llvm-svn: 242694
2015-07-20 18:42:27 +00:00
James Molloy faf4e3c33b [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
No functional change, but it preps codegen for the future when SABSDIFF
will start getting generated in anger.

llvm-svn: 242545
2015-07-17 17:10:45 +00:00
Tim Northover a5740e0874 AArch64: add comment missed out from earlier patch.
Helps explain some of the background behind this bit of code.

llvm-svn: 242503
2015-07-17 03:31:50 +00:00
Tim Northover ca0ffc3561 AArch64: make inexact signalling on round Darwin-specific
C11 leaves the choice on whether round-to-integer operations set the inexact
flag implementation-defined. Darwin does expect it to be set, but this seems to
be against the intent of the IEEE document and slower to implement anyway. So
it should be opt-in.

llvm-svn: 242446
2015-07-16 21:30:21 +00:00
Matthias Braun af7d7709d6 AArch64: Implement conditional compare sequence matching.
This is a new iteration of the reverted r238793 /
http://reviews.llvm.org/D8232 which wrongly assumed that any and/or
trees can be represented by conditional compare sequences, however there
are some restrictions to that. This version fixes this and adds comments
that explain exactly what types of and/or trees can actually be
implemented as conditional compare sequences.

Related to http://llvm.org/PR20927, rdar://18326194

Differential Revision: http://reviews.llvm.org/D10579

llvm-svn: 242436
2015-07-16 20:02:37 +00:00
Mehdi Amini bd7287ebe5 Move most user of TargetMachine::getDataLayout to the Module one
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

This patch is quite boring overall, except for some uglyness in
ASMPrinter which has a getDataLayout function but has some clients
that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so
some methods are taking a DataLayout as parameter.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11090

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242386
2015-07-16 06:11:10 +00:00
Petr Pavlu 097adfb98c [AArch64] Fix problems in decoding generic MSR instructions
Bitpatterns rejected by the decoder method of `MSR (immediate)` should be
decoded as the `extended MSR (register)` instruction.

Differential Revision: http://reviews.llvm.org/D7174

llvm-svn: 242276
2015-07-15 08:10:30 +00:00
JF Bastien c8f48c19d3 WebAssembly: fix build breakage.
Summary:
processFunctionBeforeCalleeSavedScan was renamed to determineCalleeSaves and now takes a BitVector parameter as of rL242165, reviewed in http://reviews.llvm.org/D10909

WebAssembly is still marked as experimental and therefore doesn't build by default. It does, however, grep by default! I notice that processFunctionBeforeCalleeSavedScan is still mentioned in a few comments and error messages, which I also fixed.

Reviewers: qcolombet, sunfish

Subscribers: jfb, dsanders, hfinkel, MatzeB, llvm-commits

Differential Revision: http://reviews.llvm.org/D11199

llvm-svn: 242242
2015-07-14 23:06:07 +00:00
Matthias Braun 9912bb817c MachineRegisterInfo: Remove UsedPhysReg infrastructure
We have a detailed def/use lists for every physical register in
MachineRegisterInfo anyway, so there is little use in maintaining an
additional bitset of which ones are used.

Removing it frees us from extra book keeping. This simplifies
VirtRegMap.

Differential Revision: http://reviews.llvm.org/D10911

llvm-svn: 242173
2015-07-14 17:52:07 +00:00
Matthias Braun 0256486532 PrologEpilogInserter: Rewrite API to determine callee save regsiters.
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():

- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
  the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
  physcial registers which are only read but never modified.

Related to rdar://21539507

Differential Revision: http://reviews.llvm.org/D10909

llvm-svn: 242165
2015-07-14 17:17:13 +00:00
Tim Northover c962d4f28b AArch64: add rev64 alias for 64-bit rev instruction.
It could be useful to assembly programmers and makes the permitted variants a
little more uniform.

llvm-svn: 242164
2015-07-14 17:07:29 +00:00
Duncan P. N. Exon Smith 754e21f244 MC: Remove MCSubtargetInfo() default constructor
Force all creators of `MCSubtargetInfo` to immediately initialize it,
merging the default constructor and the initializer into an initializing
constructor.  Besides cleaning up the code a little, this makes it clear
that the initializer is never called again later.

Out-of-tree backends need a trivial change: instead of calling:

    auto *X = new MCSubtargetInfo();
    InitXYZMCSubtargetInfo(X, ...);
    return X;

they should call:

    return createXYZMCSubtargetInfoImpl(...);

There's no real functionality change here.

llvm-svn: 241957
2015-07-10 22:43:42 +00:00
Evgeniy Stepanov 00b3020453 Fix AArch64 prologue for empty frame with dynamic allocas.
Fixes PR23804: assertion failure in emitPrologue in the case of a
function with an empty frame and a dynamic alloca that needs stack
realignment. This is a typical case for AddressSanitizer.

llvm-svn: 241943
2015-07-10 21:24:07 +00:00
JF Bastien b73a2ed20e Target RegisterInfo: devirtualize TargetFrameLowering
Summary:
The target frame lowering's concrete type is always known in RegisterInfo, yet it's only sometimes devirtualized through a static_cast. This change adds an auto-generated static function <Target>GenRegisterInfo::getFrameLowering(const MachineFunction &MF) which does this devirtualization, and uses this function in all targets which can.

This change was suggested by sunfish in D11070 for WebAssembly, I figure that I may as well improve the other targets while I'm here.

Subscribers: sunfish, ted, llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11093

llvm-svn: 241921
2015-07-10 18:13:17 +00:00
Pat Gavlin a717f255b6 Allow {e,r}bp as the target of {read,write}_register.
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.

Differential Revision: http://reviews.llvm.org/D10977

llvm-svn: 241827
2015-07-09 17:40:29 +00:00
Mehdi Amini eaabc51e78 Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user
A documentation for this function would be nice by the way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241807
2015-07-09 15:12:23 +00:00
Arnaud A. de Grandmaison f40f99e3a4 [AArch64] Select SBFIZ or UBFIZ instead of left + right shifts
And rename LSB to Immr / MSB to Imms to match the ARM ARM terminology.

llvm-svn: 241803
2015-07-09 14:33:38 +00:00
Renato Golin 17d4efe7c1 Add support for nest attribute to AArch64 backend
The nest attribute is currently supported on the x86 (32-bit) and x86-64
backends, but not on ARM (32-bit) or AArch64. This patch adds support for
nest to the AArch64 backend.

Register x18 is used by GCC for this purpose and hence is used here.
As discussed on the GCC mailing list the register choice is an ABI issue
and so choosing the same register as GCC means __builtin_call_with_static_chain
is compatible.

Patch by Stephen Cross.

llvm-svn: 241794
2015-07-09 10:18:02 +00:00
Mehdi Amini 157e5a6d10 Remove getDataLayout() from TargetSelectionDAGInfo (had no users)
Summary:
Remove empty subclass in the process.

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted

Differential Revision: http://reviews.llvm.org/D11045

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241780
2015-07-09 02:10:08 +00:00
Mehdi Amini a749f2ad47 Remove getDataLayout() from TargetLowering
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11042

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241779
2015-07-09 02:09:52 +00:00
Mehdi Amini 0cdec1e2ab Make isLegalAddressingMode() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini 9639d650bb Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11037

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
2015-07-09 02:09:20 +00:00
Mehdi Amini 44ede33a69 Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Mehdi Amini 5010ebf181 Make TargetTransformInfo keeping a reference to the Module DataLayout
DataLayout is no longer optional. It was initialized with or without
a DataLayout, and the DataLayout when supplied could have been the
one from the TargetMachine.

Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11021

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241774
2015-07-09 02:08:42 +00:00
Mehdi Amini 56228dabfa Redirect DataLayout from TargetMachine to Module in ComputeValueVTs()
Summary:
Avoid using the TargetMachine owned DataLayout and use the Module owned
one instead. This requires passing the DataLayout up the stack to
ComputeValueVTs().

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11019

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241773
2015-07-09 01:57:34 +00:00
Arnold Schwaighofer 3d43f66c91 Add more nvcasts
Tim Northover has told me that they can occur when the compiler cleverly
constructs constants - as demonstrated in the test case.

rdar://21703486

llvm-svn: 241641
2015-07-07 23:13:18 +00:00
Arnold Schwaighofer 4bc34b1515 Add a pattern for a nvcast from v2f64 -> v4f32
Since the NvCast is generated by the selection process the concerns about
endianess and bit reversal don't apply.

rdar://21703486

llvm-svn: 241611
2015-07-07 18:31:55 +00:00
Daniel Sanders f423f5627c Change the last few internal StringRef triples into Triple objects.
Summary:
This concludes the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

At this point, the StringRef-form of GNU Triples should only be used in the
public API (including IR serialization) and a couple objects that directly
interact with the API (most notably the Module class). The next step is to
replace these Triple objects with the TargetTuple object that will represent
our authoratative/unambiguous internal equivalent to GNU Triples.

Reviewers: rengolin

Subscribers: llvm-commits, jholewinski, ted, rengolin

Differential Revision: http://reviews.llvm.org/D10962

llvm-svn: 241472
2015-07-06 16:56:07 +00:00
Daniel Sanders fbdab437f0 Where Triple has a suitable predicate, use it rather than the enum values. NFC.
Reviewers: mcrosier

Subscribers: llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10960

llvm-svn: 241469
2015-07-06 16:33:18 +00:00
Chad Rosier 85a346395e Fix a bug in the A57FPLoadBalancing register tracking/scavenger.
The code in AArch64A57FPLoadBalancing::scavengeRegister() to handle dead defs
was not correctly handling aliased registers.  E.g. if the dead def was of D2,
then S2 was not being marked as unavailable, so it could potentially be used
across a live-range in which it would be clobbered.

Patch by Geoff Berry <gberry@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D10900

llvm-svn: 241449
2015-07-06 14:46:34 +00:00
Peter Collingbourne 6a9d1774d0 IR: Do not consider available_externally linkage to be linker-weak.
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.

Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.

Differential Revision: http://reviews.llvm.org/D10941

llvm-svn: 241413
2015-07-05 20:52:35 +00:00
Benjamin Kramer 9bfb627a0e [TargetLowering] StringRefize asm constraint getters.
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.

llvm-svn: 241411
2015-07-05 19:29:18 +00:00
Sanjay Patel 910d5daa4b fix formatting; NFC
llvm-svn: 241175
2015-07-01 17:58:53 +00:00
Arnaud A. de Grandmaison 650c520007 [AArch64] Implement add/adds/sub/subs/cmp/cmn with negative immediate aliases
This patch teaches the AsmParser to accept add/adds/sub/subs/cmp/cmn
with a negative immediate operand and convert them as shown:

  add  Rd, Rn, -imm -> sub  Rd, Rn, imm
  sub  Rd, Rn, -imm -> add  Rd, Rn, imm
  adds Rd, Rn, -imm -> subs Rd, Rn, imm
  subs Rd, Rn, -imm -> adds Rd, Rn, imm
  cmp  Rn, -imm     -> cmn  Rn, imm
  cmn  Rn, -imm     -> cmp  Rn, imm

Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers (gas). They are documented in the "ARMv8 Instruction Set
Overview", in the "Arithmetic (immediate)" section. This makes llvm-mc
a programmer-friendly assembler !

This also fixes PR20978: "Assembly handling of adding negative numbers
not as smart as gas".

llvm-svn: 241166
2015-07-01 15:05:58 +00:00
Ranjeet Singh 86ecbb7b54 Reverting r241058 because it's causing buildbot failures.
llvm-svn: 241061
2015-06-30 12:32:53 +00:00
Ranjeet Singh 5b119091a1 There are a few places where subtarget features are still
represented by uint64_t, this patch replaces these
usages with the FeatureBitset (std::bitset) type.

Differential Revision: http://reviews.llvm.org/D10542

llvm-svn: 241058
2015-06-30 11:30:42 +00:00
Pete Cooper 3af9a25b65 Add op_values() to iterate over the SDValue operands of an SDNode.
SDNode already had ops() which would iterate over the operands and return
SDUse*.  This version instead gets the SDValue's out of the SDUse's so that
we can use foreach in more places.

Reviewed by David Blaikie.

llvm-svn: 240805
2015-06-26 18:17:36 +00:00
Rafael Espindola c5fb508c9d Optimize the creation of mapping symbols.
No need to create two symbols just to assign one to the other.

llvm-svn: 240773
2015-06-26 11:31:13 +00:00
Hao Liu 7ec8ee3119 [AArch64] Lower interleaved memory accesses to ldN/stN intrinsics. This patch also adds a function to calculate the cost of interleaved memory accesses.
E.g. Lower an interleaved load:
        %wide.vec = load <8 x i32>, <8 x i32>* %ptr
        %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>
        %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>
     into:
        %ld2 = { <4 x i32>, <4 x i32> } call llvm.aarch64.neon.ld2(%ptr)
        %vec0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
        %vec1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

E.g. Lower an interleaved store:
        %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
        store <12 x i32> %i.vec, <12 x i32>* %ptr
     into:
        %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3>
        %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7>
        %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11>
        call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)

Differential Revision: http://reviews.llvm.org/D10533

llvm-svn: 240754
2015-06-26 02:32:07 +00:00
Benjamin Kramer e61cbd1f3a Replace copy-pasted debug value skipping with MBB::getLastNonDebugInstr
No functional change intended.

llvm-svn: 240639
2015-06-25 13:28:24 +00:00
Rafael Espindola ce4c2bc1d6 Use MCSymbols for FastISel.
The summary is that it moves the mangling earlier and replaces a few
calls to .addExternalSymbol with addSym.

I originally wanted to replace all the uses of addExternalSymbol with
addSym, but noticed it was a lot of work and doesn't need to be done
all at once.

llvm-svn: 240395
2015-06-23 12:21:54 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Sanjay Patel cfe0393b82 name change: hasPattern() -> getMachineCombinerPatterns() ; NFC
This was suggested as part of D10460, but it's independent of
any functional change.

llvm-svn: 240192
2015-06-19 23:21:42 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Eric Christopher 572e03a396 Fix "the the" in comments.
llvm-svn: 240112
2015-06-19 01:53:21 +00:00
Rafael Espindola dfe2d359c5 Move IsUsedInReloc from MCSymbolELF to MCSymbol.
There is a free bit is MCSymbol and MachO needs the same information.

llvm-svn: 239933
2015-06-17 20:08:20 +00:00
Matthias Braun 8321006d44 Revert "AArch64: Use CMP;CCMP sequences for and/or/setcc trees."
The patch triggers a miscompile on SPEC 2006 403.gcc with the (ref)
200.i and scilab.i inputs. I opened PR23866 to track analysis of this.

This reverts commit r238793.

llvm-svn: 239880
2015-06-17 04:02:32 +00:00
Daniel Sanders c81f450f1a Clean up redundant copies of Triple objects. NFC
Summary:

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10382

llvm-svn: 239823
2015-06-16 15:44:21 +00:00
Ahmed Bougacha 8c7754b965 [AArch64] Generalize extract-high DUP extension to MOVI/MVNI.
These are really immediate DUPs, and suffer from the same problem
with long instructions with a high/2 variant (e.g. smull).

By extending a MOVI (or DUP, before this patch), we can avoid an ext
on the other operand of the long instruction, e.g. turning:
    ext.16b v0, v0, v0, #8
    movi.4h v1, #0x53
    smull.4s  v0, v0, v1
into:
    movi.8h v1, #0x53
    smull2.4s  v0, v0, v1

While there, add a now-necessary combine to fold (VT NVCAST (VT x)).

llvm-svn: 239799
2015-06-16 01:18:14 +00:00
Sanjoy Das b666ea369c [TargetInstrInfo] Rename getLdStBaseRegImmOfs and implement for x86.
Summary:

TargetInstrInfo::getLdStBaseRegImmOfs to
TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86.  The
implementation only handles a few easy cases now and will be made more
sophisticated in the future.

This is NFCI: the only user of `getLdStBaseRegImmOfs` (now
`getmemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion`
is disabled for x86.

Reviewers: reames, ab, MatzeB, atrick

Reviewed By: MatzeB, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10199

llvm-svn: 239741
2015-06-15 18:44:14 +00:00
Evgeny Astigeevich ff1f4be4c7 On behalf of Alexandros Lamprineas:
LLVM targeting aarch64 doesn't correctly produce aligned accesses for non-aligned
data at -O0/fast-isel (-mno-unaligned-access).
The root cause seems to be in fast-isel not producing unaligned access correctly
for -mno-unaligned-access.

The patch just aborts fast-isel for loads and stores when -mno-unaligned-access is
present. 
The regression test is updated to check this new test case (-mno-unaligned-access 
together with fast-isel).

Differential Revision: http://reviews.llvm.org/D10360

llvm-svn: 239732
2015-06-15 15:48:44 +00:00
Hao Liu 1c2e89a57a [AArch64] Delete two empty files, which should be removed by r239713.
llvm-svn: 239715
2015-06-15 02:56:40 +00:00
Hao Liu d0ca8d7edd [AArch64] Revert r239711 again. We need to discuss how to share code between AArch64 and ARM backend.
llvm-svn: 239713
2015-06-15 01:56:40 +00:00
Hao Liu cb070e3833 [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure on OS X.
This patch was firstly committed in r239514, then reverted in r239544 because of a syntax incompatible failure on OS X.

llvm-svn: 239711
2015-06-15 01:35:49 +00:00
Matthias Braun 39a2afc941 Rename TargetSubtargetInfo::enablePostMachineScheduler() to enablePostRAScheduler()
r213101 changed the behaviour of this method to not only affect the
PostMachineScheduler scheduler but also the PostRAScheduler scheduler,
renaming should make this fact clear. Also document that the preferred
way is to specify this in the scheduling model instead of overriding
this method.

Differential Revision: http://reviews.llvm.org/D10427

llvm-svn: 239659
2015-06-13 03:42:16 +00:00
Tim Northover 02cfdbb7f1 AArch64: map bare-metal arm64-macho triple to MachO MC layer.
Far better than an assertion about expecting ELF.

llvm-svn: 239647
2015-06-12 23:37:11 +00:00
Daniel Sanders 3e5de88dac Replace string GNU Triples with llvm::Triple in TargetMachine. NFC.
Summary:
For the moment, TargetMachine::getTargetTriple() still returns a StringRef.

This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: ted, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10362

llvm-svn: 239554
2015-06-11 19:41:26 +00:00
Ahmed Bougacha c88bf54366 [CodeGen] ArrayRef'ize cond/pred in various TII APIs. NFC.
llvm-svn: 239553
2015-06-11 19:30:37 +00:00
Rafael Espindola 65d37e64a9 This reverts commit r239529 and r239514.
Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
Revert "Fixing MSVC 2013 build error."

The  test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.

llvm-svn: 239544
2015-06-11 17:30:33 +00:00
Daniel Sanders ed64d62c70 Replace string GNU Triples with llvm::Triple in computeDataLayout(). NFC.
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: llvm-commits, jfb, rengolin

Differential Revision: http://reviews.llvm.org/D10361

llvm-svn: 239538
2015-06-11 15:34:59 +00:00
Aaron Ballman b6b58b3152 Fixing MSVC 2013 build error.
llvm-svn: 239529
2015-06-11 13:06:02 +00:00
Hao Liu 4566d18e89 [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Add a pass AArch64InterleavedAccess to identify and match interleaved memory accesses. This pass transforms an interleaved load/store into ldN/stN intrinsic. As Loop Vectorizor disables optimization on interleaved accesses by default, this optimization is also disabled by default. To enable it by "-aarch64-interleaved-access-opt=true"

E.g. Transform an interleaved load (Factor = 2):
       %wide.vec = load <8 x i32>, <8 x i32>* %ptr
       %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>  ; Extract even elements
       %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>  ; Extract odd elements
     Into:
       %ld2 = { <4 x i32>, <4 x i32> } call aarch64.neon.ld2(%ptr)
       %v0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
       %v1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

E.g. Transform an interleaved store (Factor = 2):
       %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7>  ; Interleaved vec
       store <8 x i32> %i.vec, <8 x i32>* %ptr
     Into:
       %v0 = shuffle %i.vec, undef, <0, 1, 2, 3>
       %v1 = shuffle %i.vec, undef, <4, 5, 6, 7>
       call void aarch64.neon.st2(%v0, %v1, %ptr)

llvm-svn: 239514
2015-06-11 09:05:02 +00:00
Daniel Sanders a73f1fdb19 Replace string GNU Triples with llvm::Triple in MCSubtargetInfo and create*MCSubtargetInfo(). NFC.
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10311

llvm-svn: 239467
2015-06-10 12:11:26 +00:00
Daniel Sanders 418caf5002 Replace string GNU Triples with llvm::Triple in MCAsmBackend subclasses and create*AsmBackend(). NFC.
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: echristo, rafael

Reviewed By: rafael

Subscribers: rafael, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10243

llvm-svn: 239464
2015-06-10 10:35:34 +00:00
Craig Topper 8e29d71623 Remove unnecessary conversion from StringRef to std::string and back to StringRef. NFC.
llvm-svn: 239455
2015-06-10 02:07:37 +00:00
Chad Rosier cf90acc104 [AArch64] Remove an overly conservative check when generating store pairs.
Store instructions do not modify register values and therefore it's safe
to form a store pair even if the source register has been read in between
the two store instructions.

Previously, the read of w1 (see below) prevented the formation of a stp.

        str      w0, [x2]
        ldr     w8, [x2, #8]
        add      w0, w8, w1
        str     w1, [x2, #4]
        ret

We now generate the following code.

        stp      w0, w1, [x2]
        ldr     w8, [x2, #8]
        add      w0, w8, w1
        ret

All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
Performance results for SPEC2K were within noise.

llvm-svn: 239432
2015-06-09 20:59:41 +00:00
Matt Arsenault 8b643559d4 MC: Add target hook to control symbol quoting
llvm-svn: 239370
2015-06-09 00:31:39 +00:00
Ranjeet Singh 10511a493e [AArch64] AsmParser should be case insensitive about accepting vector register names.
Differential Revision: http://reviews.llvm.org/D10320

llvm-svn: 239353
2015-06-08 21:32:16 +00:00
Keno Fischer e70b31fc1b [InstrInfo] Refactor foldOperandImpl to thread through InsertPt. NFC
Summary:
This was a longstanding FIXME and is a necessary precursor to cases
where foldOperandImpl may have to create more than one instruction
(e.g. to constrain a register class). This is the split out NFC changes from
D6262.

Reviewers: pete, ributzka, uweigand, mcrosier

Reviewed By: mcrosier

Subscribers: mcrosier, ted, llvm-commits

Differential Revision: http://reviews.llvm.org/D10174

llvm-svn: 239336
2015-06-08 20:09:58 +00:00
Javed Absar e1c7dc3ee2 ARM]: Add support for MMFR4_EL1 in assembler
This patch adds support for system register MMFR4_EL1 (memory model feature register) in the assembler.
This register provides information about the implemented memory model and memory management support.

llvm-svn: 239302
2015-06-08 15:01:11 +00:00
Jim Grosbach 36e60e9127 MC: Clean up naming in MCObjectWriter. NFC.
s/WriteObject/writeObject/
s/RecordRelocation/recordRelocation/
s/IsSymbolRefDifferenceFullyResolved/isSymbolRefDifferenceFullyResolved/
s/Write8/write8/
s/WriteLE16/writeLE16/
s/WriteLE32/writeLE32/
s/WriteLE64/writeLE64/
s/WriteBE16/writeBE16/
s/WriteBE32/writeBE32/
s/WriteBE64/writeBE64/
s/Write16/write16/
s/Write32/write32/
s/Write64/write64/
s/WriteZeroes/writeZeroes/
s/WriteBytes/writeBytes/

llvm-svn: 239108
2015-06-04 22:24:41 +00:00
Ahmed Bougacha 8207641251 [GlobalMerge] Take into account minsize on Global users' parents.
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.

Differential Revision: http://reviews.llvm.org/D10054

llvm-svn: 239087
2015-06-04 20:39:23 +00:00
Jim Grosbach 7c76b4cc6e MC: Remove obsolete MachO UseAggressiveSymbolFolding.
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.

rdar://21201804

llvm-svn: 239084
2015-06-04 20:27:42 +00:00
Benjamin Kramer 50e2a29385 Replace custom fixed endian to raw_ostream emission with EndianStream.
Less code, clearer and more efficient. No functionality change intended.

llvm-svn: 239040
2015-06-04 15:03:02 +00:00
Daniel Sanders 7813ae879e Replace string GNU Triples with llvm::Triple in MCAsmInfo subclasses and create*AsmInfo(). NFC.
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: ted, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10236

llvm-svn: 239036
2015-06-04 13:12:25 +00:00
Rafael Espindola f8794ff29d Remove MCELFSymbolFlags.h. It is now internal to MCSymbolELF.
llvm-svn: 238996
2015-06-04 00:47:43 +00:00
Rafael Espindola 95fb9b93ed Merge MCELF.h into MCSymbolELF.h.
Now that we have a dedicated type for ELF symbol, these helper functions can
become member function of MCSymbolELF.

llvm-svn: 238864
2015-06-02 20:38:46 +00:00
Tim Northover 3f3a4d8503 AArch64: fix typo in SMIN far atomics and add tests
llvm-svn: 238858
2015-06-02 18:37:20 +00:00
Vladimir Sukharev 5f6f60d942 [AArch64] Add v8.1a atomic instructions
Patch by: Tom Coxon

Reviewers: t.p.northover

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8501

llvm-svn: 238818
2015-06-02 10:58:41 +00:00
Matthias Braun 72b8f74813 AArch64: Use CMP;CCMP sequences for and/or/setcc trees.
Previously CCMP/FCCMP instructions were only used by the
AArch64ConditionalCompares pass for control flow. This patch uses them
for SELECT like instructions as well by matching patterns in ISelLowering.

PR20927, rdar://18326194

Differential Revision: http://reviews.llvm.org/D8232

llvm-svn: 238793
2015-06-01 22:31:17 +00:00
Luke Cheeseman 85fd06d389 Re-commit of r238201 with fix for building with shared libraries.
llvm-svn: 238739
2015-06-01 12:02:47 +00:00
Matt Arsenault bd7d80a4a6 Add address space argument to isLegalAddressingMode
This is important because of different addressing modes
depending on the address space for GPU targets.

This only adds the argument, and does not update
any of the uses to provide the correct address space.

llvm-svn: 238723
2015-06-01 05:31:59 +00:00
Jim Grosbach 13760bd152 MC: Clean up MCExpr naming. NFC.
llvm-svn: 238634
2015-05-30 01:25:56 +00:00
Rafael Espindola 4d37b2a259 Remove getData.
This completes the mechanical part of merging MCSymbol and MCSymbolData.

llvm-svn: 238617
2015-05-29 21:45:01 +00:00
Rafael Espindola beb6060a51 Remove the MCSymbolData typedef.
The getData member function is next.

llvm-svn: 238611
2015-05-29 20:41:47 +00:00
Rafael Espindola b5d316bfc3 Rename getOrCreateSymbolData to registerSymbol and return void.
Another step in merging MCSymbol and MCSymbolData.

llvm-svn: 238607
2015-05-29 20:21:02 +00:00
Rafael Espindola e3b2acf274 Pass MCSymbols to the helper functions in MCELF.h.
llvm-svn: 238596
2015-05-29 18:47:23 +00:00
Rafael Espindola 3a5d3cce80 Remove a trivial forwarding function. NFC.
llvm-svn: 238506
2015-05-28 21:36:02 +00:00
Chad Rosier adc06311ba Reuse Loc variable. NFC.
llvm-svn: 238448
2015-05-28 18:18:21 +00:00
Rafael Espindola f4a1365387 Use operator<< instead of print in a few more places.
llvm-svn: 238315
2015-05-27 13:05:42 +00:00
Diego Novillo bfecc06656 Revert "Re-commit changes in r237579 with fix for bug breaking windows builds."
This reverts commit r238201 to fix linking problems in x86 Linux
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html

llvm-svn: 238223
2015-05-26 17:45:38 +00:00
Luke Cheeseman a5d053d6f4 Re-commit changes in r237579 with fix for bug breaking windows builds.
llvm-svn: 238201
2015-05-26 13:40:31 +00:00
Michael Kuperstein db0712f986 Use std::bitset for SubtargetFeatures.
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.

The first several times this was committed (e.g. r229831, r233055), it caused several buildbot failures.
Apparently the reason for most failures was both clang and gcc's inability to deal with large numbers (> 10K) of bitset constructor calls in tablegen-generated initializers of instruction info tables. 
This should now be fixed.

llvm-svn: 238192
2015-05-26 10:47:10 +00:00
Rafael Espindola 61e724a8c5 Stop using MCSectionData in MCMachObjectWriter.h.
llvm-svn: 238165
2015-05-26 01:15:30 +00:00
Rafael Espindola 7549f87672 Return a MCSection from MCFragment::getParent().
Another step in merging MCSectionData and MCSection.

llvm-svn: 238162
2015-05-26 00:36:57 +00:00
Rafael Espindola 6e6820a7e6 Stop forwarding getOrdinal and setOrdinal.
llvm-svn: 238139
2015-05-25 14:12:48 +00:00
Benjamin Kramer be48c40475 [AArch64] Clean up the ELF streamer a bit.
llvm-svn: 238102
2015-05-23 16:39:10 +00:00
Benjamin Kramer 1d1b9243d5 [AArch64] Move AArch64TargetStreamer out of MCStreamer.h
It doesn't belong in the shared MC layer. NFC.

llvm-svn: 238101
2015-05-23 16:15:10 +00:00
Chad Rosier a73b359542 Use new MachineInstr mayLoadOrStore() API.
llvm-svn: 237965
2015-05-21 21:59:57 +00:00
Chad Rosier ce8e5abbaf [AArch64] Enhance the load/store optimizer with target-specific alias analysis.
Phabricator: http://reviews.llvm.org/D9863
llvm-svn: 237963
2015-05-21 21:36:46 +00:00
David Blaikie 457343dcaa [opaque pointer type] Allow gep_type_iterator to work with the pointee type from the GEP instruction
The raw non-instruction/constant form of this is still relying on being
able to access the pointee type from a pointer type - those will be
cleaned up later. For now, just focus on the cases where the pointee
type is easily accessible.

llvm-svn: 237958
2015-05-21 21:12:43 +00:00
Rafael Espindola 0709a7bd1a Move alignment from MCSectionData to MCSection.
This starts merging MCSection and MCSectionData.

There are a few issues with the current split between MCSection and
MCSectionData.

* It optimizes the the not as important case. We want the production
of .o files to be really fast, but the split puts the information used
for .o emission in a separate data structure.

* The ELF/COFF/MachO hierarchy is not represented in MCSectionData,
leading to some ad-hoc ways to represent the various flags.

* It makes it harder to remember where each item is.

The attached patch starts merging the two by moving the alignment from
MCSectionData to MCSection.

Most of the patch is actually just dropping 'const', since
MCSectionData is mutable, but MCSection was not.

llvm-svn: 237936
2015-05-21 19:20:38 +00:00
Duncan P. N. Exon Smith 92a699c50e MC: Remove most remaining uses of MCSymbolData::getSymbol(), NFC
Remove most remaining calls to `MCSymbolData::getSymbol()`, instead
using the already available `MCSymbol` directly.

llvm-svn: 237829
2015-05-20 20:18:16 +00:00
Pete Cooper 9e1d335697 Change Function::getIntrinsicID() to return an Intrinsic::ID. NFC.
Now that Intrinsic::ID is a typed enum, we can forward declare it and so return it from this method.

This updates all users which were either using an unsigned to store it, or had a now unnecessary cast.

llvm-svn: 237810
2015-05-20 17:16:39 +00:00
Duncan P. N. Exon Smith fd27a1dc1b MC: Update MCAssembler to use MCSymbol, NFC
Use `MCSymbol` over `MCSymbolData` where both are needed.

llvm-svn: 237803
2015-05-20 16:02:11 +00:00
Duncan P. N. Exon Smith 99d8a8e8ac MC: Take MCSymbol in MachObjectWriter::getSymbolAddress(), NFC
Pass through an `MCSymbol` instead of an `MCSymbolData` so we can get
rid of the back pointer.

llvm-svn: 237750
2015-05-20 00:02:39 +00:00
Duncan P. N. Exon Smith 2a40483418 MC: Use MCSymbol in MCAsmLayout::getSymbolOffset(), NFC
Continue to canonicalize on MCSymbol instead of MCSymbolData when both
are needed.

llvm-svn: 237749
2015-05-19 23:53:20 +00:00
Matthias Braun 07066cca20 MachineInstr: Remove unused parameter.
llvm-svn: 237726
2015-05-19 21:22:20 +00:00
Pete Cooper f0cd2b49f5 Remove unnecessary cast. NFC
llvm-svn: 237722
2015-05-19 20:50:14 +00:00
David Blaikie ff6409d096 Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only
llvm-svn: 237624
2015-05-18 22:13:54 +00:00
Tim Northover d6223a2471 AArch64: work around ld64 bug more aggressively.
ld64 currently mishandles internal pointer relocations (i.e.
ARM64_RELOC_UNSIGNED referred to by section & offset rather than symbol). The
existing __cfstring clause was an early discovery and workaround for this, but
the problem is wider and we should avoid such relocations wherever possible for
now.

This code should be reverted to allowing internal relocations as soon as
possible.

PR23437.

llvm-svn: 237621
2015-05-18 22:07:20 +00:00
Matthias Braun fa3872e7ad MachineInstr: Change return value of getOpcode() to unsigned.
This was previously returning int. However there are no negative opcode
numbers and more importantly this was needlessly different from
MCInstrDesc::getOpcode() (which even is the value returned here) and
SDValue::getOpcode()/SDNode::getOpcode().

llvm-svn: 237611
2015-05-18 20:27:55 +00:00
Jim Grosbach 6f482000e9 MC: Clean up method names in MCContext.
The naming was a mish-mash of old and new style. Update to be consistent
with the new. NFC.

llvm-svn: 237594
2015-05-18 18:43:14 +00:00
Oliver Stannard 6cb23465e0 Revert r237579, as it broke windows buildbots
llvm-svn: 237583
2015-05-18 16:39:16 +00:00
Oliver Stannard 0c553afe6a [LLVM - ARM/AArch64] Add ACLE special register intrinsics
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.

This patch is intended to lower the read/write_register instrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific
instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.

Patch by Luke Cheeseman.

Differential Revision: http://reviews.llvm.org/D9699

llvm-svn: 237579
2015-05-18 16:23:33 +00:00
Duncan P. N. Exon Smith 6e23e5a680 MC: Use MCSymbol in RelAndSymbol, NFC
Switch from `MCSymbolData` to `MCSymbol`.

llvm-svn: 237502
2015-05-16 01:14:19 +00:00
Duncan P. N. Exon Smith 09bfa58edd MC: Change MCFragment::Atom to an MCSymbol, NFC
Change `MCFragment::Atom` from an `MCSymbolData` to an `MCSymbol`,
moving in the direction of removing the back-pointer.

llvm-svn: 237497
2015-05-16 00:48:58 +00:00
Jim Grosbach 4c98cf77d9 MC: MCCodeGenInfo naming update. NFC.
s/InitMCCodeGenInfo/initMCCodeGenInfo/

llvm-svn: 237471
2015-05-15 19:13:31 +00:00
Jim Grosbach 91df21f740 MC: Update MCCodeEmitter naming. NFC.
s/EncodeInstruction/encodeInstruction/

llvm-svn: 237469
2015-05-15 19:13:16 +00:00
Jim Grosbach 63661f8d73 MC: Update MCFixup naming. NFC.
s/MCFixup::Create/MCFixup::create/

llvm-svn: 237468
2015-05-15 19:13:05 +00:00
James Molloy cfb0443af6 Mark SMIN/SMAX/UMIN/UMAX nodes as legal and add patterns for them.
The new [SU]{MIN,MAX} SDNodes can be lowered directly to instructions for
most NEON datatypes - the big exclusion being v2i64.

llvm-svn: 237455
2015-05-15 16:15:57 +00:00
Artyom Skrobov a70dfe18d3 Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases
No longer breaks SPEC2000/2006

llvm-svn: 237361
2015-05-14 12:59:46 +00:00
Vladimir Sukharev 8ccf0a3aa7 [AArch64] Slight naming changes and comments for AArch64NamedImmMapper
Reviewers: echristo

Subscribers: llvm-commits

Follow-up to: http://reviews.llvm.org/D8496#158595

Relates to: http://reviews.llvm.org/rL235089

llvm-svn: 237354
2015-05-14 09:50:14 +00:00
Jim Grosbach e9119e41ef MC: Modernize MCOperand API naming. NFC.
MCOperand::Create*() methods renamed to MCOperand::create*().

llvm-svn: 237275
2015-05-13 18:37:00 +00:00
Silviu Baranga 780a3b3be7 Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006
llvm-svn: 237256
2015-05-13 14:03:18 +00:00
Artyom Skrobov b526681e08 [AArch64] Codegen VMAX/VMIN for safe math cases
llvm-svn: 237247
2015-05-13 12:01:09 +00:00
Michael Kuperstein c3434b390d Reverting r237234, "Use std::bitset for SubtargetFeatures"
The buildbots are still not satisfied.
MIPS and ARM are failing (even though at least MIPS was expected to pass).

llvm-svn: 237245
2015-05-13 10:28:46 +00:00
Michael Kuperstein aba4a34ef2 Use std::bitset for SubtargetFeatures
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.

The first two times this was committed (r229831, r233055), it caused several buildbot failures. 
At least some of the ARM and MIPS ones were due to gcc/binutils issues, and should now be fixed.

llvm-svn: 237234
2015-05-13 08:27:08 +00:00
Arnold Schwaighofer f54b73d681 ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.

Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.

Fix for PR23459.

rdar://20740035

llvm-svn: 236916
2015-05-08 23:52:00 +00:00
Matthias Braun d04893fa36 Change getTargetNodeName() to produce compiler warnings for missing cases, fix them
llvm-svn: 236775
2015-05-07 21:33:59 +00:00
Pete Cooper f52123b454 [AArch64] Fix sext/zext folding in address arithmetic.
We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there.

Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use.

llvm-svn: 236764
2015-05-07 19:21:36 +00:00
Wei Mi 062c74484d [X86] Disable loop unrolling in loop vectorization pass when VF is 1.
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.

Differential Revision: http://reviews.llvm.org/D9515

llvm-svn: 236613
2015-05-06 17:12:25 +00:00
Quentin Colombet 61b305edfd [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Quentin Colombet 0de2346859 [AArch64][FastISel] Variant of the logical instructions that use two input
registers cannot write on SP.

rdar://problem/20748715

llvm-svn: 236352
2015-05-01 21:34:57 +00:00
Quentin Colombet 9df2fa261b [AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences.
rdar://problem/20748715

llvm-svn: 236346
2015-05-01 20:57:11 +00:00
Quentin Colombet 329fa890ba [AArch64] Fix bad register class constraint in fast-isel for TST instruction.
rdar://problem/20748715

llvm-svn: 236273
2015-04-30 22:27:20 +00:00
Tim Northover 03b99f66d7 AArch64: add BFC alias for the BFI/BFM instructions.
Unlike 32-bit ARM, AArch64 can use wzr/xzr to implement this without the need
for a separate instruction.

rdar://18679590

llvm-svn: 236245
2015-04-30 18:28:58 +00:00
Manman Ren 0e20822887 [AArch64] Refactor out codes that depend on specific CS save sequence.
No functionality change.

llvm-svn: 236143
2015-04-29 20:03:38 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Sergey Dmitrouk 842a51bad8 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235989
2015-04-28 14:05:47 +00:00
Daniel Jasper 48e93f7181 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

llvm-svn: 235987
2015-04-28 13:38:35 +00:00
Sergey Dmitrouk adb4c69d5c [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235977
2015-04-28 11:56:37 +00:00
Ahmed Bougacha 190528703f [MC] Use LShr for constant evaluation of ">>" on ELF/arm64--darwin.
This matches other assemblers and is less unexpected (e.g. PR23227).
On ELF, I tried binutils gas v2.24 and nasm 2.10.09, and they both
agree on LShr.  On COFF, I couldn't get my hands on an assembler yet,
so don't change the behavior.  For now, don't change it on non-AArch64
Darwin either, as the other assembler is gas v1.38, which does an AShr.

llvm-svn: 235963
2015-04-28 01:37:11 +00:00
Ahmed Bougacha c004c60c0a [AArch64] Also combine vector selects fed by non-i1 SETCCs.
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative, replace it with an assert.

This also means that we now can run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all types created all have the same sizes as the (legal) inputs.

Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).

llvm-svn: 235922
2015-04-27 21:43:12 +00:00
Ahmed Bougacha 89bba61c84 [AArch64] Don't assert when combining (v3f32 select (setcc f64)).
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.

llvm-svn: 235917
2015-04-27 21:01:20 +00:00
Lang Hames 9ff69c8f4d [AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.
AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a
reference for this is crufty.

llvm-svn: 235752
2015-04-24 19:11:51 +00:00
Pirama Arumuga Nainar 745615ca00 [AArch64] Add nvcast patterns for v4f16 and v8f16
Summary:
Constant stores of f16 vectors can create NvCast nodes from various
operand types to v4f16 or v8f16 depending on patterns in the stored
constants.  This patch adds nvcast rules with v4f16 and v8f16 values.

AArchISelLowering::LowerBUILD_VECTOR has the details on which constant
patterns generate the nvcast nodes.

Reviewers: jmolloy, srhines, ab

Subscribers: rengolin, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D9201

llvm-svn: 235610
2015-04-23 17:32:25 +00:00
Pirama Arumuga Nainar b18815354d [AArch64] Handle vec4, vec8, vec16 *itofp for half
Summary:
Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32,
v8i8, v8i16 inputs to allow promotion of v4f16 results.

Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32,
and i64 vectors.  Only missing tests are for v16i8 and v16i16 as the
shift operations are too complicated to write a proper check sequence.

The conversions from v4i64 to v4f16 do not depend on this patch - v4i64
is split and the conversion gets handled while lowering v2i64.  I am
adding a test here for completeness.

Reviewers: aemerson, rengolin, ab, jmolloy, srhines

Subscribers: rengolin, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D9166

llvm-svn: 235609
2015-04-23 17:16:27 +00:00
Pete Cooper 037b700b7f [AArch64] Use MachineRegisterInfo instead of LiveIntervals to calculate liveness. NFC.
The CondOpt pass currently uses LiveIntervals to set the dead flag on a def.  This patch uses MachineRegisterInfo::use_empty instead as that is equivalent to the def being dead.

This removes an instance of LiveIntervals in the pass manager pipeline and saves 3.8% of compile time on llc conpiled for AArch64.

Reviewed by Chad Rosier and Zhaoshi.

llvm-svn: 235532
2015-04-22 18:05:13 +00:00
James Molloy cd2334e86e [AArch64] Disable complex GEP optimization by default.
Enough concerns were raised that this optimization is pessimising some code patterns.

The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.

llvm-svn: 235491
2015-04-22 09:11:38 +00:00
Vladimir Sukharev bad1d1dc02 [AArch64] LORID_EL1 register must be treated as read-only
Patch by: John Brawn

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9105

llvm-svn: 235314
2015-04-20 16:54:37 +00:00
Ahmed Bougacha e14a4d487e [AArch64] Don't force MVT::Untyped when selecting LD1LANEpost.
The result is either an Untyped reg sequence, on ldN with N > 1, or
just the type of the input vector, on ld1.  Don't force Untyped.
Instead, just use the type of the reg sequence.

This mirrors the behavior of createTuple, which feeds the LD1*_POST.

The narrow code path wasn't actually covered by tests, because V64
insert_vector_elt are widened to V128 before the LD1LANEpost combine
has the chance to run, usually.

The only case where it does run on V64 vectors is if the vector ops
legalizer ran.  So, tickle the code with a ctpop.

Fixes PR23265.

llvm-svn: 235243
2015-04-17 23:43:33 +00:00
Ahmed Bougacha 2448ef5f33 [AArch64] Avoid vector->load dependency cycles when creating LD1*post.
They would break the SelectionDAG.
Note that the opposite load->vector dependency is already obvious in:
  (LD1*post vec, ..)

llvm-svn: 235224
2015-04-17 21:02:30 +00:00
Benjamin Kramer 97fbdd5a39 [mc] Clean up emission of byte sequences
No functional change intended.

llvm-svn: 235178
2015-04-17 11:12:43 +00:00
Ahmed Bougacha 941420d9ea [AArch64] Don't assert on f16 in DUP PerfectShuffle generator.
Found by code inspection, but breaking i16 at least breaks other tests.
They aren't checking this in particular though, so also add some
explicit tests for the already working types.

llvm-svn: 235148
2015-04-16 23:57:07 +00:00
Pete Cooper 19d704d13c Disable AArch64 fast-isel on big-endian call vector returns.
A big-endian vector return needs a byte-swap which we aren't doing right now.

For now just bail on these cases to get correctness back.

llvm-svn: 235133
2015-04-16 21:19:36 +00:00
Vladimir Sukharev 6334cf3d69 [AArch64] Add v8.1a "Virtualization Host Extensions"
Reviewers: t.p.northover

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8500

Patch by: Tom Coxon

llvm-svn: 235107
2015-04-16 15:38:58 +00:00
Vladimir Sukharev d49cb8fdd7 [AArch64] Add v8.1a "Limited Ordering Regions" extension
Reviewers: 	t.p.northover

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8499

Patch by: Tom Coxon

llvm-svn: 235105
2015-04-16 15:30:43 +00:00
Vladimir Sukharev 251ce0c2db [AArch64] Add v8.1a "Privileged Access Never" extension
Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8498

llvm-svn: 235104
2015-04-16 15:20:51 +00:00
Vladimir Sukharev a11db3eb88 [AArch64] Handle Cyclone-specific register in common way
Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8584

Patch by: Tom Coxon

llvm-svn: 235102
2015-04-16 15:01:20 +00:00
Vladimir Sukharev 950b606a2b [AArch64] Follow-up to: Refactor AArch64NamedImmMapper to become dependent on subtarget features
Fixed compilation with clang on some buildbots with "-Werror -Wmissing-field-initializers"

Related to: http://reviews.llvm.org/rL235089

llvm-svn: 235099
2015-04-16 14:36:13 +00:00
Vladimir Sukharev a98f6897a2 [AArch64] Refactor AArch64NamedImmMapper to become dependent on subtarget features.
In order to introduce v8.1a-specific entities, Mappers should be aware of SubtargetFeatures available.

This patch introduces refactoring, that will then allow to easily introduce:

- v8.1-specific "pan" PState for PStateMapper (PAN extension)

- v8.1-specific sysregs for SysRegMapper (LOR,VHE extensions)

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8496

Patch by Tom Coxon

llvm-svn: 235089
2015-04-16 12:15:27 +00:00
James Molloy f8aa57aa3b [AArch64] Fix invalid use of references to BuildMI.
This was found in GCC PR65773 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65773).

We shouldn't be taking a reference to the temporary that BuildMI returns, we must copy it.

llvm-svn: 235088
2015-04-16 11:37:40 +00:00
Richard Trieu 6b1aa5f5e1 Change range-based for-loops to be -Wrange-loop-analysis clean.
No functionality change.

llvm-svn: 234963
2015-04-15 01:21:15 +00:00
Rafael Espindola 5560a4cfbd Use raw_pwrite_stream in the object writer/streamer.
The ELF object writer will take advantage of that in the next commit.

llvm-svn: 234950
2015-04-14 22:14:34 +00:00
Bradley Smith b913653b91 [AArch64] Allow non-standard INS/DUP encodings
The ARMv8 ARMARM states that for these instructions in A64 state:

  "Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.", (imm4 for INS).

Make the disassembler accept any encoding with these ignored bits set to 1.

llvm-svn: 234896
2015-04-14 15:07:26 +00:00
Duncan P. N. Exon Smith 7348ddaa74 DebugInfo: Gut DIVariable and DIGlobalVariable
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`.  Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).

llvm-svn: 234840
2015-04-14 02:22:36 +00:00
Krzysztof Parzyszek a46c36b8f4 Allow memory intrinsics to be tail calls
llvm-svn: 234764
2015-04-13 17:16:45 +00:00
Alexander Kornienko f817c1cb9a Use 'override/final' instead of 'virtual' for overridden methods
The patch is generated using clang-tidy misc-use-override check.

This command was used:

  tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
    -checks='-*,misc-use-override' -header-filter='llvm|clang' \
    -j=32 -fix -format

http://reviews.llvm.org/D8925

llvm-svn: 234679
2015-04-11 02:11:45 +00:00
Ahmed Bougacha b96444efd1 [CodeGen] Split -enable-global-merge into ARM and AArch64 options.
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because it
might not even have been created, as that's the targets decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.

Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.

llvm-svn: 234666
2015-04-11 00:06:36 +00:00
Quentin Colombet fd7475b5e8 [AArch64] Strengthen the code for the prologue insertion.
The spilled registers are pristine and thus, correctly handled by
the register scavenger and so on, but the liveness information is
strictly speaking wrong at this point.
Fix that.

llvm-svn: 234664
2015-04-10 23:14:34 +00:00
Chad Rosier 518659d9b4 [AArch64] Changes some SchedAlias to WriteRes for Cortex-A57.
Using SchedAliases is convenient and works well for latency and resource
lookup for instructions.  However, this creates an entry in
AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any
SchedReadAdvance since the lookup will fail.

http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 234594
2015-04-10 13:19:27 +00:00
Chad Rosier a82c876045 [AArch64] Adjusts Cortex-A57 machine model to handle zero shift.
http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 234593
2015-04-10 13:19:21 +00:00
Benjamin Kramer 619c4e57ba Reduce dyn_cast<> to isa<> or cast<> where possible.
No functional change intended.

llvm-svn: 234586
2015-04-10 11:24:51 +00:00
Ahmed Bougacha 1ffe7c7d36 [AArch64] Promote f16 operations to f32.
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.

Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.

f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.

While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.

Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755

llvm-svn: 234550
2015-04-10 00:08:48 +00:00
Juergen Ributzka bd0c7eb4dc [AArch64][FastISel] Fix integer extend optimization.
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.

The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register we also need to check if the
virtual register has already an associated machine instruction that defines it.

This fixes rdar://problem/20470788.

llvm-svn: 234529
2015-04-09 20:00:46 +00:00
Rafael Espindola 49286e9f4a clang-format bits of code to make a followup patch easy to read.
llvm-svn: 234519
2015-04-09 18:32:58 +00:00
Kristof Beyls 17cb8982f4 [AArch64] Add support for dynamic stack alignment
Differential Revision: http://reviews.llvm.org/D8876

llvm-svn: 234471
2015-04-09 08:49:47 +00:00
Lang Hames 522bf13b83 [AArch64] Remove redundant -march option. Also fix a think-o from r234462.
llvm-svn: 234467
2015-04-09 05:34:57 +00:00
Lang Hames 903338511b [AArch64] Teach AArch64TargetLowering::getOptimalMemOpType to consider alignment
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.

This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.

<rdar://problem/17829180>

llvm-svn: 234462
2015-04-09 03:40:33 +00:00
Tim Northover 5b44f1ba19 AArch64: disallow "fmov sD, #-0.0" during assembly.
We weren't checking the sign of the floating point immediate before translating
it to "fmov sD, wzr". Similarly for D-regs.

Technically "movi vD.2s, #0x80, lsl #24" would work most of the time, but it's
not a blessed alias (and I don't think it should be since people expect writing
sD to zero out the high lanes, and there's no dD equivalent). So an error it is.

rdar://20455398

llvm-svn: 234372
2015-04-07 22:49:47 +00:00
Matthias Braun b6ac8fa39e AArch64: Don't lower ISD::SELECT to ISD::SELECT_CC
Instead of lowering SELECT to SELECT_CC which is further lowered later
immediately call the SELECT_CC lowering code. This is preferable
because:
- Avoids an unnecessary roundtrip through the legalization queues with
  an intermediate node.
- More importantly: Lowered operations get visited last leading to SELECT_CC
  getting visited with legalized operands and unlegalized ones for preexisting
  SELECT_CC nodes. This does not hurt the current code (hence no testcase) but
  is required for another patch I am working on.

Differential Revision: http://reviews.llvm.org/D8187

llvm-svn: 234334
2015-04-07 17:33:05 +00:00
Rafael Espindola b91455b5c0 Refactor a lot of duplicated code for stub output.
This also moves it earlier so that it they are produced before we print
an end symbol for the data section.

llvm-svn: 234315
2015-04-07 13:42:44 +00:00
Duncan P. N. Exon Smith e686f1591f CodeGen: Stop using DIDescriptor::is*() and auto-casting
Same as r234255, but for lib/CodeGen and lib/Target.

llvm-svn: 234258
2015-04-06 23:27:40 +00:00
Quentin Colombet a64723c2bf [AArch64] Add a comment to make it explicit why we increased the complexity.
Follow-up of r233653.

llvm-svn: 233936
2015-04-02 18:54:23 +00:00
Vladimir Sukharev 439328e172 [AArch64] Rename v8.1a from "extension" to "architecture"
v8.1a is renamed to architecture, accordingly to approaches in ARM backend.

Excess generic cpu is removed. Intended use: "generic" cpu with "v8.1a" subtarget feature

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8766

llvm-svn: 233810
2015-04-01 14:49:29 +00:00
Quentin Colombet 6843ac470b [AArch64] Enable the codegenprepare optimization that promotes operation to form
extended loads.
Implement the related target lowering hook so that the optimization has a better
estimation of the cost of an extension.

rdar://problem/19267165

llvm-svn: 233753
2015-03-31 20:52:32 +00:00
Vladimir Sukharev 297bf0eae0 [AArch64] Add v8.1a "Rounding Double Multiply Add/Subtract" extension
Reviewers: t.p.northover, jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8502

llvm-svn: 233693
2015-03-31 13:15:48 +00:00
Quentin Colombet 387a0e7cce [AArch64] Fix poor codegen for add immediate.
We used to match the register variant before the immediate when the register
argument could be implicitly zero-extended.

llvm-svn: 233653
2015-03-31 00:31:13 +00:00
Eric Christopher f8019408dc Replace the MCSubtargetInfo parameter with a Triple when creating
an MCInstPrinter. Update all callers and use where we wanted a Triple
previously.

llvm-svn: 233648
2015-03-31 00:10:04 +00:00
Juergen Ributzka 5fe5ef9e0e Transfer implicit operands when expanding the RET_ReallyLR pseudo instruction.
When we expand the RET_ReallyLR pseudo instruction we also need to transfer the
implicit operands.

The return register is an implicit operand and without it the liveness
calculation generates an incorrect live-out set for the patchpoint.

This fixes rdar://problem/19068476.

llvm-svn: 233635
2015-03-30 22:45:56 +00:00
Eric Christopher 2226c72301 Remove unused MCSubtargetInfo argument from the AArch64 MCInstPrinter ctors.
llvm-svn: 233608
2015-03-30 21:52:26 +00:00
Eric Christopher c7c5592b7e Remove unused Target argument from MCInstPrinter ctor functions.
llvm-svn: 233607
2015-03-30 21:52:21 +00:00
Akira Hatanaka bceb2a5a1c [AArch64InstPrinter] Use the feature bits of the subtarget passed to the print
method.

This enables the instprinter to print a different system register name based on
the feature bits of the per-function subtarget. 

Differential Revision: http://reviews.llvm.org/D8668 

llvm-svn: 233412
2015-03-27 20:37:20 +00:00
Akira Hatanaka b46d0234a6 [MCInstPrinter] Enable MCInstPrinter to change its behavior based on the
per-function subtarget.

Currently, code-gen passes the default or generic subtarget to the constructors
of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which
enables some targets (AArch64, ARM, and X86) to change their instprinter's
behavior based on the subtarget feature bits. Since the backend can now use
different subtargets for each function, instprinter has to be changed to use the
per-function subtarget rather than the default subtarget.

This patch takes the first step towards enabling instprinter to change its
behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to
AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the
various print methods table-gen auto-generates. 

I will follow up with changes to instprinters of AArch64, ARM, and X86.

llvm-svn: 233411
2015-03-27 20:36:02 +00:00
Vladimir Sukharev 45523ffd07 [AArch64] Don't store available subtarget features in AArch64SysReg::SysRegMapper
Subtarget features must not be a part of the target machine. So, they are now not being stored in SysRegMapper, but provided each time fromString()/toString() are called

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8655

llvm-svn: 233386
2015-03-27 17:11:29 +00:00
Vladimir Sukharev edc71abedd [AArch64] Rename Pairs to Mappings in AArch64NamedImmMapper
Third element is to be added soon to "struct AArch64NamedImmMapper::Mapping". So its instances are renamed from ...Pairs to ...Mappings

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8582

llvm-svn: 233300
2015-03-26 17:57:39 +00:00
Vladimir Sukharev 017d10bb76 [AArch64] Move initializations of AArch64NamedImmMapper out of void AArch64Operand::print(...)
class AArch64NamedImmMapper is to become dependent of SubTargetFeatures, while class AArch64Operand don't have access to the latter. 

So, AArch64NamedImmMapper constructor invocations are refactored away from methods of AArch64Operand.

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8579

llvm-svn: 233297
2015-03-26 17:29:53 +00:00
Vladimir Sukharev c632cda8b2 [AArch64, ARM] Add v8.1a architecture and generic cpu
New architecture and cpu added, following http://community.arm.com/groups/processors/blog/2014/12/02/the-armv8-a-architecture-and-its-ongoing-development

Reviewers: t.p.northover

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8505

llvm-svn: 233290
2015-03-26 17:05:54 +00:00
Peter Collingbourne e8813e6c2c AArch64: use a different means to determine whether to byte swap relocations.
This code depended on a bug in the FindAssociatedSection function that would
cause it to return the wrong result for certain absolute expressions. Instead,
use EvaluateAsRelocatable.

llvm-svn: 233119
2015-03-24 21:47:03 +00:00
David Blaikie 186d2cbd1d Refactor: Simplify boolean expressions in AArch64 target
Simplify boolean expressions using `true` and `false` with `clang-tidy`

Patch by Richard Thomson.

Reviewed By: rengolin

Differential Revision: http://reviews.llvm.org/D8525

llvm-svn: 233089
2015-03-24 16:24:01 +00:00
Michael Kuperstein 29704e7fb4 Revert "Use std::bitset for SubtargetFeatures"
This reverts commit r233055.

It still causes buildbot failures (gcc running out of memory on several platforms, and a self-host failure on arm), although less than the previous time.

llvm-svn: 233068
2015-03-24 12:56:59 +00:00
Michael Kuperstein 774b441b5e Use std::bitset for SubtargetFeatures
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.

The first time this was committed (r229831), it caused several buildbot failures. 
At least some of the ARM ones were due to gcc/binutils issues, and should now be fixed.

Differential Revision: http://reviews.llvm.org/D8542

llvm-svn: 233055
2015-03-24 09:17:25 +00:00
Ahmed Bougacha d1655cb1c0 [AArch64, ARM] Enable GlobalMerge with -O3 rather than -O1.
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.

Currently, performance doesn't always improve, and, on code that uses
few globals (e.g., the odd file- or function- static), more often than
not is degraded by the optimization.  Lengthy discussion can be found
on llvmdev (AArch64-focused;  ARM has similar problems):
  http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.

GlobalMerge needs to better identify those cases that benefit, and this
will be done separately.  In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.

llvm-svn: 233024
2015-03-23 21:17:36 +00:00
Benjamin Kramer 799003bf8c Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
2015-03-23 19:32:43 +00:00
Benjamin Kramer 16132e6faa Purge unused includes throughout libSupport.
NFC.

llvm-svn: 232976
2015-03-23 18:07:13 +00:00
Chad Rosier affe181b39 [AArch64] Enable rematerialization of float 0 values.
Patch by Geoff Berry<gberry@codeaurora.org>.

llvm-svn: 232967
2015-03-23 17:19:34 +00:00
Benjamin Kramer 51f6096cf8 Move private classes into anonymous namespaces
NFC.

llvm-svn: 232944
2015-03-23 12:30:58 +00:00
Daniel Sanders f731eee322 [aarch64] Distinguish the 'Q' and 'm' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.

Clang also has code for 'Ump', 'Utf', 'Usa', and 'Ush' but calls
llvm_unreachable() on this code path so they are not converted to a
constraint id at the moment.

No functional change intended.

Reviewers: t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D8177

llvm-svn: 232941
2015-03-23 11:33:15 +00:00
Eric Christopher faad620569 Remove the bare getSubtargetImpl call from the AArch64 port. As part
of this add a test that shows we can generate code for functions
that specifically enable a subtarget feature.

llvm-svn: 232884
2015-03-21 04:04:50 +00:00
Ahmed Bougacha e6bb09ac3f [AArch64] Prefer UZP for concat_vector of illegal truncs.
Follow-up to r232459: prefer a UZP shuffle to the intermediate truncs.

llvm-svn: 232871
2015-03-21 01:08:39 +00:00
Rafael Espindola 36a15cb975 Don't declare all text sections at the start of the .s
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.

The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.

The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.

To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.

llvm-svn: 232842
2015-03-20 20:00:01 +00:00
John Brawn 1f26a47630 [ARM] Fix handling of thumb1 out-of-range frame offsets
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure. 

Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer. 

Differential Revision: http://reviews.llvm.org/D8419

llvm-svn: 232825
2015-03-20 17:20:07 +00:00
Rafael Espindola cd584a809d Split the object streamer callback in one per file format.
There are two main advantages to doing this

* Targets that only need to handle one of the formats specially don't have
  to worry about the others. For example, x86 now only registers a
  constructor for the COFF streamer.

* Changes to the arguments passed to one format constructor will not impact
  the other formats.

llvm-svn: 232699
2015-03-19 01:50:16 +00:00
Rafael Espindola 69244c3e78 two or more, use a for.
llvm-svn: 232688
2015-03-18 23:15:49 +00:00
Eric Christopher a0de253d27 Revert "Migrate the AArch64 TargetRegisterInfo to its TargetMachine"
as we don't necessarily need to do this yet - though we could move
the base class to the TargetMachine as it isn't subtarget dependent.

This reverts commit r232103.

llvm-svn: 232665
2015-03-18 20:37:30 +00:00
Pirama Arumuga Nainar 12aeefc63b Fix bug while building FP16 constant vectors for AArch64
Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64.  This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling.

Reviewers: mcrosier

Reviewed By: mcrosier

Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson

Differential Revision: http://reviews.llvm.org/D8369

llvm-svn: 232562
2015-03-17 23:10:29 +00:00
NAKAMURA Takumi c085eca176 Appease AArch64ISelLowering.cpp miscompiled by g++-4.7.2.
I will revert this when 4.7.3 is ready.

llvm-svn: 232561
2015-03-17 22:55:01 +00:00
Rafael Espindola 9ab09237dc Centralize the handling of unique ids for temporary labels.
Before this patch code wanting to create temporary labels for a given entity
(function, cu, exception range, etc) had to keep its own counter to have stable
symbol names.

createTempSymbol would still add a suffix to make sure a new symbol was always
returned, but it kept a single counter. Because of that, if we were to use
just createTempSymbol("cu_begin"), the label could change from cu_begin42 to
cu_begin43 because some other code started using temporary labels.

Simplify this by just keeping one counter per prefix and removing the various
specialized counters.

llvm-svn: 232535
2015-03-17 20:07:06 +00:00
Rafael Espindola 5345e420c4 Convert the easy cases of GetTempSymbol to createTempSymbol.
In these cases no code was depending on GetTempSymbol finding an existing
symbol.

llvm-svn: 232478
2015-03-17 14:22:31 +00:00
Ahmed Bougacha e0afb1fe6c [AArch64] Use intermediate step for concat_vectors of illegal truncs.
Optimize concat_vectors of truncated vectors, where the intermediate
type is illegal, to avoid said illegality,  e.g.,
  (v4i16 (concat_vectors (v2i16 (truncate (v2i64))),
                         (v2i16 (truncate (v2i64)))))
->
  (v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))),
                                          (v2i32 (truncate (v2i64)))))))

This isn't really target-specific, and, as such, would best go in the
DAGCombiner.  However, ISD::TRUNCATE legality isn't keyed on both input
and result type, so we might generate worse code when we don't know
better.  On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8.
rdar://20022387

llvm-svn: 232459
2015-03-17 03:23:09 +00:00
Ahmed Bougacha e33e6c979c [AArch64] Factor out N->getOperand()s; format. NFCI.
llvm-svn: 232458
2015-03-17 03:19:18 +00:00
Rafael Espindola f696df1148 Pass in a "const Triple &T" instead of a raw StringRef.
llvm-svn: 232429
2015-03-16 22:29:29 +00:00
Rafael Espindola 9bcf2fcb89 Remove unused argument. NFC.
llvm-svn: 232428
2015-03-16 22:06:15 +00:00
Rafael Espindola 73870dd438 There is only one Asm streamer, there is no need for targets to register it.
Instead, have the targets register a TargetStreamer to be use with the
asm streamer (if any).

llvm-svn: 232423
2015-03-16 21:43:42 +00:00
David Blaikie 9f380a3ca0 Fix uses of reserved identifiers starting with an underscore followed by an uppercase letter
This covers essentially all of llvm's headers and libs. One or two weird
cases I wasn't sure were worth/appropriate to fix.

llvm-svn: 232394
2015-03-16 18:06:57 +00:00
Daniel Sanders bf5b80f5f9 Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC.
Summary:
This is instead of doing this in target independent code and is the last
non-functional change before targets begin to distinguish between
different memory constraints when selecting code for the ISD::INLINEASM
node.

Next, each target will individually move away from the idea that all
memory constraints behave like 'm'.

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8173

llvm-svn: 232373
2015-03-16 13:13:41 +00:00
Benjamin Kramer 76e37aa334 unique_ptrs are unique already, no need to unique them any further.
llvm-svn: 232178
2015-03-13 16:59:29 +00:00
Daniel Sanders 60f1db0525 Recommit r232027 with PR22883 fixed: Add infrastructure for support of multiple memory constraints.
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.

PR22883 was caused the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint id which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there. 

llvm-svn: 232165
2015-03-13 12:45:09 +00:00
Eric Christopher 1b585aeb8a Migrate the AArch64 TargetRegisterInfo to its TargetMachine
implementation. This requires a bit of scaffolding and a few fixups
that'll go away once all of the ports have been migrated.

llvm-svn: 232103
2015-03-12 21:04:46 +00:00
Hal Finkel e78e52ba9b Revert "r232027 - Add infrastructure for support of multiple memory constraints"
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.

Original commit message:

The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

llvm-svn: 232093
2015-03-12 20:09:39 +00:00
Eric Christopher 63ea0402c2 Fix comment formatting.
llvm-svn: 232076
2015-03-12 18:23:01 +00:00
Daniel Sanders 41c072e63b Add infrastructure for support of multiple memory constraints.
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8171

llvm-svn: 232027
2015-03-12 11:00:48 +00:00
Eric Christopher 09696d3fea Remove the need to cache the subtarget in the AArch64 TargetRegisterInfo
classes. Replace it with a cache to the Triple and use that
where applicable at the moment.

llvm-svn: 232005
2015-03-12 02:04:46 +00:00
Mehdi Amini 93e1ea167e Move the DataLayout to the generic TargetMachine, making it mandatory.
Summary:
I don't know why every singled backend had to redeclare its own DataLayout.
There was a virtual getDataLayout() on the common base TargetMachine, the
default implementation returned nullptr. It was not clear from this that
we could assume at call site that a DataLayout will be available with
each Target.

Now getDataLayout() is no longer virtual and return a pointer to the
DataLayout member of the common base TargetMachine. I plan to turn it into
a reference in a future patch.

The only backend that didn't have a DataLayout previsouly was the CPPBackend.
It now initializes the default DataLayout. This commit is NFC for all the
other backends.

Test Plan: clang+llvm ninja check-all

Reviewers: echristo

Subscribers: jfb, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8243

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231987
2015-03-12 00:07:24 +00:00
Eric Christopher 9deb75d176 Have getCallPreservedMask and getThisCallPreservedMask take a
MachineFunction argument so that we can grab subtarget specific
features off of it.

llvm-svn: 231979
2015-03-11 22:42:13 +00:00
Eric Christopher 7af9528747 Have getCalleeSavedRegs take a non-null MachineFunction all the
time. The target independent code was passing in one all the
time and targets weren't checking validity before using. Update
a few calls to pass in a MachineFunction where necessary.

llvm-svn: 231970
2015-03-11 21:41:28 +00:00
Pete Cooper d987cd2f31 Constify AArch64CollectLOH.cpp. NFC
llvm-svn: 231969
2015-03-11 21:40:25 +00:00
Eric Christopher 0169e42c3b Remove the use of the subtarget in MCCodeEmitter creation and
update all ports accordingly. Required a couple of small rewrites
in handling subtarget features during creation in PPC.

llvm-svn: 231861
2015-03-10 22:03:14 +00:00
Ahmed Bougacha fab5892f8b [AArch64] Avoid going through GPRs for across-vector instructions.
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
  (v4i32 (uaddv ...))
is the same as
  (v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
  (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
           (i32 (int_aarch64_neon_uaddv ...)), ssub)

In a combine, we transform all such across-vector-lanes intrinsics to:

  (i32 (extract_vector_elt (uaddv ...), 0))

This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs.  Consider:

    uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
        return vmulq_n_u32(a, vaddvq_u32(b));
    }

We now generate:
    addv.4s  s1, v1
    mul.4s   v0, v0, v1[0]
instead of the previous:
    addv.4s  s1, v1
    fmov     w8, s1
    dup.4s   v1, w8
    mul.4s   v0, v1, v0

rdar://20044838

llvm-svn: 231840
2015-03-10 20:45:38 +00:00
Ahmed Bougacha 8f6a115de9 [AArch64] Remove integer INSvi*lane patterns. NFCI.
Most are redundant, and they never seem to fire.

The V128 integer patterns already exist in the INS multiclass.
The duplicates only fire when the vector index type isn't i64,
because they accept "imm" instead of an explicit "i64", as the
instruction definition patterns do.

TLI::getVectorIdxTy is i64 on AArch64, so this should never happen.
Also, one of them had a typo: for i64, INSvi32lane was used.
I noticed because I mistakenly used an explicit i32 as the idx type,
and got ins.s for an i64 vector_insert.

The V64 patterns also don't seem to ever fire, as V64 vector
extract/insert are legalized to V128.

The equivalent float patterns are unique and useful, so keep them.

No functional change intended;  none exhibited on the LIT and LNT tests.

llvm-svn: 231838
2015-03-10 20:37:19 +00:00
Kevin Qin aef68418de [AArch64] Enable partial & runtime unrolling on cortex-a57
For inner one of nested loops, it is more likely to be a hot loop,
and the runtime check can be promoted out from patch 0001, so the
overhead is less, we can try a doubled threshold to unroll more loops.

llvm-svn: 231632
2015-03-09 06:14:28 +00:00
Benjamin Kramer 57a3d084cd Make static variables const if possible. Makes them go into a read-only section.
Or fold them into a initializer list which has the same effect. NFC.

llvm-svn: 231598
2015-03-08 16:07:39 +00:00
Benjamin Kramer 867bfc53ee Make constant arrays that are passed to functions as const.
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.

llvm-svn: 231571
2015-03-07 17:41:00 +00:00
Eric Christopher 25dbdeb4d1 Typo.
llvm-svn: 231547
2015-03-07 01:39:09 +00:00
Quentin Colombet 66b616351c [AArch64][LoadStoreOptimizer] Generate LDP + SXTW instead of LD[U]R + LD[U]RSW.
Teach the load store optimizer how to sign extend a result of a load pair when
it helps creating more pairs.
The rational is that loads are more expensive than sign extensions, so if we
gather some in one instruction this is better!

<rdar://problem/20072968>

llvm-svn: 231527
2015-03-06 22:42:10 +00:00
Bruno Cardoso Lopes 618c67a018 [AsmPrinter][TLOF] 32-bit MachO support for replacing GOT equivalents
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.

-- before

    _extgotequiv:
       .long _extfoo

    _delta:
       .long _extgotequiv-_delta

-- after

    _delta:
       .long L_extfoo$non_lazy_ptr-_delta

       .section __IMPORT,__pointers,non_lazy_symbol_pointers
    L_extfoo$non_lazy_ptr:
       .indirect_symbol _extfoo
       .long 0

llvm-svn: 231475
2015-03-06 13:49:05 +00:00
Bruno Cardoso Lopes 52b1391df6 [AsmPrinter][TLOF] ARM64 MachO support for replacing GOT equivalents
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:

-- before

   .globl  _foo
  _foo:
   .long   42

   .globl  _gotequivalent
  _gotequivalent:
   .quad   _foo

   .globl  _delta
  _delta:
   .long   _gotequivalent-_delta

-- after

   .globl  _foo
  _foo:
   .long   42

   .globl  _delta
  Ltmp3:
   .long _foo@GOT-Ltmp3

llvm-svn: 231474
2015-03-06 13:48:45 +00:00
Ahmed Bougacha 1b67630cb3 [AArch64] Teach AsmPrinter about GlobalAddress operands.
Fixes PR22761, rdar://20024866.
Differential Revision: http://reviews.llvm.org/D8042

llvm-svn: 231400
2015-03-05 20:04:21 +00:00
JF Bastien f14889ee34 Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded.
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all.

This does not represent a behavioural change and as such no tests were added.

Patch by: Richard Diamond.

Reviewers: jfb

Reviewed By: jfb

Subscribers: jfb, aemerson, t.p.northover, llvm-commits

Differential Revision: http://reviews.llvm.org/D7713

llvm-svn: 231250
2015-03-04 15:47:57 +00:00
Kristof Beyls aea8461820 Fix PR22408 - LLVM producing AArch64 TLS relocations that GNU linkers cannot handle yet.
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relaxe/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.

This patch:

* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
  would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
  linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
  is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
  accesses is produced, by making use of a new pseudo instruction. The patch also removes
  two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
  SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).

llvm-svn: 231227
2015-03-04 09:12:08 +00:00
Pete Cooper ef21bd444d Remove MCStreamer.h include from MCContext.h and explictly include it where necessary. NFC
llvm-svn: 231193
2015-03-04 01:24:11 +00:00
Eric Christopher 6f1e5680f6 Remove subtarget dependence in pass pipeline setup for AArch64.
llvm-svn: 231165
2015-03-03 23:22:40 +00:00
David Blaikie 01a9a412a7 Avoid copying LiveInterval, this could lead to a double-delete
llvm-svn: 231154
2015-03-03 22:25:48 +00:00
David Blaikie 7f1e0565b3 Revert "Remove the explicit SDNodeIterator::operator= in favor of the implicit default"
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.

This reverts commit r231135.

llvm-svn: 231136
2015-03-03 21:18:16 +00:00
David Blaikie bb8da4c08f Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.

llvm-svn: 231135
2015-03-03 21:17:08 +00:00
Chad Rosier 8e38f30e49 [AArch64] When combining constant mul of -3, prefer (sub x, (shl x, N)).
This change only effects codegen when the constant is -3.

llvm-svn: 231085
2015-03-03 17:31:01 +00:00
Sanjoy Das e5d1466ab3 [AArch64] fix an invalid-iterator-use bug.
Summary:
In AArch64PromoteConstant::appendAndTransferDominatedUses,
`InsertPts[NewPt]` invalidates IPI.  Therefore, `InsertPts[NewPt] =
std::move(IPI->second)` is not legal.

This was caught by running `make check` with
http://reviews.llvm.org/D7931.

Reviewers: t.p.northover, grosbach, bkramer

Reviewed By: bkramer

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D7988

llvm-svn: 230923
2015-03-02 00:17:18 +00:00
Benjamin Kramer 7149aabf8b Make some non-constant static variables non-static or fully const.
Otherwise we have to emit thread-safe initialization for them. NFC.

llvm-svn: 230894
2015-03-01 18:09:56 +00:00
Benjamin Kramer 5fbfe2ffdc Convert push_back loops into append calls.
No functionality change intended.

llvm-svn: 230849
2015-02-28 13:20:15 +00:00
Benjamin Kramer f1362f6196 ArrayRefize memory operand folding. NFC.
llvm-svn: 230846
2015-02-28 12:04:00 +00:00
Eric Christopher 1cdefae9c4 Rewrite MachineOperand::print and MachineInstr::print to avoid
uses of TM->getSubtargetImpl and propagate to all calls.

This could be a debugging regression in places where we had a
TargetMachine and/or MachineFunction but don't have it as part
of the MachineInstr. Fixing this would require passing a
MachineFunction/Function down through the print operator, but
none of the existing uses in tree seem to do this.

llvm-svn: 230710
2015-02-27 00:11:34 +00:00
Eric Christopher 11e4df73c8 getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

llvm-svn: 230699
2015-02-26 22:38:43 +00:00
Eric Christopher 23a3a7c871 Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.

llvm-svn: 230583
2015-02-26 00:00:24 +00:00
Matthias Braun 02892ec62d AArch64: Add debug message for large shift constants.
As requested in code review.

llvm-svn: 230517
2015-02-25 18:03:50 +00:00
Matthias Braun 7526035155 AArch64: Relax assert about large shift sizes.
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).

Differential Revision: http://reviews.llvm.org/D6940

llvm-svn: 230355
2015-02-24 18:52:04 +00:00
Eric Christopher ed47b22951 Rewrite the global merge pass to be subprogram agnostic for now.
It was previously using the subtarget to get values for the global
offset without actually checking each function as it was generating
code. Go ahead and solidify the current behavior and make the
existing FIXMEs more prominent.

As a note the ARM backend previously had a thumb1 and non-thumb1
set of defaults. Only the former was tested so I've changed the
behavior to only use that for now.

llvm-svn: 230245
2015-02-23 19:28:45 +00:00
Chad Rosier 543900539f Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.
This patch adds the isProfitableToHoist API.  For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.

Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>

llvm-svn: 230241
2015-02-23 19:15:16 +00:00
Tim Northover 3b6b7ca2bc CodeGen: convert CCState interface to using ArrayRefs
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.

There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.

llvm-svn: 230118
2015-02-21 02:11:17 +00:00
Eric Christopher 3ee30d0607 Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 230000
2015-02-20 08:39:06 +00:00
Ahmed Bougacha 4c2b0781a5 [CodeGen] Use ArrayRef instead of std::vector&. NFC.
The former lets us use SmallVectors.  Do so in ARM and AArch64.

llvm-svn: 229925
2015-02-19 23:13:10 +00:00
Benjamin Kramer ea68a944a1 Demote vectors to arrays. No functionality change.
llvm-svn: 229861
2015-02-19 15:26:17 +00:00
Michael Kuperstein efd7a96d2e Reverting r229831 due to multiple ARM/PPC/MIPS build-bot failures.
llvm-svn: 229841
2015-02-19 11:38:11 +00:00
Michael Kuperstein ba5b04c798 Use std::bitset for SubtargetFeatures
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.

No functional change.

Differential Revision: http://reviews.llvm.org/D7065

llvm-svn: 229831
2015-02-19 09:01:04 +00:00
Benjamin Kramer 6cd780ff21 Prefer SmallVector::append/insert over push_back loops.
Same functionality, but hoists the vector growth out of the loop.

llvm-svn: 229500
2015-02-17 15:29:18 +00:00
Andrew Trick 05938a5481 AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

llvm-svn: 229413
2015-02-16 18:10:47 +00:00
Duncan P. N. Exon Smith 003bb7d96e AArch64: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

llvm-svn: 229218
2015-02-14 02:09:06 +00:00
Chandler Carruth 30d69c2e36 [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

llvm-svn: 229094
2015-02-13 10:01:29 +00:00
Rafael Espindola e4bcad4754 Learn that __DATA,__objc_classrefs is not atomized via symbols.
This should hopefully fix objc on AArch64.

llvm-svn: 228976
2015-02-12 23:11:59 +00:00
Benjamin Kramer 5f6a907288 MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros
Update all callers.

llvm-svn: 228930
2015-02-12 15:35:40 +00:00
Arnaud A. de Grandmaison de79026d5e [PBQP] Cautiously update edge costs in the solver
The NodeMetadata are maintained in an incremental way. When an edge between
2 nodes has its cost updated, in the course of graph reduction for example,
the NodeMetadata need first to have the old edge cost removed, then the new
edge cost added. Only once the NodeMetadata have been fully updated, it
becomes safe to consider promoting the nodes to the
ConservativelyAllocatable or OptimallyReducible sets. Previously, this
promotion was occuring right after the removing the old cost, and this was
breaking the assumption that a ConservativelyAllocatable should not be
spilled.

This patch also adds asserts to:
 - enforces the invariant that a node's reduction can not be downgraded,
 - only not provably allocatable or optimally reducible nodes can be spilled.

llvm-svn: 228816
2015-02-11 08:25:36 +00:00
Tim Northover 45aa89c925 ARM & AArch64: teach LowerVSETCC that output type size may differ from input.
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.

Fortunately this is easy enough, just extend or truncate the naturally
compared result.

I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.

Should fix PR21645.

llvm-svn: 228518
2015-02-08 00:50:47 +00:00
Ahmed Bougacha df956a2e78 [AArch64] Use the source location of the IR branch when creating Bcc
from a conditional branch fed by an add/sub/mul-with-overflow node.

We previously used the SDLoc of the overflow node, for no good reason.
In some cases, this led to the Bcc and B terminators having different
source orders, and DBG_VALUEs being inserted between them.

The real issue is with the code that can't handle DBG_VALUEs between
terminators: the few places affected by this will be fixed soon.
In the meantime, fixing the SDLoc is a positive change no matter what.

No tests, as I have no idea how to get .loc emitted for branches?

rdar://19347133

llvm-svn: 228463
2015-02-06 23:15:39 +00:00
Benjamin Kramer 69b4ad29af AArch64PromoteConstant: Modernize and resolve some Use<->User confusion.
NFC.

llvm-svn: 228399
2015-02-06 14:43:55 +00:00
Renato Golin 6088504499 Adding support to LLVM for targeting Cortex-A72
Currently, Cortex-A72 is modelled as an Cortex-A57 except the fp
load balancing pass isn't enabled for Cortex-A72 as it's not
profitable to have it enabled for this core.

Patch by Ranjeet Singh.

llvm-svn: 228140
2015-02-04 13:31:29 +00:00
Frederic Riss b61f01f1c2 Fix some unnoticed/unwanted behavior change from r222319.
The ARM assembler allows register alias redefinitions as long as it
targets the same register. r222319 broke that. In the AArch64 case
it would just produce a new warning, but in the ARM case it would
error out on previously accepted assembler.

llvm-svn: 228109
2015-02-04 03:10:03 +00:00
Eric Christopher bb1ae666fe Migrate away from using a Subtarget except for the one place we want
to use it. Use the triple to determine OS format bits at the module
level.

llvm-svn: 227947
2015-02-03 06:40:19 +00:00
Ahmed Bougacha dff57a6143 [AArch64] Prefer DUP/MOV ("CPY") to INS for vector_extract.
This avoids a partial false dependency on the previous content of
the upper lanes of the destination vector register.

Differential Revision: http://reviews.llvm.org/D7307

llvm-svn: 227820
2015-02-02 17:55:57 +00:00
Chandler Carruth ab5cb36c40 [multiversion] Remove the function parameter from the unrolling
preferences interface on TTI now that all of TTI is per-function.

llvm-svn: 227741
2015-02-01 14:31:23 +00:00
Chandler Carruth c956ab6603 [multiversion] Switch the TTI queries from TargetMachine to Subtarget
now that we have a correct and cached subtarget specific to the
function.

Also, finish providing a cached per-function subtarget in the core
LLVMTargetMachine -- that layer hadn't switched over yet.

The only use of the TargetMachine was to re-lookup a subtarget for
a particular function to work around the fact that TTI was immutable.
Now that it is per-function and we haved a cached subtarget, use it.

This still leaves a few interfaces with real warts on them where we were
passing Function objects through the TTI interface. I'll remove these
and clean their usage up in subsequent commits now that this isn't
necessary.

llvm-svn: 227738
2015-02-01 14:22:17 +00:00
Chandler Carruth c340ca839c [multiversion] Remove the cached TargetMachine pointer from the
intermediate TTI implementation template and instead query up to the
derived class for both the TargetMachine and the TargetLowering.

Most of the derived types had a TLI cached already and there is no need
to store a less precisely typed target machine pointer.

This will in turn make it much cleaner to look up the TLI via
a per-function subtarget instead of the generic subtarget, and it will
pave the way toward pulling the subtarget used for unroll preferences
into the same form once we are *always* using the function to look up
the correct subtarget.

llvm-svn: 227737
2015-02-01 14:01:15 +00:00
Chandler Carruth 8b04c0d26a [multiversion] Switch all of the targets over to use the
TargetIRAnalysis access path directly rather than implementing getTTI.

This even removes getTTI from the interface. It's more efficient for
each target to just register a precise callback that creates their
specific TTI.

As part of this, all of the targets which are building their subtargets
individually per-function now build their TTI instance with the function
and thus look up the correct subtarget and cache it. NVPTX, R600, and
XCore currently don't leverage this functionality, but its trivial for
them to add it now.

llvm-svn: 227735
2015-02-01 13:20:00 +00:00
Chandler Carruth ee642690ea [multiversion] Remove a false freedom to leave the TargetMachine pointer
null.

For some reason some of the original TTI code supported a null target
machine. This seems to have been legacy, and I made matters worse when
refactoring this code by spreading that pattern further through the
various targets.

The TargetMachine can't actually be null, and it doesn't make sense to
support that use case. I've now consistently removed it and removed all
of the code trying to cope with that situation. This is probably good,
as several targets *didn't* cope with it being null despite the null
default argument in their constructors. =]

llvm-svn: 227734
2015-02-01 12:38:24 +00:00
Chandler Carruth d8b3e9a420 [PM] Remove a bunch of stale TTI creation method declarations. I nuked
their definitions, but forgot to clean up all the declarations which are
in different files.

llvm-svn: 227698
2015-02-01 00:22:15 +00:00
Chandler Carruth 93dcdc47db [PM] Switch the TargetMachine interface from accepting a pass manager
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

llvm-svn: 227685
2015-01-31 11:17:59 +00:00
Chandler Carruth 705b185f90 [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

llvm-svn: 227669
2015-01-31 03:43:40 +00:00
Eric Christopher 1e51334459 Avoid using the cast and use the templated accessor function.
llvm-svn: 227643
2015-01-30 23:46:40 +00:00
Chad Rosier b23c4dd3a4 [AArch64] Make AArch64A57FPLoadBalancing output stable.
Add tie breaker to colorChainSet() sort so that processing order doesn't
depend on std::set order, which depends on pointer order, which is
unstable from run to run.

No test case as this is nearly impossible to reproduce.

Phabricator Review: http://reviews.llvm.org/D7265
Patch by Geoff Berry <gberry@codeaurora.org>!

llvm-svn: 227606
2015-01-30 19:55:40 +00:00
Hao Liu e0335d77c3 [AArch64]Fix PR21675, a bug about lowering llvm.ctpop.i32. We should noot use "DAG.getUNDEF(MVT::v8i8)" to get all zero vector.
Patch by Wei-cheng Wang.

llvm-svn: 227550
2015-01-30 02:13:53 +00:00
Eric Christopher 988ce75c07 Remove a few getSubtarget calls in AArch64 pass manager initialization.
llvm-svn: 227531
2015-01-30 01:10:26 +00:00
Eric Christopher 125898a2a1 Clean up some uses of getSubtarget in AArch64.
llvm-svn: 227530
2015-01-30 01:10:24 +00:00
Eric Christopher f761d901aa This only needs TargetInstrInfo, not the specialized one.
llvm-svn: 227529
2015-01-30 01:10:18 +00:00
Chad Rosier 11d943d32c [AArch64] Add INITIALIZE_PASS macros to AArch64A57FPLoadBalancing.
These are needed so this pass will produce output when
e.g. -print-after-all is used.

Phabricator Review: http://reviews.llvm.org/D7264
Patch by Geoff Berry <gberry@codeaurora.org>!

llvm-svn: 227506
2015-01-29 22:57:37 +00:00
Eric Christopher 905f12d96d Remove getSubtargetImpl from AArch64ISelLowering and cache the
correct subtarget by passing it in during the constructor as
TargetLowering is Subtarget specific.

llvm-svn: 227402
2015-01-29 00:19:42 +00:00
Sanjay Patel 08efcd9039 fix typos; NFC
llvm-svn: 227386
2015-01-28 22:37:32 +00:00
Eric Christopher 6c901623c0 Migrate AArch64 except for TTI and AsmPrinter away from getSubtargetImpl.
llvm-svn: 227293
2015-01-28 03:51:33 +00:00
Eric Christopher 36d9273128 Clean up the AArch64 store pair suppression pass initialization
and remove and unnecessary class variable.

llvm-svn: 227175
2015-01-27 07:54:36 +00:00
Eric Christopher 3d4276f053 The subtarget is cached on the MachineFunction. Access it directly.
llvm-svn: 227173
2015-01-27 07:31:29 +00:00
Chad Rosier f9327d6fe9 Commoning of target specific load/store intrinsics in Early CSE.
Phabricator revision: http://reviews.llvm.org/D7121
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

llvm-svn: 227149
2015-01-26 22:51:15 +00:00
Eric Christopher 8b7706517c Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Eric Christopher c8c76853b9 Fix a problem where the AArch64 ELF assembler was failing with
-no-exec-stack. This was due to it not deriving from the correct
asm info base class and missing the override for the exec
stack section query. Added another line to the noexec test
line to make sure this doesn't regress.

llvm-svn: 227074
2015-01-26 06:32:17 +00:00
Quentin Colombet 29f553398f [AArch64][LoadStoreOptimizer] Form LDPSW when possible.
This patch adds the missing LD[U]RSW variants to the load store optimizer, so
that we generate LDPSW when possible.

<rdar://problem/19583480>

llvm-svn: 226978
2015-01-24 01:25:54 +00:00
Tim Northover 7cd58934a8 AArch64: decode all MRS/MSR forms early to avoid saving FeatureBits.
Currently, we're adding a uint64_t describing the current subtarget so
that matching can check whether the specified register is valid.
However, we want to move to a bitset for those bits (x86 has more than
64 of them).

This can't live in a union so it's probably better to do the checks
early (especially as there are only 3 of them).

llvm-svn: 226841
2015-01-22 17:23:04 +00:00
Tim Northover b9184f2b1a AArch64: add backend option to reserve x18 (platform register)
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step on that way is to add a flag controlling
it.

From: Andrew Turner <andrew@fubar.geek.nz>
llvm-svn: 226664
2015-01-21 15:43:31 +00:00
Rafael Espindola 2658554aec Add r224985 back with fixes.
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.

Original message:

In an assembly expression like

bar:
  .long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 226503
2015-01-19 21:11:14 +00:00
Greg Fitzgerald fa78d08675 [AArch64] Implement GHC calling convention
Original patch by Luke Iannini.  Minor improvements and test added by
Erik de Castro Lopo.

Differential Revision: http://reviews.llvm.org/D6877

From: Erik de Castro Lopo <erikd@mega-nerd.com>
llvm-svn: 226473
2015-01-19 17:40:05 +00:00
David Blaikie 186db431c0 unique_ptrify the RelInfo parameter to TargetRegistry::createMCSymbolizer
llvm-svn: 226416
2015-01-18 20:45:48 +00:00
David Blaikie 9459832ebd std::unique_ptrify the MCStreamer argument to createAsmPrinter
llvm-svn: 226414
2015-01-18 20:29:04 +00:00
Rafael Espindola 7244bb3c17 Revert "Add r224985 back with two fixes."
This reverts commit r225644 while I debug a regression.

llvm-svn: 226022
2015-01-14 19:07:23 +00:00
Chandler Carruth d9903888d9 [cleanup] Re-sort all the #include lines in LLVM using
utils/sort_includes.py.

I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.

llvm-svn: 225974
2015-01-14 11:23:27 +00:00
Rafael Espindola d9c3e308f5 Add r224985 back with two fixes.
One is that AArch64 has additional restrictions on when local relocations can
be used. We have to take those into consideration when deciding to put a L
symbol in the symbol table or not.

The other is that ld64 requires the relocations to cstring to use linker
visible symbols on AArch64.

Thanks to Michael Zolotukhin for testing this!

Remove doesSectionRequireSymbols.

In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 225644
2015-01-12 18:13:07 +00:00
Ahmed Bougacha 2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Ahmed Bougacha 67dd2d25a3 [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.

llvm-svn: 225392
2015-01-07 21:27:10 +00:00
Karthik Bhat 9ba55334dc Revert r225165 and r225169
Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case
where the original subtraction could have caused an overflow.
Reverting the same.

llvm-svn: 225341
2015-01-07 06:34:34 +00:00
Lang Hames 04b37c4043 Revert r225048: It broke ObjC on AArch64.
I've filed http://llvm.org/PR22100 to track this issue.

llvm-svn: 225228
2015-01-06 00:54:32 +00:00
Ahmed Bougacha d54c448d34 [AArch64] Improve codegen of store lane instructions by avoiding GPR usage.
We used to generate code similar to:

  umov.b        w8, v0[2]
  strb  w8, [x0, x1]

because the STR*ro* patterns were preferred to ST1*.
Instead, we can avoid going through GPRs, and generate:

  add   x8, x0, x1
  st1.b { v0 }[2], [x8]

This patch increases the ST1* AddedComplexity to achieve that.

rdar://16372710
Differential Revision: http://reviews.llvm.org/D6202

llvm-svn: 225183
2015-01-05 17:10:26 +00:00
Ahmed Bougacha f964df3640 [AArch64] Improve codegen of store lane 0 instructions by directly storing the subregister.
For 0-lane stores, we used to generate code similar to:

  fmov w8, s0
  str w8, [x0, x1, lsl #2]

instead of:

  str s0, [x0, x1, lsl #2]

To correct that: for store lane 0 patterns, directly match to STR <subreg>0.

Byte-sized instructions don't have the special case for a 0 index,
because FPR8s are defined to have untyped content.

rdar://16372710
Differential Revision: http://reviews.llvm.org/D6772

llvm-svn: 225181
2015-01-05 17:02:28 +00:00
Karthik Bhat 93f27ce886 Select lower fsub,fabs pattern to fabd on AArch64
This patch lowers patterns such as-
  fsub   v0.4s, v0.4s, v1.4s
  fabs   v0.4s, v0.4s
to
  fabd  v0.4s, v0.4s, v1.4s
on AArch64.

Review: http://reviews.llvm.org/D6791
llvm-svn: 225169
2015-01-05 13:57:59 +00:00
Karthik Bhat 8ec742c2f9 Select lower sub,abs pattern to sabd on AArch64
This patch lowers patterns such as-
  sub	v0.4s, v0.4s, v1.4s
  abs	v0.4s, v0.4s
to
  sabd	v0.4s, v0.4s, v1.4s
on AArch64.

Review: http://reviews.llvm.org/D6781
llvm-svn: 225165
2015-01-05 13:11:07 +00:00
Craig Topper d3c02f177a Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Saleem Abdulrasool 67f729933f ARM: permit tail calls to weak externals on COFF
Weak externals are resolved statically, so we can actually generate the tail
call on PE/COFF targets without breaking the requirements.  It is questionable
whether we want to propagate the current behaviour for MachO as the requirements
are part of the ARM ELF specifications, and it seems that prior to the SVN
r215890, we would have tail'ed the call.  For now, be conservative and only
permit it on PE/COFF where the call will always be fully resolved.

llvm-svn: 225119
2015-01-03 21:35:00 +00:00
Craig Topper 589ceee7f4 Minor cleanup to all the switches after MatchInstructionImpl in all the AsmParsers.
Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation.

llvm-svn: 225114
2015-01-03 08:16:34 +00:00
Rafael Espindola 54b435ec3c Add r224985 back with a fix.
The issues was that AArch64 has additional restrictions on when local
relocations can be used. We have to take those into consideration when
deciding to put a L symbol in the symbol table or not.

Original message:

Remove doesSectionRequireSymbols.

In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 225048
2014-12-31 17:19:34 +00:00
Rafael Espindola d4da9040de Revert "Remove doesSectionRequireSymbols."
This reverts commit r224985.

I am investigating why it made an Apple bot unhappy.

llvm-svn: 225044
2014-12-31 16:06:48 +00:00
Rafael Espindola b22d5aa49a Remove doesSectionRequireSymbols.
In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 224985
2014-12-30 13:13:27 +00:00
Karthik Bhat bf662901c1 Lower multiply-negate operation to mneg on AArch64
This patch pattern matches code such as-
neg	 w8, w8
mul	 w8, w9, w8
to
mneg	 w8, w8, w9

Review: http://reviews.llvm.org/D6754
llvm-svn: 224706
2014-12-22 13:38:58 +00:00
Adrian Prantl b9fa945d51 ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224294
2014-12-16 00:20:49 +00:00
Michael Ilseman addddc441f Silence more static analyzer warnings.
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.

llvm-svn: 224255
2014-12-15 18:48:43 +00:00
Matthias Braun b2f2388a76 Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips.
llvm-svn: 224075
2014-12-11 23:18:03 +00:00
Matthias Braun 7e37a5f523 [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.

llvm-svn: 224059
2014-12-11 21:26:47 +00:00
Rafael Espindola 01c73610d0 This reverts commit r224043 and r224042.
check-llvm was failing.

llvm-svn: 224045
2014-12-11 20:03:57 +00:00
Matthias Braun 199aeff7dd Enable machineverifier in debug mode for X86, ARM, AArch64, Mips
llvm-svn: 224043
2014-12-11 19:42:09 +00:00
Matthias Braun a7c82a9f1d [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

llvm-svn: 224042
2014-12-11 19:42:05 +00:00
Juergen Ributzka 2326650ceb [AArch64] MachO large code-model: Materialize FP constants in code.
In the large code model we have to first get the address of the GOT entry, load
the address of the constant, and then load the constant itself.

To avoid these loads and the GOT entry alltogether this commit changes the way
how FP constants are materialized in the large code model. The constats are now
materialized in a GPR and then bitconverted/moved into the FPR.

Reviewed by Tim Northover

Fixes rdar://problem/16572564.

llvm-svn: 223941
2014-12-10 19:43:32 +00:00
Juergen Ributzka c6f314b8ed [FastISel][AArch64] Fix a missing nullptr check in 'computeAddress'.
The load/store value type is currently not available when lowering the memcpy
intrinsic. Add the missing nullptr check to support this in 'computeAddress'.

Fixes rdar://problem/19178947.

llvm-svn: 223818
2014-12-09 19:44:38 +00:00
Tim Northover 67be569a31 AArch64: treat HFAs containing "half" types as blocks too.
llvm-svn: 223669
2014-12-08 17:54:58 +00:00
Benjamin Kramer 89e5306f43 Make the DenseMap bucket type configurable and use a smaller bucket for DenseSet.
DenseSet used to be implemented as DenseMap<Key, char>, which usually doubled
the memory footprint of the map. Now we use a compressed set so the second
element uses no memory at all. This required some surgery on DenseMap as
all accesses to the bucket now have to go through methods; this should
have no impact on the behavior of DenseMap though. The new default bucket
type for DenseMap is a slightly extended std::pair as we expose it through
DenseMap's iterator and don't want to break any existing users.

llvm-svn: 223588
2014-12-06 19:22:44 +00:00
Tim Northover 5e84fe3ed4 AArch64: use explicit MVT::i64 when creating EXTRACT_SUBVECTOR nodes.
All our patterns use MVT::i64, but the ISelLowering nodes were inconsistent in
their choice.

No functional change.

llvm-svn: 223551
2014-12-06 00:33:37 +00:00
Weiming Zhao cc4bf3ff3d [AArch64] Combining Load and IntToFp should check for neon availability
llvm-svn: 223382
2014-12-04 20:25:50 +00:00
Matt Arsenault 4e27343eec Allow target to specify prefix for labels
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.

llvm-svn: 223323
2014-12-04 00:06:57 +00:00
Tim Northover 293d414380 AArch64: fix wrong-endian parameter passing.
The blocked arguments code didn't take account of the hacks needed to support
it.

llvm-svn: 223247
2014-12-03 17:49:26 +00:00
Tim Northover 4a8ac260cc AArch64: strengthen Darwin ABI alignment assumptions
A global variable without an explicit alignment specified should be assumed to
be ABI-aligned according to its type, like on other platforms. This allows us
to use better memory operations when accessing it.

rdar://18533701

llvm-svn: 223180
2014-12-02 23:53:43 +00:00
Tim Northover ec7ebebe55 AArch64: don't be too greedy when folding :lo12: accesses into mem ops.
This frequently leads to cases like:
   ldr xD, [xN, :lo12:var]
   add xA, xN, :lo12:var
   ldr xD, [xA, #8]

where the ADD would have been needed anyway, and the two distinct addressing
modes can prevent the formation of an ldp. Because of how we handle ADRP
(aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also
results in duplicated ADRP instructions (one on its own to cover the ldr, and
one combined with the add).

llvm-svn: 223172
2014-12-02 23:13:39 +00:00
Lang Hames a7395bf49b [AArch64][Stackmaps] Optimize stackmap shadows on AArch64.
Reduce the number of nops emitted for stackmap shadows on AArch64 by counting
non-stackmap instructions up to the next branch target towards the requested
shadow.

<rdar://problem/14959522>

llvm-svn: 223156
2014-12-02 21:36:24 +00:00
Tim Northover 24ec87debb AArch64: make register block rules apply to vector types too.
The blocking code originated in ARM, which is more aggressive about casting
types to a canonical representative before doing anything else, so I missed out
most vector HFAs and broke the ABI. This should fix it.

llvm-svn: 223126
2014-12-02 17:15:22 +00:00
Ahmed Bougacha d0ce058f2c [AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR".
r208210 introduced an optimization that improves the vector select
codegen by doing the setcc on vectors directly.
This is a problem they the setcc operands are i1s, because the
optimization would create vectors of i1, which aren't legal.

Part of PR21549.

Differential Revision: http://reviews.llvm.org/D6308

llvm-svn: 223075
2014-12-01 20:59:00 +00:00
Ahmed Bougacha 879463206e [AArch64] Fix v2i8->i16 bitcast legalization.
r213378 improved f16 bitcasts, so that they go directly through subregs,
instead of through the stack.  That code now causes an assertion failure
for bitcasts from other 16-bits types (most importantly v2i8).

Correct that by doing the custom lowering for i16 bitcasts only when the
input is an f16.

Part of PR21549.

Differential Revision: http://reviews.llvm.org/D6307

llvm-svn: 223074
2014-12-01 20:52:32 +00:00
Akira Hatanaka 107d13c228 Fix capitalization. NFC.
llvm-svn: 222988
2014-12-01 06:14:52 +00:00
Craig Topper 44586dc4d6 Add missing 'override' keyword.
llvm-svn: 222911
2014-11-28 03:58:26 +00:00
Tim Northover a38e5cbf20 Stop using ArrayRef of a const type.
I *think* this is what the GCC bots are complaining about.

llvm-svn: 222905
2014-11-27 21:29:20 +00:00
Tim Northover 3c55ccac48 AArch64: treat [N x Ty] as a block during procedure calls.
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.

This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.

llvm-svn: 222903
2014-11-27 21:02:42 +00:00
Will Newton 40f08faa70 Update AArch64 ELF relocations to ABI 1.0
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:

1. R_AARCH64_NONE is defined to be zero rather than 256

R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.

2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21

I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.

Tested with check-all with no regressions.

llvm-svn: 222821
2014-11-26 10:49:18 +00:00
Craig Topper c50d64b07b Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.
llvm-svn: 222801
2014-11-26 00:46:26 +00:00
Juergen Ributzka eb67bd8d74 [FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching.
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.

This commit fixes this and also adds inverse versions of the conditon to catch
more cases.

llvm-svn: 222722
2014-11-25 04:16:15 +00:00
Chad Rosier ba0e0664ff [AArch64] Fix clobber computation in A57LoadBalancing pass.
Extremely difficult to reproduce, so no test case included.
PR21637

llvm-svn: 222677
2014-11-24 18:57:58 +00:00
Hao Liu 44e5d7a131 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Reid Kleckner 343c395f11 Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/
Follow up to r221940, where I must not have caught em all. NFC

llvm-svn: 222481
2014-11-20 23:51:47 +00:00
Reid Kleckner 357600eab5 Add out of line virtual destructors to all LLVMTargetMachine subclasses
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878.  When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.

This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.

llvm-svn: 222480
2014-11-20 23:37:18 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Hao Liu 2aa06a989d [AArch64] Disable useAA for Cortex-A57.
Using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores. Also I find
enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved 
and benefical for ooo cores, we can enable it again.

llvm-svn: 222333
2014-11-19 06:48:56 +00:00
Hao Liu fd46bea46a [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.
SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate a better addressing
mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57.

Reviewed in http://reviews.llvm.org/D5864.

llvm-svn: 222331
2014-11-19 06:39:53 +00:00
David Blaikie 5106ce7897 Remove StringMap::GetOrCreateValue in favor of StringMap::insert
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)

Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.

llvm-svn: 222319
2014-11-19 05:49:42 +00:00
Weiming Zhao 7a2d15678e [Aarch64] Customer lowering of CTPOP to SIMD should check for NEON availability
llvm-svn: 222292
2014-11-19 00:29:14 +00:00
Chad Rosier c250881838 [FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic
shift-right for booleans (i1).

Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222272
2014-11-18 22:41:49 +00:00
Chad Rosier e16d16ae41 [FastISel][AArch64] Also allow folding of sign-/zero-extend and logical
shift-right for booleans (i1).

Logical shift-right immediate with sign-/zero-extensions also works for boolean
values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222270
2014-11-18 22:38:42 +00:00
Juergen Ributzka cdda930843 [FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts."
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.

Related to PR21594.

llvm-svn: 222257
2014-11-18 21:20:17 +00:00
Juergen Ributzka 7a7c4684e4 [AArch64] Don't optimize all compare instructions.
"optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add
instructions when the flags are not used anymore. This conversion is valid for
most instructions, but not all. Some instructions that don't set the flags
(e.g. sub with immediate) can set the SP, whereas the flag setting version uses
the same encoding for the "zero" register.

Update the code to also check for the return register before performing the
optimization to make sure that a cmp doesn't suddenly turn into a sub that sets
the stack pointer.

I don't have a test case for this, because it isn't easy to trigger.

llvm-svn: 222255
2014-11-18 21:02:40 +00:00
Juergen Ributzka 4328fd94b0 [FastISel][AArch64] Fix shift-immediate emission for "zero" shifts.
This change emits a COPY for a shift-immediate with a "zero" shift value.
This fixes PR21594 where we emitted a shift instruction with an incorrect
immediate operand.

llvm-svn: 222247
2014-11-18 19:58:59 +00:00
Aditya Nandakumar 3053155652 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Juergen Ributzka 0af310d052 [FastISel][AArch64] Don't bail during simple GEP instruction selection.
The generic FastISel code would bail, because it can't emit a sign-extend for
AArch64. This copies the code over and uses AArch64 specific emit functions.

This is not ideal and 'computeAddress' should handles this, so it can fold the
address computation into the memory operation.

I plan to clean up 'computeAddress' anyways, so I will add that in a future
commit.

Related to rdar://problem/18962471.

llvm-svn: 221923
2014-11-13 20:50:44 +00:00
Aditya Nandakumar a27193297f This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively
llvm-svn: 221878
2014-11-13 09:26:31 +00:00
Juergen Ributzka 957a1454cc [FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value.
Optimize selects of i1 in the presence of 'true' and 'false' operands to simple
logic operations.

This fixes rdar://problem/18960150.

llvm-svn: 221848
2014-11-13 00:36:46 +00:00
Juergen Ributzka 424c5fd12f [FastISel][AArch64] Fold the cmp into the select when possible.
This folds the compare emission into the select emission when possible, so we
can directly use the flags and don't have to emit a separate compare.

Related to rdar://problem/18960150.

llvm-svn: 221847
2014-11-13 00:36:43 +00:00
Juergen Ributzka d1a042abd0 [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16.
Related to rdar://problem/18960150.

llvm-svn: 221846
2014-11-13 00:36:38 +00:00
Rafael Espindola 7fc5b87480 Pass an ArrayRef to MCDisassembler::getInstruction.
With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t>
instead of a MemoryObject.

Even on X86 there is a maximum size an instruction can have. Given
that, it seems way simpler and more efficient to just pass an ArrayRef
to the disassembler instead of a MemoryObject and have it do a virtual
call every time it wants some extra bytes.

llvm-svn: 221751
2014-11-12 02:04:27 +00:00
Juergen Ributzka 89441b0dd8 [FastISel][AArch64] Add support for fabs intrinsic.
Lower the llvm.fabs intrinsic to the 'fabs' MI instruction.

This fixes rdar://problem/18946552.

llvm-svn: 221729
2014-11-11 23:10:44 +00:00
Rafael Espindola 961d469445 MCAsmParserExtension has a copy of the MCAsmParser. Use it.
Base classes were storing a second copy.

llvm-svn: 221667
2014-11-11 05:18:41 +00:00
Juergen Ributzka ea5870a530 [AArch64][FastISel] Fix kill flags for integer extends.
In the case we optimize an integer extend away and replace it directly with the
source register, we also have to clear all kill flags at all its uses.
This is necessary, because the orignal IR instruction might be trivially dead,
but we replaced it with a nop at MI level.

llvm-svn: 221628
2014-11-10 21:05:31 +00:00
Rafael Espindola 4aa6bea7a2 Misc style fixes. NFC.
This fixes a few cases of:

* Wrong variable name style.
* Lines longer than 80 columns.
* Repeated names in comments.
* clang-format of the above.

This make the next patch a lot easier to read.

llvm-svn: 221615
2014-11-10 18:11:10 +00:00
Ahmed Bougacha 72001cf287 [AArch64] Keep flags on condition vreg when instantiating a CB branch.
Reversing a CB* instruction used to drop the flags on the condition. On the
included testcase, this lead to a read from an undefined vreg.
Using addOperand keeps the flags, here <undef>.

Differential Revision: http://reviews.llvm.org/D6159

llvm-svn: 221507
2014-11-07 02:50:00 +00:00
Juergen Ributzka f9660f0712 [AArch64] Use the correct register class for ORR.
While fixing up the register classes in the machine combiner in a previous
commit I missed one.

This fixes the last one and adds a test case.

llvm-svn: 221308
2014-11-04 22:20:07 +00:00
Benjamin Kramer 185dc0da1f AArch64: Pattern match integer vector abs like we do on ARM.
This kind of pattern is emitted by the loop vectorizer.

llvm-svn: 221289
2014-11-04 20:10:06 +00:00
Akira Hatanaka bc950d52d7 Rename variables to conform to llvm coding standards.
Differential Revision: http://reviews.llvm.org/D6062

llvm-svn: 221204
2014-11-03 23:24:10 +00:00
Akira Hatanaka 9ee2c26b49 [AArch64] Make function processLogicalImmediate more efficient. NFC.
llvm-svn: 221199
2014-11-03 23:06:31 +00:00
Oliver Stannard 269a275cb4 [AArch64] Fix miscompile of comparison with 0xffffffffffffffff
Some literals in the AArch64 backend had 15 'f's rather than 16, causing
comparisons with a constant 0xffffffffffffffff to be miscompiled.

llvm-svn: 221157
2014-11-03 15:28:40 +00:00
Rafael Espindola 246c4fb5d9 Remove redundant calls to isMaterializable.
This removes calls to isMaterializable in the following cases:

* It was redundant with a call to isDeclaration now that isDeclaration returns
  the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.

llvm-svn: 221050
2014-11-01 16:46:18 +00:00
Chad Rosier 7bb413e3ba [AArch64] Check Dest Register Liveness in CondOpt pass.
Our internal test reveals such case should not be transformed:

  cmp x17, #3
  b.lt .LBB10_15
  ...
  subs x12, x12, #1
  b.gt .LBB10_1

where x12 is a liveout, becomes:

  cmp x17, #2
  b.le .LBB10_15
  ...
  subs x12, x12, #2
  b.ge .LBB10_1

Unable to provide test case as it's difficult to reproduce on community branch.

http://reviews.llvm.org/D6048
Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!

llvm-svn: 220987
2014-10-31 19:02:38 +00:00
Chad Rosier a675e550ca [AArch64] CondOpt pass is missing FCMP instructions when searching backward for
a CMP which defines the flags used by B.CC.

http://reviews.llvm.org/D6047
Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!

llvm-svn: 220961
2014-10-31 15:17:36 +00:00
Tim Northover 00917897b2 AArch64: enable Cortex-A57 FP balancing on Cortex-A53.
Benchmarks have shown that it's harmless to the performance there, and having a
unified set of passes between the two cores where possible helps big.LITTLE
deployment.

Patch by Z. Zheng.

llvm-svn: 220744
2014-10-28 01:24:32 +00:00
NAKAMURA Takumi 949fb6d276 AArch64InstrInfo.h: Fix a warning introduced in clang r220703. [-Winconsistent-missing-override]
llvm-svn: 220739
2014-10-27 23:29:27 +00:00
Juergen Ributzka 7ccebec668 [FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check.
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.

This fixes rdar://problem/18785125.

llvm-svn: 220713
2014-10-27 19:58:36 +00:00
Juergen Ributzka 0190fea941 [FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'.
Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and'
instruction.

This fixes rdar://problem/18784953.

llvm-svn: 220712
2014-10-27 19:46:23 +00:00
Juergen Ributzka 90f741a2ce [FastISel][AArch64] Use 'cbz' also for null values (pointers).
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.

This fixes rdar://problem/18784732.

llvm-svn: 220709
2014-10-27 19:38:05 +00:00
Juergen Ributzka eae91040d8 [FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block.
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.

This fixes rdar://problem/18784013.

llvm-svn: 220704
2014-10-27 19:16:48 +00:00
Juergen Ributzka 6de054a25a [FastISel][AArch64] Fix load/store with frame indices.
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.

This fix extends the possible addressing modes for frame indices.

This fixes rdar://problem/18783298.

llvm-svn: 220700
2014-10-27 18:21:58 +00:00
Lang Hames 5fe30ca56f [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these
sets as keys into a cache of interference matrice values in the Interference
constraint adder.

Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.

llvm-svn: 220688
2014-10-27 17:44:25 +00:00
Oliver Stannard f7a5afc3f2 [AArch64] Fix fast-isel of cbz of i1, i8, i16
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.

llvm-svn: 220553
2014-10-24 09:54:41 +00:00
Chad Rosier dcd2a3014c [AArch64] Add support for the .inst directive.
This has been implement using the MCTargetStreamer interface as is done in the
ARM, Mips and PPC backends.

Phabricator: http://reviews.llvm.org/D5891
PR20964

llvm-svn: 220422
2014-10-22 20:35:57 +00:00
Arnaud A. de Grandmaison 9b3330546b [AArch64] Cleanup A57PBQPConstraints
And add a long awaited testcase.

llvm-svn: 220381
2014-10-22 12:40:20 +00:00
Arnaud A. de Grandmaison a61262f989 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
James Molloy f497d5511d [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.

While we're at it, avoid writing to std::vector::end()...

Bug found with random testing and a lot of coffee.

llvm-svn: 220051
2014-10-17 17:06:31 +00:00
Juergen Ributzka 03a0611061 [AArch64] Fix miscompile of sdiv-by-power-of-2.
When the constant divisor was larger than 32bits, then the optimized code
generated for the AArch64 backend would emit the wrong code, because the shift
was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would
loose the upper 32bits.

This fixes rdar://problem/18678801.

llvm-svn: 219934
2014-10-16 16:41:15 +00:00
Juergen Ributzka f82c987a5c Reapply "[FastISel][AArch64] Add custom lowering for GEPs."
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficientily.

The original commit had a bug in the add emit logic, which has been fixed.

llvm-svn: 219831
2014-10-15 18:58:07 +00:00
Juergen Ributzka 6780f0f7a0 [FastISel][AArch64] Factor out add with immediate emission into a helper function. NFC.
Simplify add with immediate emission by factoring it out into a helper function.

llvm-svn: 219830
2014-10-15 18:58:02 +00:00
Rafael Espindola 7b61ddfa6e Simplify handling of --noexecstack by using getNonexecutableStackSection.
llvm-svn: 219799
2014-10-15 16:12:52 +00:00
Juergen Ributzka 42379d4cf7 Revert "[FastISel][AArch64] Add custom lowering for GEPs."
This breaks our internal build bots. Reverting it to get the bots green again.

llvm-svn: 219776
2014-10-15 04:55:48 +00:00
Eric Christopher 7396cc9978 Remove unused variable.
llvm-svn: 219750
2014-10-15 00:09:07 +00:00
Gerolf Hoflehner 5d26d40fc5 [AArch64] Wrong CC access in CSINC-conditional branch sequence
This is a follow up to commit r219742. It removes the CCInMI variable
and accesses the CC in CSCINC directly. In the case of a conditional
branch accessing the CC with CCInMI was wrong.

llvm-svn: 219748
2014-10-14 23:55:00 +00:00
Gerolf Hoflehner a4c96d02a2 [AAarch64] Optimize CSINC-branch sequence
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>


rdar://problem/18506500

llvm-svn: 219742
2014-10-14 23:07:53 +00:00
Juergen Ributzka 4dfd590eaa [FastISel][AArch64] Add custom lowering for GEPs.
This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail
out even for simple cases, because the standard fastEmit functions don't cover
MUL and ADD is lowered inefficientily.

llvm-svn: 219726
2014-10-14 21:41:23 +00:00
Juergen Ributzka cd11a2806b [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved.
Sign-/zero-extend folding depended on the load and the integer extend to be
both selected by FastISel. This cannot always be garantueed and SelectionDAG
might interfer. This commit adds additonal checks to load and integer extend
lowering to catch this.

Related to rdar://problem/18495928.

llvm-svn: 219716
2014-10-14 20:36:02 +00:00
Bradley Smith 698e08f4cf [AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround
llvm-svn: 219684
2014-10-14 14:02:41 +00:00
Hao Liu 3cb826ca10 [AArch64]Select wide immediate offset into [Base+XReg] addressing mode
e.g Currently we'll generate following instructions if the immediate is too wide:
    MOV  X0, WideImmediate
    ADD  X1, BaseReg, X0
    LDR  X2, [X1, 0]

    Using [Base+XReg] addressing mode can save one ADD as following:
    MOV  X0, WideImmediate
    LDR  X2, [BaseReg, X0]

    Differential Revision: http://reviews.llvm.org/D5477

llvm-svn: 219665
2014-10-14 06:50:36 +00:00
Bradley Smith f2a801d8ac [AArch64] Add workaround for Cortex-A53 erratum (835769)
Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result.  The details are quite complex and hard to
determine statically, since branches in the code may exist in some
 circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.

This patch implements such work-around in the backend, enabled via the option
-aarch64-fix-cortex-a53-835769.

The work-around code generation is not enabled by default.

llvm-svn: 219603
2014-10-13 10:12:35 +00:00
Lang Hames 6aed984e13 [PBQP] Add missing headers from r219421.
llvm-svn: 219425
2014-10-09 18:36:59 +00:00
Lang Hames 8f31f448c5 [PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint).
This patch removes the PBQPBuilder class and its subclasses and replaces them
with a composable constraints class: PBQPRAConstraint. This allows constraints
that are only required for optimisation (e.g. coalescing, soft pairing) to be
mixed and matched.

This patch also introduces support for target writers to supply custom
constraints for their targets by overriding a TargetSubtargetInfo method:

std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const;

This patch should have no effect on allocations.

llvm-svn: 219421
2014-10-09 18:20:51 +00:00
Kevin Qin 72a799a68a [AArch64] Enable partial & runtime unrolling on cortex-a57.
llvm-svn: 219401
2014-10-09 10:13:27 +00:00
Chad Rosier d9d0f86a79 [AArch64] Generate vector signed/unsigned mul and mla/mls long.
Phabricator Revision: http://reviews.llvm.org/D5589
Patch by Balaram Makam <bmakam@codeaurora.org>!!

llvm-svn: 219276
2014-10-08 02:31:24 +00:00
Juergen Ributzka ef3722d8e9 [FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends.
The code already folds sign-/zero-extends, but only if they are arguments to
mul and shift instructions. This extends the code to also fold them when they
are direct inputs.

llvm-svn: 219187
2014-10-07 03:40:06 +00:00
Juergen Ributzka 75b2f34069 [FastISel][AArch64] Teach the address computation to also fold sub instructions.
Tiny enhancement to the address computation code to also fold sub instructions
if the rhs is constant and can be folded into the offset.

llvm-svn: 219186
2014-10-07 03:40:03 +00:00
Juergen Ributzka 42bf665f2b [FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction."
This commit fixes an issue with sign-/zero-extending loads that was discovered
by Richard Barton.

We use now the correct load instructions for sign-extending loads to 64bit. Also
updated and added more unit tests.

llvm-svn: 219185
2014-10-07 03:39:59 +00:00
Eric Christopher 3faf2f1e02 Add subtarget caches to aarch64, arm, ppc, and x86.
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.

llvm-svn: 219106
2014-10-06 06:45:36 +00:00
Benjamin Kramer 2e52f02864 Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it.
llvm-svn: 219068
2014-10-04 22:44:29 +00:00
Jingyue Wu 4938e271c6 Add fake use to suppress defined-but-unused warnings
llvm-svn: 219045
2014-10-04 03:50:10 +00:00
Benjamin Kramer e12a6bac32 Eliminate some deep std::vector copies. NFC.
llvm-svn: 218999
2014-10-03 18:33:16 +00:00
Eric Christopher f12e1ab313 constify TargetMachine parameter.
llvm-svn: 218934
2014-10-03 00:42:41 +00:00
Juergen Ributzka 99bd3cba8b [Stackmaps] Make ithe frame-pointer required for stackmaps.
Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.

This fixes PR21107.

llvm-svn: 218920
2014-10-02 22:21:49 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Tom Coxon e493f177ee [AArch64] Allow access to all system registers with MRS/MSR instructions.
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
    S<op0>_<op1>_<CRn>_<CRm>_<op2>

The encoding space permitted for implementation-defined system registers
is:
    op0 op1  CRn   CRm   op2
    11  xxx  1x11  xxxx  xxx

The full encoding space can now be accessed:
    op0 op1  CRn   CRm   op2
    xx  xxx  xxxx  xxxx  xxx

This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.

llvm-svn: 218753
2014-10-01 10:13:59 +00:00
Asiri Rathnayake 530b3edab6 Add missing natual vector cast.
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.

llvm-svn: 218751
2014-10-01 09:59:45 +00:00
Juergen Ributzka c110c0b99a Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218693
2014-09-30 19:59:35 +00:00
Tom Coxon 2c13e71728 [AArch64] Remove unnecessary whitespace. (Test commit)
llvm-svn: 218680
2014-09-30 16:23:16 +00:00
Juergen Ributzka 6ac12439d0 [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.

Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.

This fixes rdar://problem/18495928.

llvm-svn: 218653
2014-09-30 00:49:58 +00:00
Juergen Ributzka 0616d9d41a [FastISel][AArch64] Factor out scale factor calculation. NFC.
Factor out the code that determines the implicit scale factor of memory
operations for a given value type.

llvm-svn: 218652
2014-09-30 00:49:54 +00:00
Dave Estes 5f9daea101 [AArch64] Refines the Cortex-A57 Machine Model
Primarily refines all of the instructions with accurate latency
and micro-op information. Refinements largely focus on the NEON
instructions.

Additionally, a few advanced features are modeled, including
forwarding for MAC instructions and hazards for floating point SQRT
and DIV.

Lastly, the issue-width is reduced to three so that the scheduler
will better accommodate the narrower decode and dispatch width.

llvm-svn: 218627
2014-09-29 21:27:36 +00:00
Chad Rosier 70d54ac848 [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

llvm-svn: 218607
2014-09-29 13:59:31 +00:00
Jim Grosbach 57fd2623c3 AArch64: allow constant expressions for shifted reg literals
e.g., add w1, w2, w3, lsl #(2 - 1)

This sort of thing comes up in pre-processed assembly playing macro games.
Still validate that it's an assembly time constant. The early exit error check
was just a bit overzealous and disallowed a left paren.

rdar://18430542

llvm-svn: 218336
2014-09-23 22:16:02 +00:00
Oliver Stannard c546625c4f Fix segfault in AArch64 backend with -g and -mbig-endian
Fix a null pointer dereference when trying to swap the endianness of
fixups in the .eh_frame section in the AArch64 backend.

llvm-svn: 218311
2014-09-23 15:38:11 +00:00
Juergen Ributzka 27e959d7b2 [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

llvm-svn: 218275
2014-09-22 21:08:53 +00:00
Juergen Ributzka 92e8978e40 [FastIsel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions. I
accidentially used a 'cast' instead of a 'dyn_cast'.

llvm-svn: 218161
2014-09-19 22:23:46 +00:00
Aaron Ballman 0bb041b5f4 Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required.
llvm-svn: 218062
2014-09-18 17:34:23 +00:00
Aaron Ballman 11fa97fa32 Fixing a bunch of -Woverloaded-virtual warnings due to hiding getSubtargetImpl from the base class. NFC.
llvm-svn: 218050
2014-09-18 13:27:14 +00:00
Juergen Ributzka 1d3a312e2d Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ."
Reverting it until I have time to investigate a regression.

llvm-svn: 218035
2014-09-18 08:07:40 +00:00
Juergen Ributzka 0f3076785f Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies.
When folding the intrinsic flag into the branch or select we also have to
consider the fact if the intrinsic got simplified, because it changes the
flag we have to check for.

llvm-svn: 218034
2014-09-18 07:26:26 +00:00
Juergen Ributzka 2964b832ef [FastISel][AArch64] Simplify XALU multiplies.
Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible.

llvm-svn: 218033
2014-09-18 07:04:54 +00:00
Juergen Ributzka 2fc851002b [FastISel][AArch64] Followup commit for 218031 to handle negative offsets too.
llvm-svn: 218032
2014-09-18 07:04:49 +00:00
Juergen Ributzka a33070c321 [FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address.
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.

llvm-svn: 218031
2014-09-18 05:40:47 +00:00
Juergen Ributzka 99b7758ba0 [FastISel][AArch64] Fold 'AND' instruction during the address computation.
The 'AND' instruction could be used to mask out the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.

and  x1, x1, #0xffffffff   ---> ldrb x0, [x0, w1, uxtw]
ldrb x0, [x0, x1]

llvm-svn: 218030
2014-09-18 05:40:41 +00:00
Juergen Ributzka c35fb03661 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218010
2014-09-18 02:44:13 +00:00
Juergen Ributzka f6430314b4 [FastISel][AArch64] Custom lower sdiv by power-of-2.
Emit an optimized instruction sequence for sdiv by power-of-2 depending on the
exact flag.

This fixes rdar://problem/18224511.

llvm-svn: 217986
2014-09-17 21:55:55 +00:00
Juergen Ributzka c611d72754 [FastISel][AArch64] Simplify mul to shift when possible.
This is related to rdar://problem/18369687.

llvm-svn: 217980
2014-09-17 20:35:41 +00:00
Juergen Ributzka 3871c69422 [FastISel][AArch64] Fold mul into add/sub and logical operations.
Try to fold the multiply into the add/sub or logical operations (when
possible).

This is related to rdar://problem/18369687.

llvm-svn: 217978
2014-09-17 19:51:38 +00:00
Juergen Ributzka 22d4cd0a4f [FastISel][AArch64] Fold mul into the address computation of memory operations.
Teach 'computeAddress' to also fold multiplies into the address computation
(when possible).

This fixes rdar://problem/18369443.

llvm-svn: 217977
2014-09-17 19:19:31 +00:00
Juergen Ributzka d8e30c0db8 [FastISel][AArch64] Fold compare with zero and branch into CBZ and CBNZ.
This takes advanatage of the CBZ and CBNZ instruction to further optimize the
common null check pattern into a single instruction.

This is related to rdar://problem/18358882.

llvm-svn: 217972
2014-09-17 18:05:34 +00:00
Juergen Ributzka fb3e14375a [FastISel][AArch64] Improve branch selection to support all FP conditions.
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additonal branch
instruction is required.

This also adds unit tests to checks all the different condition codes.

This is related o rdar://problem/18358882.

llvm-svn: 217966
2014-09-17 17:46:47 +00:00
Robin Morisset 25c8e318e4 [X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).

Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.

llvm-svn: 217928
2014-09-17 00:06:58 +00:00
Juergen Ributzka 59e631c728 [FastISel][AArch64] Add vector support to argument lowering.
Lower the first 8 vector arguments too.

llvm-svn: 217850
2014-09-16 00:25:30 +00:00
Juergen Ributzka de47c47cc1 [FastISel][AArch64] Allow handling of vectors during return lowering for little endian machines.
Allow handling of vectors during return lowering at least for little endian machines.
This was restricted in r208200 to fix it for big endian machines (according to
the comment), but it also disabled it for little endian too.

llvm-svn: 217846
2014-09-15 23:40:10 +00:00
Juergen Ributzka b9e49c73ee [FastISel][AArch64] Update function and variable names to follow the coding standard. NFC.
llvm-svn: 217845
2014-09-15 23:20:17 +00:00
Juergen Ributzka cbe802e730 [FastISel][AArch64] Make AArch64FastISel class final. NFC.
llvm-svn: 217840
2014-09-15 22:33:11 +00:00
Juergen Ributzka 993224a553 [FastISel][AArch64] Lower sin/cos/pow to runtime lib calls.
Also lower sin/cos/pow to runtime lib calls.

This fixes rdar://problem/18343468.

llvm-svn: 217839
2014-09-15 22:33:06 +00:00
Juergen Ributzka afa034fb61 [FastISel][AArch64] Add lowering support for frem.
This lowers frem to a runtime libcall inside fast-isel.

The test case also checks the CallLoweringInfo bug that was exposed by this
change.

This fixes rdar://problem/18342783.

llvm-svn: 217833
2014-09-15 22:07:49 +00:00
Juergen Ributzka e1779e2a8b [FastISel][AArch64] Refactor selectAddSub, selectLogicalOp, and SelectShift. NFC.
Small refactor to tidy up the code a little.

llvm-svn: 217827
2014-09-15 21:27:56 +00:00
Juergen Ributzka 6127b1968d [FastISel][AArch64] Refactor code to use isTypeSupported. NFC.
Gets rid of isLoadStoreTypeLegal and replace it with isTypeSupported.

llvm-svn: 217826
2014-09-15 21:27:54 +00:00
Juergen Ributzka 8984f48d89 [FastISel][AArch64] Improve floating-point compare support.
Add support for the last two missing fcmp condition codes: UEQ and ONE.

This fixes rdar://problem/18341575.

llvm-svn: 217823
2014-09-15 20:47:16 +00:00
James Molloy 05ce999134 [A57FPLoadBalancing] Modify r217689 - actually we do need to check defs
... Just make sure we check uses first so we see the kill first. It
turns out ignoring defs gives some pretty nasty runtime failures.
I'm certain this is the fix but I'm still reducing a testcase.

llvm-svn: 217735
2014-09-14 18:24:26 +00:00
Juergen Ributzka 85c1f84650 [FastISel][AArch64] Add support for non-native types for logical ops.
Extend the logical ops selection to also support non-native types such as i1,
i8, and i16.

Fixes rdar://problem/18330589.

llvm-svn: 217732
2014-09-13 23:46:28 +00:00
Chad Rosier 347ed4e831 [AArch64] Don't enable the post-RA MI scheduler at OptNone.
Hopefully, this will appease the bots.

llvm-svn: 217712
2014-09-12 22:17:28 +00:00
Chad Rosier 486e087f26 [AArch64] Enable post-RA MI scheduler.
Phabricator Revision: http://reviews.llvm.org/D5278
Patch by Sanjin Sijaric!

llvm-svn: 217693
2014-09-12 17:40:39 +00:00
James Molloy 4689647dbb [A57FPLoadBalancing] Remove support for vector types
Vector MUL/MLAs have tied operands, which gives us extra constraints
that we currently can't handle. Instead of silently doing the wrong
thing, remove support to be readded later properly.

llvm-svn: 217690
2014-09-12 16:55:32 +00:00
James Molloy a6e05a789e [A57FPLoadBalancing] Ignore <def>s when checking if a chain may be killed.
Defs are seen before uses, so a def without the kill flag doesn't necessarily
mean that the register is not killed on that instruction. It may be killed
in a later use operand.

llvm-svn: 217689
2014-09-12 16:55:26 +00:00
James Molloy f0de7e58f6 [A57LoadBalancing] unique_ptr-ify.
Thanks to David Blakie for the in-depth review!

llvm-svn: 217682
2014-09-12 14:35:17 +00:00
Patrik Hagglund c287f4a358 Fix gcc -Wpedantic.
llvm-svn: 217669
2014-09-12 12:32:08 +00:00
Gerolf Hoflehner 7b0abb89c2 [AArch64] Revert r216141 for cyclone
The increase of the interleave factor to 4 has side-effects
like performance losses eg. due to reminder loops being executed
more frequently and may increase code size. It requires more
analysis and careful heuristic tuning. Expect double digit gains
in small benchmarks like lowercase.c and losses in puzzle.c.

llvm-svn: 217540
2014-09-10 20:31:57 +00:00
Sanjay Patel b653de1ada Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066

llvm-svn: 217528
2014-09-10 17:58:16 +00:00
Arnaud A. de Grandmaison 0dbcfba659 [AArch64] Address Chad's post commit review comments for r217504 (PBQP experimental support)
llvm-svn: 217518
2014-09-10 17:03:25 +00:00
Arnaud A. de Grandmaison cfb28f77a4 [AArch64] Pacify lld buildbot complaining about an unused static function in release build.
llvm-svn: 217505
2014-09-10 14:24:02 +00:00
Arnaud A. de Grandmaison c75dbbbdd6 [AArch64] Add experimental PBQP support
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.

By default, the PBQP allocator is not used, unless explicitely required
on the command line with "-aarch64-pbqp".

llvm-svn: 217504
2014-09-10 14:06:10 +00:00
Asiri Rathnayake 369c030633 [AArch 64] Use a constant pool load for weak symbol references when
using static relocation model and small code model.

Summary: currently we generate GOT based relocations for weak symbol
references regardless of the underlying relocation model. This should
be change so that in static relocation model we use a constant pool
load instead.

Patch from: Keith Walker

Reviewers: Renato Golin, Tim Northover
llvm-svn: 217503
2014-09-10 13:54:38 +00:00
Chad Rosier bdbca15ccd [AArch64] Enabled AA support for Cortex-A57.
llvm-svn: 217381
2014-09-08 15:34:16 +00:00
Chad Rosier 3528c1e4c6 [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103

llvm-svn: 217371
2014-09-08 14:43:48 +00:00
Chad Rosier c9f947744d [AArch64] Enabled AA support for Cortex-A53.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103

llvm-svn: 217370
2014-09-08 14:31:49 +00:00
Jiangning Liu 1a486da543 [AArch64] Add pass to enable additional comparison optimizations by CSE.
Patched by Sergey Dmitrouk.

This pass tries to make consecutive compares of values use same operands to
allow CSE pass to remove duplicated instructions. For this it analyzes
branches and adjusts comparisons with immediate values by converting:

GE -> GT
GT -> GE
LT -> LE
LE -> LT

and adjusting immediate values appropriately. It basically corrects two
immediate values towards each other to make them equal.

llvm-svn: 217220
2014-09-05 02:55:24 +00:00
Tim Northover f7423fd090 AArch64: fix vector-immediate BIC/ORR on big-endian devices.
Follow up to r217138, extending the logic to other NEON-immediate instructions.
As before, the instruction already performs the correct operation and we're
just using a different type for convenience, so we want a true nop-cast.

Patch by Asiri Rathnayake.

llvm-svn: 217159
2014-09-04 15:05:24 +00:00
Tim Northover bb72e6c804 AArch64: fix big-endian immediate materialisation
We were materialising big-endian constants using DAG nodes with types different
from what was requested, followed by a bitcast. This is fine on little-endian
machines where bitcasting is a nop, but we need a slightly different
representation for big-endian. This adds a new set of NVCAST (natural-vector
cast) operations which are always nops.

Patch by Asiri Rathnayake.

llvm-svn: 217138
2014-09-04 09:46:14 +00:00
Juergen Ributzka 30c02e36cc [FastISel][AArch64] Cleanup and simplify 'fastSelectInstruction'. NFC.
llvm-svn: 217119
2014-09-04 01:29:21 +00:00
Juergen Ributzka 1dbc15f02d [FastISel][AArch64] Add target-specific lowering for logical operations.
This change adds support for immediate and shift-left folding into logical
operations.

This fixes rdar://problem/18223183.

llvm-svn: 217118
2014-09-04 01:29:18 +00:00
Robin Morisset ed3d48f161 Refactor AtomicExpandPass and add a generic isAtomic() method to Instruction
Summary:
Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs.
Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving
the X86 backend to this pass.

This requires a way of easily detecting which instructions are atomic.
I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc.. in making
isAtomic() a method of Instruction implemented by a switch on the opcodes.

Test Plan: make check

Reviewers: jfb

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D5035

llvm-svn: 217080
2014-09-03 21:29:59 +00:00
Juergen Ributzka 88e32517c4 [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC.
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.

Reviewed by Eric

llvm-svn: 217075
2014-09-03 20:56:59 +00:00
Juergen Ributzka 5b8bb4d7dd [FastISel] Rename public visible FastISel functions. NFC.
This commit renames the following public FastISel functions:
LowerArguments -> lowerArguments
SelectInstruction -> selectInstruction
TargetSelectInstruction -> fastSelectInstruction
FastLowerArguments -> fastLowerArguments
FastLowerCall -> fastLowerCall
FastLowerIntrinsicCall -> fastLowerIntrinsicCall
FastEmitZExtFromI1 -> fastEmitZExtFromI1
FastEmitBranch -> fastEmitBranch
UpdateValueMap -> updateValueMap
TargetMaterializeConstant -> fastMaterializeConstant
TargetMaterializeAlloca -> fastMaterializeAlloca
TargetMaterializeFloatZero -> fastMaterializeFloatZero
LowerCallTo -> lowerCallTo

Reviewed by Eric

llvm-svn: 217074
2014-09-03 20:56:52 +00:00
Eric Christopher e08189195b Remove unnecessary getTarget call now that the subtarget is cached
on the machine function.

llvm-svn: 217070
2014-09-03 20:36:26 +00:00
Juergen Ributzka 7a76c2409e [FastISel] Some long overdue spring cleaning of FastISel.
Things got a little bit messy over the years and it is time for a little bit
spring cleaning.

This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.

Reviewed by Eric

llvm-svn: 217060
2014-09-03 18:46:45 +00:00
Juergen Ributzka 31c8054594 [FastISel][AArch64] Move unconditional branch handling into 'SelectBranch'. NFC.
llvm-svn: 217054
2014-09-03 17:58:10 +00:00
Benjamin Kramer 8c90fd71f7 Add override to overriden virtual methods, remove virtual keywords.
No functionality change. Changes made by clang-tidy + some manual cleanup.

llvm-svn: 217028
2014-09-03 11:41:21 +00:00
Juergen Ributzka 31e5b7fb12 Reapply r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR.""
This reapplies r216805 with a fix to a copy-past error, which resulted in an
incorrect register class.

Original commit message:
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.

This fixes rdar://problem/18183707.

llvm-svn: 217019
2014-09-03 07:07:10 +00:00
Juergen Ributzka a1148b2173 [FastISel][AArch64] Add target-dependent instruction selection for Add/Sub.
There is already target-dependent instruction selection support for Adds/Subs to
support compares and the intrinsics with overflow check. This takes advantage of
the existing infrastructure to also support Add/Sub, which allows the folding of
immediates, sign-/zero-extends, and shifts.

This fixes rdar://problem/18207316.

llvm-svn: 217007
2014-09-03 01:38:36 +00:00
Juergen Ributzka 53dbef6ef1 [FastISel][AArch64] Use the target-dependent selection code for shifts first.
This uses the target-dependent selection code for shifts first, which allows us
to create better code for shifts with immediates and sign-/zero-extend folding.

Vector type are not handled yet and the code falls back to target-independent
instruction selection for these cases.

This fixes rdar://problem/17907920.

llvm-svn: 216985
2014-09-02 22:33:57 +00:00
Juergen Ributzka 8a4b8bebdc [FastISel][AArch64] Use a new helper function to determine if a value type is supported. NFCI.
FastISel for AArch64 supports more value types than are actually legal. Use a
dedicated helper function to reflect this.

It is very similar to the isLoadStoreTypeLegal function, with the exception
that vector types are not supported yet.

llvm-svn: 216984
2014-09-02 22:33:53 +00:00
Eric Christopher 79cc1e3ae7 Reinstate "Nuke the old JIT."
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reinstates commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 216982
2014-09-02 22:28:02 +00:00
Juergen Ributzka dbe9e174b6 [FastISel][AArch64] Move over to target-dependent instruction selection only.
This change moves FastISel for AArch64 to target-dependent instruction selection
only. This change replicates the existing target-independent behavior, therefore
there are no changes to the unit tests or new tests.

Future changes will take advantage of this change and update functionality
and unit tests.

llvm-svn: 216955
2014-09-02 21:32:54 +00:00
Pete Cooper 1175945710 Change MCSchedModel to be a struct of statically initialized data.
This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour

Reviewed by Andy Trick and Chandler C

llvm-svn: 216919
2014-09-02 17:43:54 +00:00
Alexey Samsonov 729b12ede3 Fix left shifts of negative integers in AArch64 InstPrinter/Disassembler
Summary:
Left shift of negative integer is an undefined behavior, and
is reported by UBSan. It's ok for imm values to be negative, so we can
just replace left shifts with multiplications.

Test Plan: check-llvm test suite

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: aemerson, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D5132

llvm-svn: 216910
2014-09-02 16:19:41 +00:00
Aaron Ballman 8ca53885fa Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC.
llvm-svn: 216902
2014-09-02 12:19:02 +00:00
David Xu 052b9d9282 Merge Extend and Shift into a UBFX
llvm-svn: 216899
2014-09-02 09:33:56 +00:00
Craig Topper fd38cbebda Remove 'virtual' keyword from methods markedwith 'override' keyword.
llvm-svn: 216823
2014-08-30 16:48:34 +00:00
Juergen Ributzka 25816b0fdd Revert r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR."
I think this broke the build bot. Reverting it for now until I have time to take a closer look.

llvm-svn: 216813
2014-08-30 06:16:26 +00:00
Juergen Ributzka 3e7f88c169 [MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR.
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.

This fixes rdar://problem/18183707.

llvm-svn: 216805
2014-08-29 23:48:09 +00:00
Juergen Ributzka c5c1c6090f [FastISel][AArch64] Use the correct register class for branches.
Also constrain the register class for branches.

This fixes rdar://problem/18181496.

llvm-svn: 216804
2014-08-29 23:48:06 +00:00
Alexey Samsonov 700964ea00 Make isValidMCLOHType take unsigned instead of enum to avoid loading invalid enum values
llvm-svn: 216797
2014-08-29 22:34:28 +00:00
Reid Kleckner 39ad7c9812 AArch64: Silence -Wabsolute-value warning with std::abs
llvm-svn: 216794
2014-08-29 22:14:26 +00:00
Robin Morisset 039781ef26 Fix typos in comments, NFC
Summary: Just fixing comments, no functional change.

Test Plan: N/A

Reviewers: jfb

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D5130

llvm-svn: 216784
2014-08-29 21:53:01 +00:00
Louis Gerbarg 03c627e8a7 Remove spurious mask operations from AArch64 add->compares on 16 and 8 bit values
This patch checks for DAG patterns that are an add or a sub followed by a
compare on 16 and 8 bit inputs. Since AArch64 does not support those types
natively they are legalized into 32 bit values, which means that mask operations
are inserted into the DAG to emulate overflow behaviour. In many cases those
masks do not change the result of the processing and just introduce a dependent
operation, often in the middle of a hot loop.

This patch detects the relevent DAG patterns and then tests to see if the
transforms are equivalent with and without the mask, removing the mask if
possible. The exact mechanism of this patch was discusses in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074444.html

There is a reasonably good chance there are missed oppurtunities due to similiar
(but not identical) DAG patterns that could be funneled into this test, adding
them should be simple if we see test cases.

Tests included.

rdar://13754426

llvm-svn: 216776
2014-08-29 21:00:22 +00:00
Juergen Ributzka f6ee7a7cdd [FastISel][AArch64] Fix an incorrect kill flag due to a bug in SelectTrunc.
When we select a trunc instruction we don't emit any code if the type is already
i32 or smaller. This is because the instruction that uses the truncated value
will deal with it.

This behavior can incorrectly transfer a kill flag, which was meant for the
result of the truncate, onto the source register.

%2 = trunc i32 %1 to i16
... = ... %2                -> ... = ... vreg1 <kill>
... = ... %1                   ... = ... vreg1

This commit fixes this by emitting a COPY instruction, so that the result and
source register are distinct virtual registers.

This fixes rdar://problem/18178188.

llvm-svn: 216750
2014-08-29 17:58:16 +00:00
Tim Northover 3c0915e858 AArch64: only try to get operand of a known node.
A bug in r216725 meant we tried to discover the type of a SETCC before
confirming the node actually was a SETCC.

llvm-svn: 216734
2014-08-29 15:34:58 +00:00
Tim Northover c1c05aeb5d AArch64: skip select/setcc combine in complex case.
In an llvm-stress generated test, we were trying to create a v0iN type and
asserting when that failed. This case could probably be handled by the
function, but not without added complexity and the situation it arises in is
sufficiently odd that there's probably no benefit anyway.

Should fix PR20775.

llvm-svn: 216725
2014-08-29 13:05:18 +00:00
Arnaud A. de Grandmaison 6afbf2aa5e [AArch64] FPLoadBalancing: move ownership of the chain to its current accumulator register
and forget about the previously used accumulator.

Coming up with a simple testcase is not easy, as this highly depends on
what the register allocator is doing: this issue showed up while working
with the PBQP allocator, which produced a different allocation scheme.
A testcase would need to come up with chain starting in D[0-7], then
moving to D[8-15], followed by a call to a function whose regmask
clobbers the starting accumulator in D[0-7], then another use of the chain.

Fixed some formatting, added some invariant checks while there.

llvm-svn: 216721
2014-08-29 09:54:11 +00:00
Jiangning Liu 08f4cda2ec [AArch64] Fix some failures exposed by value type v4f16 and v8f16.
1) Add some missing bitcast patterns for v8f16.
2) Add type promotion for operand of ld/st operations.

llvm-svn: 216706
2014-08-29 01:31:42 +00:00
Juergen Ributzka 77bc09f5ab [FastISel][AArch64] Don't fold instructions that are not in the same basic block.
This fix checks first if the instruction to be folded (e.g. sign-/zero-extend,
or shift) is in the same machine basic block as the instruction we are folding
into.

Not doing so can result in incorrect code, because the value might not be
live-out of the basic block, where the value is defined.

This fixes rdar://problem/18169495.

llvm-svn: 216700
2014-08-29 00:19:21 +00:00
Jim Grosbach ec2b0d0b11 AArch64: More correctly constrain target vector extend lowering.
The AArch64 target lowering for [zs]ext of vectors is set up to handle
input simple types and expects the generic SDag path to do something reasonable
with anything that's not a simple type. The code, however, was only
checking that the result type was a simple type and assuming that
implied that the source type would also be a simple type. That's not a
valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>"
demonstrate. The fix is to simply explicitly validate the source type
as well as the result type.

PR20791

llvm-svn: 216689
2014-08-28 22:08:28 +00:00
David Xu ee978203e6 Generate CMN when comparing a short int with minus
llvm-svn: 216651
2014-08-28 04:59:53 +00:00
Juergen Ributzka 843f14f411 Revert "[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation."
Quentin pointed out that this is not the correct approach and there is a better and easier solution.

llvm-svn: 216632
2014-08-27 23:09:40 +00:00
Juergen Ributzka ad8beabe38 [FastISel][AArch64] Don't fold instructions too aggressively into the memory operation.
Currently instructions are folded very aggressively into the memory operation,
which can lead to the use of killed operands:
  %vreg1<def> = ADDXri %vreg0<kill>, 2
  %vreg2<def> = LDRBBui %vreg0, 2
  ... = ... %vreg1 ...

This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.

If the computed address is used by only memory operations in the same basic
block, then it is safe to fold them. This is because all memory operations will
fold the address computation and the original computation will never be emitted.

This fixes rdar://problem/18142857.

llvm-svn: 216629
2014-08-27 22:52:33 +00:00
Juergen Ributzka 56b4b33190 [FastISel][AArch64] Fix a comment in my previous commit (r216617).
llvm-svn: 216622
2014-08-27 21:40:50 +00:00
Juergen Ributzka 3c1b286152 [FastISel][AArch64] Fix simplify address when the address comes from a shift.
When the address comes directly from a shift instruction then the address
computation cannot be folded into the memory instruction, because the zero
register is not available as a base register. Simplify addess needs to emit the
shift instruction and use the result as base register.

llvm-svn: 216621
2014-08-27 21:38:33 +00:00
Juergen Ributzka 100a9b7fda [FastISel][AArch64] Use the zero register for stores.
Use the zero register directly when possible to avoid an unnecessary register
copy and a wasted register at -O0. This also uses integer stores to store a
positive floating-point zero. This saves us from materializing the positive zero
in a register and then storing it.

llvm-svn: 216617
2014-08-27 21:04:52 +00:00
Oliver Stannard 89d1542840 Teach the AArch64 backend about v4f16 and v8f16
This teaches the AArch64 backend to deal with the operations required
to deal with the operations on v4f16 and v8f16 which are exposed by
NEON intrinsics, plus the add, sub, mul and div operations.

llvm-svn: 216555
2014-08-27 16:16:04 +00:00
Craig Topper e1d1294853 Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created.
llvm-svn: 216525
2014-08-27 05:25:25 +00:00
Juergen Ributzka fb506a417d [FastISel][AArch64] Fix address simplification.
When a shift with extension or an add with shift and extension cannot be folded
into the memory operation, then the address calculation has to be materialized
separately. While doing so the code forgot to consider a possible sign-/zero-
extension. This fix folds now also the sign-/zero-extension into the add or
shift instruction which is used to materialize the address.

This fixes rdar://problem/18141718.

llvm-svn: 216511
2014-08-27 00:58:30 +00:00
Juergen Ributzka 99dd30f338 [FastISel][AArch64] Fold Sign-/Zero-Extend into the shift immediate instruction.
llvm-svn: 216510
2014-08-27 00:58:26 +00:00
James Molloy 36b8a88188 Change the return value of "getEnd()" from a MachineInstr* to a MachineBasicBlock::iterator.
It seems on Darwin the illegal round-trip ::iterator -> MachineInstr* -> ::iterator breaks execution horribly when the iterator is not a real MachineInstr, like ::end().

llvm-svn: 216455
2014-08-26 13:41:31 +00:00
Dylan Noblesmith 4af4d2c111 AArch64: use std::fill instead of memset
Followup based on review.

llvm-svn: 216436
2014-08-26 03:33:26 +00:00
Dylan Noblesmith b06f77b608 Revert "AArch64: use std::vector for temp array"
This reverts commit r216365.

llvm-svn: 216433
2014-08-26 02:03:43 +00:00
Juergen Ributzka 1912e24898 [FastISel][AArch64] Refactor float zero materialization. NFCI.
llvm-svn: 216403
2014-08-25 19:58:05 +00:00
Karthik Bhat 7f33ff7dea Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)

llvm-svn: 216371
2014-08-25 04:56:54 +00:00
Dylan Noblesmith b899464f5b AArch64: unique_ptr-ify map structures
llvm-svn: 216366
2014-08-25 01:59:38 +00:00
Dylan Noblesmith 6076debd98 AArch64: use std::vector for temp array
llvm-svn: 216365
2014-08-25 01:59:36 +00:00
Juergen Ributzka 0e0b4c1cda [FastISel][AArch64] Add support for variable shift.
This adds the missing variable shift support for value type i8, i16, and i32.

This fixes <rdar://problem/18095685>.

llvm-svn: 216242
2014-08-21 23:06:07 +00:00
Robin Morisset 59c23cd946 Rename AtomicExpandLoadLinked into AtomicExpand
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of
a group that aim at making it more target-independent. See
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html
for details

The command line option is "atomic-expand"

llvm-svn: 216231
2014-08-21 21:50:01 +00:00
Juergen Ributzka addb75a4f3 [FastISel][AArch64] Use the correct register class to make the MI verifier happy.
This is mostly achieved by providing the correct register class manually,
because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and
MVT::i64.

Also cleanup the code to use the FastEmitInst_* method whenever possible. This
makes sure that the operands' register class is properly constrained. For all
the remaining cases this adds the missing constrainOperandRegClass calls for
each operand.

llvm-svn: 216225
2014-08-21 20:57:57 +00:00
Quentin Colombet 0c740d4b9a [AArch64] Run a peephole pass right after AdvSIMD pass.
The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that as demonstrated in the test case.

<rdar://problem/12702965>

llvm-svn: 216200
2014-08-21 18:10:07 +00:00
Juergen Ributzka c83265a6c5 [FastISel][AArch64] Factor out ANDWri instruction generation into a helper function. NFCI.
llvm-svn: 216199
2014-08-21 18:02:25 +00:00
James Molloy a88896b5c0 [LoopVectorize] Up the maximum unroll factor to 4 for AArch64
Only for Cortex-A57 and Cyclone for now, where it has shown wins.

llvm-svn: 216141
2014-08-21 00:02:51 +00:00
Juergen Ributzka e1bb055ed3 [FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare.
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.

Related to <rdar://problem/17913111>.

llvm-svn: 216073
2014-08-20 16:34:15 +00:00
Aaron Ballman bf6ee22113 Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC.
llvm-svn: 216067
2014-08-20 12:14:35 +00:00
Juergen Ributzka 0781b860e4 [FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0.
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.

This is related to <rdar://problem/18027157>.

llvm-svn: 216040
2014-08-20 01:10:36 +00:00
Juergen Ributzka c0886dd5b0 [FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding.
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture support. This includes better immedediate support,
shift folding, and sign-/zero-extend folding.

This fixes <rdar://problem/17913111>.

llvm-svn: 216033
2014-08-19 22:29:55 +00:00
Alexey Samsonov 96b02d1573 Delete unused argument in AArch64MCInstLower constructor: it doesn't
use Mangler, and Mangler is in fact not even created when AArch64MCInstLower
is constructed.

This bug is reported by UBSan.

llvm-svn: 216030
2014-08-19 21:51:08 +00:00
Juergen Ributzka b46ea081ad Reapply [FastISel][AArch64] Add support for more addressing modes (r215597).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 216013
2014-08-19 19:44:17 +00:00
Juergen Ributzka 7e23f77d82 Reapply [FastISel][AArch64] Make use of the zero register when possible (r215591).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 216009
2014-08-19 19:44:02 +00:00
Alexey Samsonov f17f03e00e Hide two different AlignMode enums in anonymous namespaces. This bug is reported by UBSan.
llvm-svn: 216001
2014-08-19 18:40:39 +00:00
Juergen Ributzka 5460cbfda4 [FastISel][AArch64] Fix a few BuildMI callsites where the result register was added as an operand register.
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.

Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).

llvm-svn: 215997
2014-08-19 17:41:53 +00:00
Robin Morisset b155f529fc Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends
Summary:
Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends
These helper functions are introduced in D4844.
Depends D4844

Test Plan: make check-all passes

Reviewers: jfb

Subscribers: aemerson, llvm-commits, mcrosier, reames

Differential Revision: http://reviews.llvm.org/D4937

llvm-svn: 215902
2014-08-18 16:48:58 +00:00
Oliver Stannard f5469bec97 Teach the AArch64 backend to handle f16
This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv
operations on f16 (half-precision) types by promoting to f32.

llvm-svn: 215891
2014-08-18 14:22:39 +00:00
Oliver Stannard 12993dd916 [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.

llvm-svn: 215890
2014-08-18 12:42:15 +00:00
Tim Northover 26bb14e6a7 TableGen: allow use of uint64_t for available features mask.
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".

Mostly just refactoring at present, and there's probably no way to test.

llvm-svn: 215887
2014-08-18 11:49:42 +00:00
Robin Morisset d18cda620c Fix typos in comments
llvm-svn: 215777
2014-08-15 22:17:28 +00:00
Juergen Ributzka 6597d319fe [FastISel][AArch64] Fix a latent bug in floating-point materialization.
The floating-point value positive zero (+0.0) is a valid immedate value
according to isFPImmLegal. As a result AArch64 FastISel went ahead and
used the immediate version of fmov to materialize the constant.

The problem is that the immediate version of fmov cannot encode an imediate for
postive zero. Instead a fmov from the zero register was supposed to be used in
this case.

This fix adds handling for this special case and uses fmov from the zero
register to materialize a positive zero (negative zeroes go to the constant
pool).

There is no test case for this, because this code is currently dead. It will be
enabled in a future commit and I will add a test case in a separate commit
after that.

This fixes <rdar://problem/18027157>.

llvm-svn: 215753
2014-08-15 18:55:55 +00:00
Juergen Ributzka 6bca986ef1 Reapplying [FastISel][AArch64] Cleanup constant materialization code. NFCI.
Note: This reapplies r215582 without any modifications. The refactoring wasn't
responsible for the buildbot failures.

Original commit message:
Cleanup and prepare constant materialization code for future commits.

llvm-svn: 215752
2014-08-15 18:55:52 +00:00
Amara Emerson 82da7d07a1 [AArch64] Narrow arguments passed in wrong position on the stack in
big-endian mode.

Patch by Asiri Rathnayake.

Differential Revision: http://reviews.llvm.org/D4922

llvm-svn: 215716
2014-08-15 14:29:57 +00:00
Rafael Espindola d610ba99cb Remove HasLEB128.
We already require CFI, so it should be safe to require .leb128 and .uleb128.

llvm-svn: 215712
2014-08-15 14:01:07 +00:00
Juergen Ributzka 790bacf232 Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

llvm-svn: 215673
2014-08-14 19:56:28 +00:00
Juergen Ributzka 34ed422c42 Revert "[FastISel][AArch64] Add support for more addressing modes."
This reverts commits r215597, because it might have broken the build bots.

llvm-svn: 215659
2014-08-14 17:10:54 +00:00
Moritz Roth 6f257cf27b Testing commit access.
Remove a trailing whitespace.

llvm-svn: 215653
2014-08-14 16:20:50 +00:00
Aaron Ballman 61acc22129 Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC.
llvm-svn: 215642
2014-08-14 13:43:57 +00:00
David Majnemer c307c66765 AArch64: Silence warning in AArch64FastISel
GCC was emitting a signed vs unsigned comparison warning.

llvm-svn: 215620
2014-08-14 06:44:51 +00:00
Akira Hatanaka b74db09c97 [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.

<rdar://problem/17991614>

llvm-svn: 215600
2014-08-13 23:23:58 +00:00
Juergen Ributzka 98347d902e [FastISel][AArch64] Add support for more addressing modes.
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 215597
2014-08-13 22:53:29 +00:00
Juergen Ributzka 24080d60fa [FastISel][AArch64] Make use of the zero register when possible.
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 215591
2014-08-13 22:13:14 +00:00
Gerolf Hoflehner fe2c11ffd6 [MachineCombiner] Removal of dangling DBG_VALUES after combining [20598]
This is a cleaner solution to the problem described in r215431.
When instructions are combined a dangling DBG_VALUE is removed.
This resolves bug 20598.

llvm-svn: 215587
2014-08-13 22:07:36 +00:00
Juergen Ributzka 5ae43a136b [FastISel][AArch64] Cleanup constant materialization code. NFCI.
Cleanup and prepare constant materialization code for future commits.

llvm-svn: 215582
2014-08-13 21:34:04 +00:00
Benjamin Kramer a7c40ef022 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Gerolf Hoflehner eb90500d06 [MachineCombiner] Fix for ICE bug 20598
The combiner ignored DBG nodes when checking
the uses of a virtual register.

It combined a sequence like
   %vreg1 = madd %vreg2, %vreg3,...
   DBG_VALUE (%vreg1 ...)
   %vreg4 = add %vreg1,...
to
  %vreg4 = madd %vreg2, %vreg3

leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.

llvm-svn: 215431
2014-08-12 07:54:12 +00:00
Jim Grosbach 1eee3dfded Add missing closing namespace comment.
llvm-svn: 215402
2014-08-11 22:42:31 +00:00
Jim Grosbach 8e810ba411 AArch64: Tidy up a few comments.
Have the comments match the actual parameter names. Found via clang-tidy.

llvm-svn: 215401
2014-08-11 22:42:28 +00:00
Quentin Colombet c64c175173 [AArch64] Fix registerAllocator assigns same register for base and wback in
pre/post-index load and store.

Patch by Steven Wu <stevenwu@apple.com>

llvm-svn: 215390
2014-08-11 21:39:53 +00:00
Sylvestre Ledru 469de19a09 Fix typos:
* libaries => libraries
* avaiable => available

llvm-svn: 215366
2014-08-11 18:04:46 +00:00
Joerg Sonnenberger 752b91bd82 If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.

llvm-svn: 215310
2014-08-10 11:35:12 +00:00
Aaron Ballman 18ac683fe5 Resolving some type truncation warnings in MSVC (enum to bool in this case). No functional changes intended.
llvm-svn: 215293
2014-08-09 19:53:34 +00:00
Tim Northover e42fac5191 AArch64: avoid deleting the current iterator in a loop.
std::map invalidates the iterator to any element that gets deleted, which means
we can't increment it correctly afterwards. This was causing Darwin test
failures.

llvm-svn: 215233
2014-08-08 17:31:52 +00:00
Juergen Ributzka 241fd486eb [FastISel][AArch64] Attach MachineMemOperands to load and store instructions.
llvm-svn: 215231
2014-08-08 17:24:10 +00:00
NAKAMURA Takumi 08e30fd3d2 AArch64A57FPLoadBalancing.cpp: Define ColorNames in !NDEBUG.
llvm-svn: 215226
2014-08-08 17:00:59 +00:00
Jiangning Liu dcc651f99f [AArch64] Fix a type conversion bug for anlyzing compare.
The bug can cause spec2006/483.xalancbmk failure.

Patched by David Xu.

llvm-svn: 215206
2014-08-08 14:19:29 +00:00
James Molloy 3feea9c11a [AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.

This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.

Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).

This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.

llvm-svn: 215199
2014-08-08 12:33:21 +00:00
Tim Northover 0f18ff9817 AArch64: stop trying to take control of all UnknownArch triples.
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).

Should fix PR20567.

llvm-svn: 215191
2014-08-08 08:27:44 +00:00
NAKAMURA Takumi 40da267976 AArch64InstrInfo.cpp: Fix \param(s). [-Wdocumentation]
llvm-svn: 215180
2014-08-08 02:04:18 +00:00
Eric Christopher b9fd9ed37e Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.

Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reverts commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 215154
2014-08-07 22:02:54 +00:00
Gerolf Hoflehner 97c383bc36 MachineCombiner Pass for selecting faster instruction sequence on AArch64
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers

The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp

llvm-svn: 215151
2014-08-07 21:40:58 +00:00
Rafael Espindola f8b27c41e8 Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.

Thanks to Lang Hames for making MCJIT a good replacement!

llvm-svn: 215111
2014-08-07 14:21:18 +00:00
Eric Christopher b5217507c7 Remove the target machine from CCState. Previously it was only used
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.

llvm-svn: 214988
2014-08-06 18:45:26 +00:00
Chad Rosier b481bdfec4 [AArch64] Add a few isTarget* API to AArch64 Subtarget.
llvm-svn: 214977
2014-08-06 16:56:58 +00:00
Chad Rosier afe7c93c7f [AArch64] Fix OS ABI flag for aarch64-linux-gnu target.
For triple aarch64-linux-gnu we were incorrectly setting IRIX.
For triple aarch64 we are correctly setting SYSV.

Patch by Ana Pazos <apazos@codeaurora.org>.

llvm-svn: 214974
2014-08-06 16:05:02 +00:00
James Molloy 99917946da [AArch64] Add a testcase for r214957.
llvm-svn: 214965
2014-08-06 13:31:32 +00:00
James Molloy f089ab70f4 [AArch64] Conditional selects are expensive on out-of-order cores.
Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.

This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.

llvm-svn: 214957
2014-08-06 10:42:18 +00:00
Yi Kong e56de69500 AArch64: Add support for instruction prefetch intrinsic
Instruction prefetch is not implemented for AArch64, it is incorrectly
translated into data prefetch instruction.

Differential Revision: http://reviews.llvm.org/D4777

llvm-svn: 214860
2014-08-05 12:46:47 +00:00
James Molloy 2b8933c354 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Juergen Ributzka 9503327756 [FastIsel][AArch64] Fix previous commit r214844 (Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.)
The original code would fail for unsupported value types like i1, i8, and i16.
This fix changes the code to only create a sub-register copy for i64 value types
and all other types (i1/i8/i16/i32) just use the source register without any
modifications.

getRegClassFor() is now guarded by the i64 value type check, that guarantees
that we always request a register for a valid value type.

llvm-svn: 214848
2014-08-05 07:31:30 +00:00
Juergen Ributzka a126d1ef3c [FastISel][AArch64] Implement the FastLowerArguments hook.
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.

This should cover most of the trivial cases without falling back to
SelectionDAG.

This fixes <rdar://problem/17890986>.

llvm-svn: 214846
2014-08-05 05:43:48 +00:00
Kevin Qin ec100526e3 Revert "r214832 - MachineCombiner Pass for selecting faster instruction"
It broke compiling of most Benchmark and internal test, as clang got
clashed by segmentation fault or assertion.

llvm-svn: 214845
2014-08-05 05:43:47 +00:00
Juergen Ributzka 51f5326e25 [FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.
llvm-svn: 214844
2014-08-05 05:43:44 +00:00
Eric Christopher fc6de428c8 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Gerolf Hoflehner 4dbf44b9d8 MachineCombiner Pass for selecting faster instruction
sequence on AArch64

Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM:: CodeGen/AArch64/dp-3source.ll
This resolves the reported compfails of the original commit.

llvm-svn: 214832
2014-08-05 01:16:13 +00:00
Juergen Ributzka 53533e885a [FastISel][AArch64] Fix shift lowering for i8 and i16 value types.
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.

The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.

The arithmetic shift right on the other side would use the wrong MSB as sign-bit
to determine what bits to shift into the value.

This fixes <rdar://problem/17907720>.

llvm-svn: 214788
2014-08-04 21:49:51 +00:00
Eric Christopher d913448b38 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Chad Rosier 5908ab4dd6 [AArch64] Extend the number of scalar instructions supported in the AdvSIMD
scalar integer instruction pass.

This is a patch I had lying around from a few months ago.  The pass is
currently disabled by default, so nothing to interesting.

llvm-svn: 214779
2014-08-04 21:20:25 +00:00
Kevin Qin f31ecf3fea Revert "r214669 - MachineCombiner Pass for selecting faster instruction"
This commit broke "make check" for several hours, so get it reverted.

llvm-svn: 214697
2014-08-04 05:10:33 +00:00
Gerolf Hoflehner 35ba467122 MachineCombiner Pass for selecting faster instruction
sequence -  AArch64 target support

 This patch turns off madd/msub generation in the DAGCombiner and generates
 them in the MachineCombiner instead. It replaces the original code sequence
 with the combined sequence when it is beneficial to do so.

 When there is no machine model support it always generates the madd/msub
 instruction. This is true also when the objective is to optimize for code
 size: when the combined sequence is shorter is always chosen and does not
 get evaluated.

 When there is a machine model the combined instruction sequence
 is evaluated for critical path and resource length using machine
 trace metrics and the original code sequence is replaced when it is
 determined to be faster.

 rdar://16319955

llvm-svn: 214669
2014-08-03 22:03:40 +00:00
Juergen Ributzka 5dcb33bdbb [FastISel][AArch64] Fold offset into the memory operation.
Fold simple offsets into the memory operation:
  add x0, x0, #8
  ldr x0, [x0]
-->
  ldr x0, [x0, #8]

Fixes <rdar://problem/17887945>.

llvm-svn: 214545
2014-08-01 19:40:16 +00:00
Juergen Ributzka 50a4005e35 [FastISel][AArch64] Add branch weights.
Add branch weights to branch instructions, so that the following passes can
optimize based on it (i.e. basic block ordering).

Fixes <rdar://problem/17887137>.

llvm-svn: 214537
2014-08-01 18:39:24 +00:00
Renato Golin 541d7e747a Add missing breaks to AArch64InstrInfo::isGPRCopy
llvm-svn: 214528
2014-08-01 17:27:31 +00:00
Chad Rosier 579c02c9a5 [AArch64] Generate tbz/tbnz when comparing against zero.
The tbz/tbnz checks the sign bit to convert

op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0

to

op w1, w1, w10
tbnz w1, #31, .LBB0_0

Differential Revision: http://reviews.llvm.org/D4440

llvm-svn: 214518
2014-08-01 14:48:56 +00:00
Juergen Ributzka 82ecc7ff2a [FastISel][AArch64] Fix the immediate versions of the {s|u}{add|sub}.with.overflow intrinsics.
ADDS and SUBS cannot encode negative immediates or immediates larger than 12bit.
This fix checks if the immediate version can be used under this constraints and
if we can convert ADDS to SUBS or vice versa to support negative immediates.

Also update the test cases to test the immediate versions.

llvm-svn: 214470
2014-08-01 01:25:55 +00:00
Louis Gerbarg 67474e3755 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Aaron Ballman ed9fabd255 Fixing an -Woverloaded-virtual warnings by exposing the hidden virtual function as well. No functional changes intended.
llvm-svn: 214400
2014-07-31 12:58:50 +00:00
Juergen Ributzka c537bd2da4 [FastISel][AArch64] Add basic bitcast support for conversion between float and int.
Fixes <rdar://problem/17867078>.

llvm-svn: 214389
2014-07-31 06:25:37 +00:00
Juergen Ributzka 130e77e431 [FastISel][AArch64] Add sqrt intrinsic support.
Fixes <rdar://problem/17867067>.

llvm-svn: 214388
2014-07-31 06:25:33 +00:00
Juergen Ributzka 052e6c289b [FastISel][AArch64] Add MachO large code model support for function calls.
Currently the large code model for MachO uses the GOT to make function calls.
Emit the required adrp and ldr instructions to load the address from the GOT.

Related to <rdar://problem/17733076>.

llvm-svn: 214381
2014-07-31 04:10:40 +00:00
Juergen Ributzka 39032673da [FastISel][AArch64 and X86] Don't emit stores for UNDEF arguments during function call lowering.
UNDEF arguments are not ment to be touched - especially for the webkit_js
calling convention. This fix reproduces the already existing behavior of
SelectionDAG in FastISel.

llvm-svn: 214366
2014-07-31 00:11:11 +00:00
Juergen Ributzka 3771fbb2f5 [FastISel][AArch64] Add select folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a select instruction.

This also updates and enables the XALU unit tests for FastISel.

This fixes <rdar://problem/17831117>.

llvm-svn: 214350
2014-07-30 22:04:37 +00:00
Juergen Ributzka ad2109a949 [FastISel][AArch64] Add branch folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a branch instruction.

This is related to <rdar://problem/17831117>.

llvm-svn: 214349
2014-07-30 22:04:34 +00:00
Juergen Ributzka d43da7548c [FastISel][AArch64] Add {s|u}{add|sub|mul}.with.overflow intrinsic support.
This commit adds support for the {s|u}{add|sub|mul}.with.overflow intrinsics.
The unit tests for FastISel will be enabled in a later commit, once there is
also branch and select folding support.

This is related to <rdar://problem/17831117>.

llvm-svn: 214348
2014-07-30 22:04:31 +00:00
Juergen Ributzka ad3b09037d [FastISel][AArch64] Create helper functions to create the various multiplies on AArch64.
llvm-svn: 214346
2014-07-30 22:04:25 +00:00
Juergen Ributzka a75cb11f14 [FastISel][AArch64] Add support for shift-immediate.
Currently the shift-immediate versions are not supported by tblgen and
hopefully this can be later removed, once the required support has been
added to tblgen.

llvm-svn: 214345
2014-07-30 22:04:22 +00:00
Jiangning Liu cd296378a7 Implement AArch64 TTI interface isAsCheapAsAMove.
llvm-svn: 214159
2014-07-29 02:09:26 +00:00
Matt Arsenault 6f2a526101 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Tim Northover 2c46beb0d1 AArch64: fix conversion of 'J' inline asm constraints.
'J' represents a negative number suitable for an add/sub alias
instruction, but while preparing it to become an int64_t we were
mangling the sign extension. So "i32 -1" became 0xffffffffLL, for
example.

Should fix one half of PR20456.

llvm-svn: 214052
2014-07-27 07:10:29 +00:00
Akira Hatanaka e5b6e0d231 [stack protector] Fix a potential security bug in stack protector where the
address of the stack guard was being spilled to the stack.

Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register. 

<rdar://problem/12475629>

llvm-svn: 213967
2014-07-25 19:31:34 +00:00
Juergen Ributzka 5d6c43e294 [FastISel][AArch64] Add support for frameaddress intrinsic.
This commit implements the frameaddress intrinsic for the AArch64 architecture
in FastISel.

There were two test cases that pretty much tested the same, so I combined them
to a single test case.

Fixes <rdar://problem/17811834>

llvm-svn: 213959
2014-07-25 17:47:14 +00:00
Benjamin Kramer 1f8930e3d3 Run sort_includes.py on the AArch64 backend.
No functionality change.

llvm-svn: 213938
2014-07-25 11:42:14 +00:00
Tim Northover 7324e845a4 AArch64: refactor ReconstructShuffle function
Quite a bit of cruft had accumulated as we realised the various different cases
it had to handle and squeezed them in where possible. This refactoring mostly
flattens the logic and special-cases. The result is slightly longer, but I
think clearer.

Should be no functionality change.

llvm-svn: 213867
2014-07-24 15:39:55 +00:00
NAKAMURA Takumi 8d745ca7cc Prune redundant libdeps.
llvm-svn: 213857
2014-07-24 11:45:27 +00:00
NAKAMURA Takumi 9c3bd7618a Update library dependencies.
llvm-svn: 213832
2014-07-24 02:10:42 +00:00
Kevin Qin 9a2a2c502b [AArch64] Fix a bug generating incorrect instruction when building small vector.
This bug is introduced by r211144. The element of operand may be
smaller than the element of result, but previous commit can
only handle the contrary condition. This commit is to handle this
scenario and generate optimized codes like ZIP1.

llvm-svn: 213830
2014-07-24 02:05:42 +00:00
Jiangning Liu 451f30e89f [AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. A new predicate Cyclone is introduced in .td file.
llvm-svn: 213827
2014-07-24 01:29:59 +00:00
Rafael Espindola 5addace56d Finish inverting the MC -> Object dependency.
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.

llvm-svn: 213808
2014-07-23 22:26:07 +00:00
Jim Grosbach 724e438c62 [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.

llvm-svn: 213800
2014-07-23 20:41:43 +00:00
Jim Grosbach 8f6f0858ec X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.

llvm-svn: 213799
2014-07-23 20:41:38 +00:00
Juergen Ributzka 1b014504ab [FastISel][AArch64] Fix return type in FastLowerCall.
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.

Reduced test case from Chad. Thanks.

llvm-svn: 213788
2014-07-23 20:03:13 +00:00
Chad Rosier 17020f96c7 [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2

However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2

llvm-svn: 213758
2014-07-23 14:57:52 +00:00
Tim Northover 35910d7fa8 AArch64: remove "arm64_be" support in favour of "aarch64_be".
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".

llvm-svn: 213748
2014-07-23 12:58:11 +00:00
Tim Northover e19bed7d33 AArch64: remove arm64 triple enumerator.
Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and
invites bugs where only one is checked. In reality, the only legitimate
difference between the two (arm64 usually means iOS) is also present in the OS
part of the triple and that's what should be checked.

We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so
there aren't any LLVM-side test changes.

llvm-svn: 213743
2014-07-23 12:32:47 +00:00
Juergen Ributzka 2581fa505f [FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks.
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.

This enables patchpoint intrinsic lowering for AArch64.

This fixes <rdar://problem/17733076>

llvm-svn: 213704
2014-07-22 23:14:58 +00:00
Tim Northover f7a02c1762 CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:

  fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
  fpext -> FP16_TO_FP (if f16 illegal) -> libcall

Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).

As a result we can remove some redundant AArch64 patterns.

llvm-svn: 213507
2014-07-21 09:13:56 +00:00
David Peixotto ae5ba76221 MC: support different sized constants in constant pools
On AArch64 the pseudo instruction ldr <reg>, =... supports both
32-bit and 64-bit constants. Add support for 64 bit constants for
the pools to support the pseudo instruction fully.

Changes the AArch64 ldr-pseudo tests to use 32-bit registers and
adds tests with 64-bit registers.

Patch by Janne Grunau!

Differential Revision: http://reviews.llvm.org/D4279

llvm-svn: 213387
2014-07-18 16:05:14 +00:00
Tim Northover f8bfe21fad AArch64: implement efficient f16 bitcasts
Because i16 is illegal, there's no native DAG method to
represent a bitcast to or from an f16 type. This meant LLVM was
inserting a stack store/load pair which is really not ideal.

llvm-svn: 213378
2014-07-18 13:07:05 +00:00