Commit Graph

818 Commits

Author SHA1 Message Date
Juergen Ributzka 0781b860e4 [FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0.
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.

This is related to <rdar://problem/18027157>.

llvm-svn: 216040
2014-08-20 01:10:36 +00:00
Juergen Ributzka c0886dd5b0 [FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding.
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture support. This includes better immedediate support,
shift folding, and sign-/zero-extend folding.

This fixes <rdar://problem/17913111>.

llvm-svn: 216033
2014-08-19 22:29:55 +00:00
Juergen Ributzka 6afeffb078 [FastISel][AArch64] Extend floating-point materialization test.
This adds the missing test that I promised for r215753 to test the
materialization of the floating-point value +0.0.

Related to <rdar://problem/18027157>.

llvm-svn: 216019
2014-08-19 20:35:07 +00:00
Juergen Ributzka b46ea081ad Reapply [FastISel][AArch64] Add support for more addressing modes (r215597).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 216013
2014-08-19 19:44:17 +00:00
Juergen Ributzka 7e23f77d82 Reapply [FastISel][AArch64] Make use of the zero register when possible (r215591).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 216009
2014-08-19 19:44:02 +00:00
Juergen Ributzka 5460cbfda4 [FastISel][AArch64] Fix a few BuildMI callsites where the result register was added as an operand register.
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.

Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).

llvm-svn: 215997
2014-08-19 17:41:53 +00:00
Oliver Stannard f5469bec97 Teach the AArch64 backend to handle f16
This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv
operations on f16 (half-precision) types by promoting to f32.

llvm-svn: 215891
2014-08-18 14:22:39 +00:00
Oliver Stannard 12993dd916 [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.

llvm-svn: 215890
2014-08-18 12:42:15 +00:00
Amara Emerson 82da7d07a1 [AArch64] Narrow arguments passed in wrong position on the stack in
big-endian mode.

Patch by Asiri Rathnayake.

Differential Revision: http://reviews.llvm.org/D4922

llvm-svn: 215716
2014-08-15 14:29:57 +00:00
Juergen Ributzka 790bacf232 Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

llvm-svn: 215673
2014-08-14 19:56:28 +00:00
Juergen Ributzka 34ed422c42 Revert "[FastISel][AArch64] Add support for more addressing modes."
This reverts commits r215597, because it might have broken the build bots.

llvm-svn: 215659
2014-08-14 17:10:54 +00:00
Akira Hatanaka b74db09c97 [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.

<rdar://problem/17991614>

llvm-svn: 215600
2014-08-13 23:23:58 +00:00
Juergen Ributzka 98347d902e [FastISel][AArch64] Add support for more addressing modes.
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 215597
2014-08-13 22:53:29 +00:00
Juergen Ributzka 24080d60fa [FastISel][AArch64] Make use of the zero register when possible.
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 215591
2014-08-13 22:13:14 +00:00
Gerolf Hoflehner eb90500d06 [MachineCombiner] Fix for ICE bug 20598
The combiner ignored DBG nodes when checking
the uses of a virtual register.

It combined a sequence like
   %vreg1 = madd %vreg2, %vreg3,...
   DBG_VALUE (%vreg1 ...)
   %vreg4 = add %vreg1,...
to
  %vreg4 = madd %vreg2, %vreg3

leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.

llvm-svn: 215431
2014-08-12 07:54:12 +00:00
Quentin Colombet c64c175173 [AArch64] Fix registerAllocator assigns same register for base and wback in
pre/post-index load and store.

Patch by Steven Wu <stevenwu@apple.com>

llvm-svn: 215390
2014-08-11 21:39:53 +00:00
Jiangning Liu dd6e12d71c In Machine CSE pass, the source register of a COPY machine instruction can
be propagated to all its users, and this propagation could increase the 
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.

llvm-svn: 215344
2014-08-11 05:17:19 +00:00
Jiangning Liu dcc651f99f [AArch64] Fix a type conversion bug for anlyzing compare.
The bug can cause spec2006/483.xalancbmk failure.

Patched by David Xu.

llvm-svn: 215206
2014-08-08 14:19:29 +00:00
James Molloy 3feea9c11a [AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.

This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.

Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).

This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.

llvm-svn: 215199
2014-08-08 12:33:21 +00:00
Tim Northover 0f18ff9817 AArch64: stop trying to take control of all UnknownArch triples.
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).

Should fix PR20567.

llvm-svn: 215191
2014-08-08 08:27:44 +00:00
Akira Hatanaka 5acc58fcfb [stack protector] Look through bitcasts to get global variable
__stack_chk_guard.

Handle the case where the pointer operand of the load instruction that loads the
stack guard is not a global variable but instead a bitcast.

%StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)

Original test case provided by Ana Pazos.

This fixes PR20558.

llvm-svn: 215167
2014-08-07 23:08:24 +00:00
Gerolf Hoflehner 97c383bc36 MachineCombiner Pass for selecting faster instruction sequence on AArch64
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers

The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp

llvm-svn: 215151
2014-08-07 21:40:58 +00:00
James Molloy 99917946da [AArch64] Add a testcase for r214957.
llvm-svn: 214965
2014-08-06 13:31:32 +00:00
Yi Kong e56de69500 AArch64: Add support for instruction prefetch intrinsic
Instruction prefetch is not implemented for AArch64, it is incorrectly
translated into data prefetch instruction.

Differential Revision: http://reviews.llvm.org/D4777

llvm-svn: 214860
2014-08-05 12:46:47 +00:00
Juergen Ributzka a126d1ef3c [FastISel][AArch64] Implement the FastLowerArguments hook.
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.

This should cover most of the trivial cases without falling back to
SelectionDAG.

This fixes <rdar://problem/17890986>.

llvm-svn: 214846
2014-08-05 05:43:48 +00:00
Kevin Qin ec100526e3 Revert "r214832 - MachineCombiner Pass for selecting faster instruction"
It broke compiling of most Benchmark and internal test, as clang got
clashed by segmentation fault or assertion.

llvm-svn: 214845
2014-08-05 05:43:47 +00:00
Juergen Ributzka 51f5326e25 [FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.
llvm-svn: 214844
2014-08-05 05:43:44 +00:00
Gerolf Hoflehner 4dbf44b9d8 MachineCombiner Pass for selecting faster instruction
sequence on AArch64

Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM:: CodeGen/AArch64/dp-3source.ll
This resolves the reported compfails of the original commit.

llvm-svn: 214832
2014-08-05 01:16:13 +00:00
Juergen Ributzka 53533e885a [FastISel][AArch64] Fix shift lowering for i8 and i16 value types.
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.

The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.

The arithmetic shift right on the other side would use the wrong MSB as sign-bit
to determine what bits to shift into the value.

This fixes <rdar://problem/17907720>.

llvm-svn: 214788
2014-08-04 21:49:51 +00:00
Chad Rosier 5908ab4dd6 [AArch64] Extend the number of scalar instructions supported in the AdvSIMD
scalar integer instruction pass.

This is a patch I had lying around from a few months ago.  The pass is
currently disabled by default, so nothing to interesting.

llvm-svn: 214779
2014-08-04 21:20:25 +00:00
Kevin Qin f31ecf3fea Revert "r214669 - MachineCombiner Pass for selecting faster instruction"
This commit broke "make check" for several hours, so get it reverted.

llvm-svn: 214697
2014-08-04 05:10:33 +00:00
Gerolf Hoflehner 35ba467122 MachineCombiner Pass for selecting faster instruction
sequence -  AArch64 target support

 This patch turns off madd/msub generation in the DAGCombiner and generates
 them in the MachineCombiner instead. It replaces the original code sequence
 with the combined sequence when it is beneficial to do so.

 When there is no machine model support it always generates the madd/msub
 instruction. This is true also when the objective is to optimize for code
 size: when the combined sequence is shorter is always chosen and does not
 get evaluated.

 When there is a machine model the combined instruction sequence
 is evaluated for critical path and resource length using machine
 trace metrics and the original code sequence is replaced when it is
 determined to be faster.

 rdar://16319955

llvm-svn: 214669
2014-08-03 22:03:40 +00:00
James Molloy 6b999ae682 Update test to use a more modern AArch64 triple, as requested by Renato.
llvm-svn: 214637
2014-08-02 17:15:11 +00:00
James Molloy ce45be0465 [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available.
The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!

llvm-svn: 214634
2014-08-02 14:51:24 +00:00
Juergen Ributzka 5dcb33bdbb [FastISel][AArch64] Fold offset into the memory operation.
Fold simple offsets into the memory operation:
  add x0, x0, #8
  ldr x0, [x0]
-->
  ldr x0, [x0, #8]

Fixes <rdar://problem/17887945>.

llvm-svn: 214545
2014-08-01 19:40:16 +00:00
Juergen Ributzka 50a4005e35 [FastISel][AArch64] Add branch weights.
Add branch weights to branch instructions, so that the following passes can
optimize based on it (i.e. basic block ordering).

Fixes <rdar://problem/17887137>.

llvm-svn: 214537
2014-08-01 18:39:24 +00:00
Chad Rosier 4d71a4e2c6 [AArch64] Fix test from r214518 in an attempt to appease buildbots.
llvm-svn: 214521
2014-08-01 15:30:41 +00:00
Chad Rosier 579c02c9a5 [AArch64] Generate tbz/tbnz when comparing against zero.
The tbz/tbnz checks the sign bit to convert

op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0

to

op w1, w1, w10
tbnz w1, #31, .LBB0_0

Differential Revision: http://reviews.llvm.org/D4440

llvm-svn: 214518
2014-08-01 14:48:56 +00:00
Juergen Ributzka 82ecc7ff2a [FastISel][AArch64] Fix the immediate versions of the {s|u}{add|sub}.with.overflow intrinsics.
ADDS and SUBS cannot encode negative immediates or immediates larger than 12bit.
This fix checks if the immediate version can be used under this constraints and
if we can convert ADDS to SUBS or vice versa to support negative immediates.

Also update the test cases to test the immediate versions.

llvm-svn: 214470
2014-08-01 01:25:55 +00:00
Juergen Ributzka c537bd2da4 [FastISel][AArch64] Add basic bitcast support for conversion between float and int.
Fixes <rdar://problem/17867078>.

llvm-svn: 214389
2014-07-31 06:25:37 +00:00
Juergen Ributzka 130e77e431 [FastISel][AArch64] Add sqrt intrinsic support.
Fixes <rdar://problem/17867067>.

llvm-svn: 214388
2014-07-31 06:25:33 +00:00
Juergen Ributzka a80dd08b56 [FastISel][AArch64] Update and enable patchpoint and stackmap intrinsic tests for FastISel.
This commit updates the existing SelectionDAG tests for the stackmap and patchpoint
intrinsics and enables FastISel testing. It also splits up the tests into separate
files, due to different codegen between SelectionDAG and FastISel.

llvm-svn: 214382
2014-07-31 04:10:43 +00:00
Juergen Ributzka 052e6c289b [FastISel][AArch64] Add MachO large code model support for function calls.
Currently the large code model for MachO uses the GOT to make function calls.
Emit the required adrp and ldr instructions to load the address from the GOT.

Related to <rdar://problem/17733076>.

llvm-svn: 214381
2014-07-31 04:10:40 +00:00
Juergen Ributzka 3771fbb2f5 [FastISel][AArch64] Add select folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a select instruction.

This also updates and enables the XALU unit tests for FastISel.

This fixes <rdar://problem/17831117>.

llvm-svn: 214350
2014-07-30 22:04:37 +00:00
Juergen Ributzka a75cb11f14 [FastISel][AArch64] Add support for shift-immediate.
Currently the shift-immediate versions are not supported by tblgen and
hopefully this can be later removed, once the required support has been
added to tblgen.

llvm-svn: 214345
2014-07-30 22:04:22 +00:00
Jiangning Liu cd296378a7 Implement AArch64 TTI interface isAsCheapAsAMove.
llvm-svn: 214159
2014-07-29 02:09:26 +00:00
Tim Northover 2c46beb0d1 AArch64: fix conversion of 'J' inline asm constraints.
'J' represents a negative number suitable for an add/sub alias
instruction, but while preparing it to become an int64_t we were
mangling the sign extension. So "i32 -1" became 0xffffffffLL, for
example.

Should fix one half of PR20456.

llvm-svn: 214052
2014-07-27 07:10:29 +00:00
Akira Hatanaka e5b6e0d231 [stack protector] Fix a potential security bug in stack protector where the
address of the stack guard was being spilled to the stack.

Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register. 

<rdar://problem/12475629>

llvm-svn: 213967
2014-07-25 19:31:34 +00:00
Juergen Ributzka 5d6c43e294 [FastISel][AArch64] Add support for frameaddress intrinsic.
This commit implements the frameaddress intrinsic for the AArch64 architecture
in FastISel.

There were two test cases that pretty much tested the same, so I combined them
to a single test case.

Fixes <rdar://problem/17811834>

llvm-svn: 213959
2014-07-25 17:47:14 +00:00
Chandler Carruth 9f4530b95d [SDAG] Introduce a combined set to the DAG combiner which tracks nodes
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.

This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.

This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS codegeneration is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).

I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.

With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).

Differential Revision: http://reviews.llvm.org/D4638

llvm-svn: 213898
2014-07-24 22:15:28 +00:00
Kevin Qin 9a2a2c502b [AArch64] Fix a bug generating incorrect instruction when building small vector.
This bug is introduced by r211144. The element of operand may be
smaller than the element of result, but previous commit can
only handle the contrary condition. This commit is to handle this
scenario and generate optimized codes like ZIP1.

llvm-svn: 213830
2014-07-24 02:05:42 +00:00
Jiangning Liu 451f30e89f [AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. A new predicate Cyclone is introduced in .td file.
llvm-svn: 213827
2014-07-24 01:29:59 +00:00
Jim Grosbach d3c7942f4a Use an explicit triple in testcase.
Make the test work better on non-darwin hosts. Hopefully.

llvm-svn: 213801
2014-07-23 20:46:32 +00:00
Jim Grosbach 724e438c62 [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.

llvm-svn: 213800
2014-07-23 20:41:43 +00:00
Jim Grosbach 8f6f0858ec X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.

llvm-svn: 213799
2014-07-23 20:41:38 +00:00
Juergen Ributzka 1b014504ab [FastISel][AArch64] Fix return type in FastLowerCall.
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.

Reduced test case from Chad. Thanks.

llvm-svn: 213788
2014-07-23 20:03:13 +00:00
Chad Rosier 17020f96c7 [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2

However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2

llvm-svn: 213758
2014-07-23 14:57:52 +00:00
Tim Northover 35910d7fa8 AArch64: remove "arm64_be" support in favour of "aarch64_be".
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".

llvm-svn: 213748
2014-07-23 12:58:11 +00:00
Juergen Ributzka 2581fa505f [FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks.
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.

This enables patchpoint intrinsic lowering for AArch64.

This fixes <rdar://problem/17733076>

llvm-svn: 213704
2014-07-22 23:14:58 +00:00
Juergen Ributzka 88f6faf1f4 [AArch64] Use CHECK-LABEL in ARM64 ABI unit tests.
llvm-svn: 213703
2014-07-22 23:14:54 +00:00
Tim Northover f7a02c1762 CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:

  fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
  fpext -> FP16_TO_FP (if f16 illegal) -> libcall

Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).

As a result we can remove some redundant AArch64 patterns.

llvm-svn: 213507
2014-07-21 09:13:56 +00:00
Tim Northover f8bfe21fad AArch64: implement efficient f16 bitcasts
Because i16 is illegal, there's no native DAG method to
represent a bitcast to or from an f16 type. This meant LLVM was
inserting a stack store/load pair which is really not ideal.

llvm-svn: 213378
2014-07-18 13:07:05 +00:00
Tim Northover b94f0859e5 AArch64: support f16 extend/trunc operations.
llvm-svn: 213375
2014-07-18 13:01:31 +00:00
Tim Northover 20bd0ced30 CodeGen: soften f16 type by default instead of marking legal.
Actual support for softening f16 operations is still limited, and can be added
when it's needed.  But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.

Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.

llvm-svn: 213372
2014-07-18 12:41:46 +00:00
Jim Grosbach f7502c4884 AArch64: Constant fold converting vector setcc results to float.
Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
  UNARYOP(AND(VECTOR_CMP(x,y), constant))
      --> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).

This implements the transform where UNARYOP is [su]int_to_fp.

For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
  %cmp = fcmp oeq <4 x float> %val, %test
  %ext = zext <4 x i1> %cmp to <4 x i32>
  %result = sitofp <4 x i32> %ext to <4 x float>
  ret <4 x float> %result
}

Before this change, the code is generated as:
  fcmeq.4s  v0, v0, v1
  movi.4s v1, #0x1        // Integer splat value.
  and.16b v0, v0, v1      // Mask lanes based on the comparison.
  scvtf.4s  v0, v0        // Convert each lane to f32.
  ret

After, the code is improved to:
  fcmeq.4s  v0, v0, v1
  fmov.4s v1, #1.00000000 // f32 splat value.
  and.16b v0, v0, v1      // Mask lanes based on the comparison.
  ret

The svvtf.4s has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via fmov.4s.

Rather than do the folding manually in the target code, teach getNode()
in the generic SelectionDAG to handle folding constant operands of
vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do
additional constant folding there as well, but I don't have test cases
for those operations, so leaving them for another time when it becomes
appropriate.

rdar://17693791

llvm-svn: 213341
2014-07-18 00:40:52 +00:00
Tim Northover fd7e424935 CodeGen: extend f16 conversions to permit types > float.
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.

During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.

Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.

llvm-svn: 213248
2014-07-17 10:51:23 +00:00
Yi Kong 2355066e43 Port memory barriers intrinsics to AArch64
Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to
implement their corresponding ACLE and MSVC intrinsics.

This patch ports ARM dmb, dsb, isb intrinsic to AArch64.

Differential Revision: http://reviews.llvm.org/D4520

llvm-svn: 213247
2014-07-17 10:50:20 +00:00
Tim Northover e4b8e138e1 AArch64: fall back to generic code for out of range extract/insert.
rdar://problem/17624784

llvm-svn: 213059
2014-07-15 10:00:26 +00:00
Tim Northover 3cb24110b1 AArch64: remove unnecessary pseudo-instruction.
Sufficiently twisted use of TableGen lets us write patterns directly for f16
(as an i16 promoted to i32) -> f32 conversion.

llvm-svn: 212933
2014-07-14 11:16:02 +00:00
Saleem Abdulrasool f74d48a011 AArch64: add support for llvm.aarch64.hint intrinsic
This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to
support the various hint intrinsic functions in the ACLE.

Add an optional pattern field that permits the subclass to specify the pattern
that matches the selection.  The intrinsic pattern is set as mayLoad, mayStore,
so overload the value for the definition of the hint instruction.

llvm-svn: 212883
2014-07-12 21:20:49 +00:00
Oliver Stannard 6eda6ffc0c ARM: Allow __fp16 as a function arg or return type for AArch64
ACLE 2.0 allows __fp16 to be used as a function argument or return
type. This enables this for AArch64.

llvm-svn: 212812
2014-07-11 13:33:46 +00:00
Tim Northover fee2adefba AArch64: correctly fast-isel i8 & i16 multiplies
We were asking for a register for type i8 or i16 which caused an assert.

rdar://problem/17620015

llvm-svn: 212718
2014-07-10 14:18:46 +00:00
Hao Liu 71224b02fb [AArch64]Fix an assertion failure in DAG Combiner about concating 2 build_vector.
llvm-svn: 212677
2014-07-10 03:41:50 +00:00
Jim Grosbach 34cc92b475 AArch64: Better codegen for storing to __fp16.
Storing will generally be immediately preceded by rounding from an f32
or f64, so make sure to match those patterns directly to convert into the
FPR16 register class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-from-f64 path
which was first converting to f32 and then to f16 from there.

rdar://17594379

llvm-svn: 212638
2014-07-09 18:55:52 +00:00
Jim Grosbach 04691a530d AArch64: Better codegen for loading from __fp16.
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.

rdar://17594379

llvm-svn: 212573
2014-07-08 23:28:48 +00:00
Louis Gerbarg 4c5b4054b2 Allow AArch64FastISel to degrade graceully in the presence of an MVT::i128
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:

  typedef unsigned int uint128_t __attribute__((mode(TI)));
  typedef int sint128_t __attribute__((mode(TI)));

This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.

Tests included.

rdar://17516686

llvm-svn: 212492
2014-07-07 21:37:51 +00:00
Tim Northover 55beb64bd0 CodeGen: it turns out that NAND is not the same thing as BIC. At all.
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.

So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.

llvm-svn: 212443
2014-07-07 09:06:35 +00:00
Kevin Qin 4473c1943f [AArch64] Normalize all constants to build a vector.
The value of constant operands will be truncated to fit element width.

llvm-svn: 212428
2014-07-07 02:45:40 +00:00
Chandler Carruth 395421fd98 [aarch64] Add a test that should have been in r212242 but I forgot to
add it. Sorry about that.

llvm-svn: 212251
2014-07-03 02:12:26 +00:00
Chandler Carruth 9d010fffe1 [codegen,aarch64] Add a target hook to the code generator to control
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.

This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.

This also provides a foundation for other targets to have more granular
control over how vector types are legalized.

Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]

http://reviews.llvm.org/D4322

llvm-svn: 212242
2014-07-03 00:23:43 +00:00
Duncan P. N. Exon Smith de58870394 AArch64: Re-enable AArch64AddressTypePromotion
This reverts commits r212189 and r212190.

While this pass was accidentally disabled (until r212073), r205437
slipped in a use of `auto` that should have been `auto&`.

This fixes PR20188.

llvm-svn: 212201
2014-07-02 18:17:40 +00:00
Duncan P. N. Exon Smith 292fa19077 XFAIL the test to go with r202189
llvm-svn: 212190
2014-07-02 17:07:03 +00:00
Chad Rosier aba845e835 Revert "Revert "MachineScheduler: better book-keeping for asserts.""
This reverts commit r212109, which reverted r212088.

However, disable the assert as it's not necessary for correctness.  There are
several corner cases that the assert needed to handle better for in-order
scheduling, but none of them are incorrect scheduler behavior. The assert is
mainly there to collect good unit tests like this and ensure that the
target-independent scheduler is working as expected with the various machine
models.

llvm-svn: 212187
2014-07-02 16:46:08 +00:00
Chad Rosier f575a73751 Revert "MachineScheduler: better book-keeping for asserts."
This reverts commit r212088, which is causing a number of spec
failures.  Will provide reduced test cases shortly.
PR20057

llvm-svn: 212109
2014-07-01 17:23:11 +00:00
Andrew Trick f1b307bcb0 MachineScheduler: better book-keeping for asserts.
Fixes another test case under PR20057.

llvm-svn: 212088
2014-07-01 03:23:13 +00:00
Duncan P. N. Exon Smith 7d7ae93139 AArch64: Actually do address type promotion
AArch64AddressTypePromotion was doing nothing because it was using the
old semantics of `Use` and `uses()`, when it really wanted to get at the
`users()`.

llvm-svn: 212073
2014-06-30 23:42:14 +00:00
Chad Rosier 304fe3ff71 [AArch64] Unsized types don't specify an alignment.
PR20109

llvm-svn: 212045
2014-06-30 15:03:00 +00:00
Chad Rosier e6b8761ab9 [AArch64] Convert mul x, -(pow2 +/- 1) to shift + add/sub.
The combine for mul x, pow2 +/- 1 is unchanged. Test cases for
both combines as well as mul x, pow2 have been added as well.

llvm-svn: 212044
2014-06-30 14:51:14 +00:00
Chad Rosier 5235973ee0 [AArch64] Fix memset ICE when memset value is f128.
llvm-svn: 211960
2014-06-27 21:05:09 +00:00
Andrew Trick 5632722cab MachineScheduler: add some book-keeping to fix an assert.
Fixe for Bug 20057 - Assertion failied in llvm::SUnit* llvm::SchedBoundary::pickOnlyChoice(): Assertion `i <= (HazardRec->getMaxLookAhead() + MaxObservedStall) && "permanent hazard"'

Thanks to Chad for the test case.

llvm-svn: 211865
2014-06-27 04:57:05 +00:00
Weiming Zhao b1d4dbdcc7 Resubmit commit r211533
"Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64"
Missed files are added in this commit.

llvm-svn: 211605
2014-06-24 16:21:38 +00:00
Kevin Qin 93d45ecdbf [AArch64] Fix a build_vector pattern match fail
caused by defect in isBuildVectorAllZeros().

llvm-svn: 211567
2014-06-24 05:37:27 +00:00
Arnold Schwaighofer fc308f5c9f Add a triple so that right syntax is choosen on mac osx systems
llvm-svn: 211188
2014-06-18 17:20:49 +00:00
Kevin Qin f0ec9aff2a [AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR.
ReconstructShuffle() may wrongly creat a CONCAT_VECTOR trying to
concat 2 of v2i32 into v4i16. This commit is to fix this issue and
try to generate UZP1 instead of lots of MOV and INS.
Patch is initalized by Kevin Qin, and refactored by Tim Northover.

llvm-svn: 211144
2014-06-18 05:54:42 +00:00
Tim Northover d5531f72dc AArch64: estimate inline asm length during branch relaxation
To make sure branches are in range, we need to do a better job of estimating
the length of an inline assembly block than "it's probably 1 instruction, who'd
write asm with more than that?".

Fortunately there's already a (highly suspect, see how many ways you can think
of to break it!) callback for this purpose, which is used by the other targets.

rdar://problem/17277590

llvm-svn: 211095
2014-06-17 11:31:42 +00:00
James Molloy 1e3b5a49e1 [AArch64] Fix a fencepost error in lowering for llvm.aarch64.neon.uqshl.
Patch by Jiangning Liu!

llvm-svn: 211014
2014-06-16 10:39:21 +00:00
Tim Northover dbecc3b3fc AArch64: improve handling & modelling of FP_TO_XINT nodes.
There's probably no acatual change in behaviour here, just updating
the LowerFP_TO_INT function to be more similar to the reverse
implementation and updating costs to current CodeGen.

llvm-svn: 210985
2014-06-15 09:27:15 +00:00
Tim Northover ef0d760cd9 AArch64: improve vector [su]itofp handling.
This somehow got missed in the AArch64 merge, so should fix a
performance regression since 3.4.

llvm-svn: 210984
2014-06-15 09:27:06 +00:00
Jiangning Liu 96e92c1d75 Move GlobalMerge from Transform to CodeGen.
This patch is to move GlobalMerge pass from Transform/Scalar                                                           
to CodeGen, because GlobalMerge depends on TargetMachine.
In the mean time, the macro INITIALIZE_TM_PASS is also moved
to CodeGen/Passes.h. With this fix we can avoid making
libScalarOpts depend on libCodeGen.

llvm-svn: 210951
2014-06-13 22:57:59 +00:00
Tim Northover 420a216817 IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.

As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.

At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.

By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.

Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.

Summary for out of tree users:
------------------------------

+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.

llvm-svn: 210903
2014-06-13 14:24:07 +00:00
Chad Rosier 2205d4ef05 [AArch64] Basic Sched Model for Cortex-A57.
Patch by Dave Estes<cestes@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D4008

llvm-svn: 210705
2014-06-11 21:06:56 +00:00
Jiangning Liu b2ae37fb67 Global merge for global symbols.
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.

llvm-svn: 210640
2014-06-11 06:44:53 +00:00
Chad Rosier d863ae39d1 [AArch64] Emit .ident compiler version attribute.
Patch by Ana Pazos<apazos@codeaurora.org>!

llvm-svn: 210535
2014-06-10 14:32:08 +00:00
Tim Northover 9ffd0b020f AArch64: disallow x30 & x29 as the destination for indirect tail calls
As Ana Pazos pointed out, these have to be restored to their incoming values
before a function returns; i.e. before the tail call. So they can't be used
correctly as the destination register.

llvm-svn: 210525
2014-06-10 10:50:24 +00:00
Tim Northover c141ad4b75 AArch64: teach FastISel how to handle offset FrameIndices
Previously we were abandonning the attempt, leading to some combination of
extra work (when selection of a load/store fails completely) and inferior code
(when this leads to a real memcpy call instead of inlining).

rdar://problem/17187463

llvm-svn: 210520
2014-06-10 09:52:44 +00:00
Tim Northover c19445d07a AArch64: make FastISel memcpy emission more robust.
We were hitting an assert if FastISel couldn't create the load or store we
requested. Currently this happens for large frame-local addresses, though
CodeGen could be improved there.

rdar://problem/17187463

llvm-svn: 210519
2014-06-10 09:52:40 +00:00
Alp Toker d3d017cf00 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Chad Rosier 3fe0c876c4 [AArch64] Fix the ordering of the accumulate operand in SchedRW list.
Patch by Dave Estes <cestes@codeaurora.org>
http://reviews.llvm.org/D4037

llvm-svn: 210446
2014-06-09 01:54:00 +00:00
Chad Rosier d96e9f14ee [AArch64] When combining constant mul of power of 2 plus/minus 1, prefer shift
plus add.  The shift can be folded into the add.  This only effects codegen
when the constant is 3.

llvm-svn: 210445
2014-06-09 01:25:51 +00:00
Tilmann Scheller 2a7efeb7b7 [AArch64] Add regression tests for the load/store optimizer which cover post-index update folding with sub rather than add.
The tests check that the following transform happens:

  (ldr|str) X, [x20]
   ...
  sub x20, x20, #16
   ->
  (ldr|str) X, [x20], #-16

with X being either w0, x0, s0, d0 or q0.

llvm-svn: 210113
2014-06-03 16:03:00 +00:00
Tim Northover 6890add11d AArch64: mark small types (i1, i8, i16) as promoted
This means the output of LowerFormalArguments returns a lowered
SDValue with the correct type (expected in SelectionDAGBuilder).
Without this, an assertion under a DEBUG macro triggers when those
types are passed on the stack.

llvm-svn: 210102
2014-06-03 13:54:53 +00:00
Jiangning Liu cc4f38bc28 [AArch64] Correctly deal with VPR stack parameter passing.
llvm-svn: 210067
2014-06-03 03:25:09 +00:00
Tilmann Scheller 75fa6e5a23 [AArch64] Add some more regression tests for store pre-index update folding in the load/store optimizer.
Add tests for the following transform:

 add x8, x8, #16
  ...
 str X, [x8]
  ->
 str X, [x8, #16]!

with X being either w0, x0, s0, d0 or q0.

llvm-svn: 210021
2014-06-02 12:33:33 +00:00
Tilmann Scheller cfbacc8a65 [AArch64] Add some more regression tests for load pre-index update folding in the load/store optimizer.
Add tests for the following transform:

 add x8, x8, #16
  ...
 ldr X, [x8]
  ->
 ldr X, [x8, #16]!

with X being either w0, x0, s0, d0 or q0.

llvm-svn: 210018
2014-06-02 11:57:09 +00:00
Tim Northover b4ddc0845a ARM & AArch64: make use of common cmpxchg idioms after expansion
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.

When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.

This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.

I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.

rdar://problem/16227836

llvm-svn: 209883
2014-05-30 10:09:59 +00:00
Tim Northover ce6538c38d AArch64 & ARM: remove undefined behaviour from some tests.
llvm-svn: 209880
2014-05-30 08:59:55 +00:00
Hao Liu 5c7314b68d Test cases named with dates is a legacy rule not used now. Rename several test cases.
llvm-svn: 209877
2014-05-30 05:58:19 +00:00
Hao Liu d670c7eee0 Rename a test case to contain correct date info.
llvm-svn: 209799
2014-05-29 09:21:23 +00:00
Hao Liu 4091450181 Fix an assertion failure caused by v1i64 in DAGCombiner Shrink.
llvm-svn: 209798
2014-05-29 09:19:07 +00:00
Hal Finkel 2c77fe59d9 Revert "[DAGCombiner] Split up an indexed load if only the base pointer value is live"
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.

llvm-svn: 209747
2014-05-28 15:33:19 +00:00
Tilmann Scheller 7c747fc7c6 [AArch64] Add store post-index update folding regression tests for the load/store optimizer.
Add regression tests for the following transformation:

  str X, [x20]
   ...
  add x20, x20, #32
   ->
  str X, [x20], #32

with X being either w0, x0, s0, d0 or q0.

llvm-svn: 209715
2014-05-28 06:43:00 +00:00
Tilmann Scheller 35e451461f [AArch64] Add load post-index update folding regression tests for the load/store optimizer.
Add regression tests for the following transformation:

 ldr X, [x20]
  ...
 add x20, x20, #32
  ->
 ldr X, [x20], #32

 with X being either w0, x0, s0, d0 or q0.

llvm-svn: 209711
2014-05-28 05:44:14 +00:00
Tim Northover 9a6217b650 AArch64: add test for NZCV cross-copy save.
llvm-svn: 209665
2014-05-27 16:50:09 +00:00
Tim Northover de9402d345 AArch64: add AArch64-specific test for 'c' and 'n'.
llvm-svn: 209664
2014-05-27 16:50:03 +00:00
Tim Northover 68ae503de9 AArch64: force i1 to be zero-extended at an ABI boundary.
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:

1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
   and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
   to 8 bits. This goes against the spirit of "zeroext" I think, but
   it's a bit of a vague construct anyway (by definition you're going
   to extend to the amount required by the ABI, that's why it's the
   ABI!).

This implements option 2. The DAG machinery really isn't setup for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it was the resulting
DAG looks like it would be inferior in many cases.

Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.

Should fix PR19850

llvm-svn: 209637
2014-05-26 17:22:07 +00:00
Tim Northover 47e003c65d AArch64: simplify calling conventions slightly.
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant webkit
failed if any small argument needed to be passed on the stack.

llvm-svn: 209636
2014-05-26 17:21:53 +00:00
Tilmann Scheller cc3ebc8a98 [AArch64] Add store + add folding regression tests for the load/store optimization pass.
Add tests for the following transform:

 str X, [x0, #32]
  ...
 add x0, x0, #32
  ->
 str X, [x0, #32]!

with X being either w1, x1, s0, d0 or q0.

llvm-svn: 209627
2014-05-26 13:36:47 +00:00
Tilmann Scheller 112ada831f [AArch64] Add more regression tests for the load/store optimization pass.
Cover the following cases:

  ldr X, [x0, #32]
   ...
  add x0, x0, #32
   ->
  ldr X, [x0, #32]!

with X being either w1, x1, s0, d0 or q0.

llvm-svn: 209624
2014-05-26 12:15:51 +00:00
Tilmann Scheller 968d599dc4 Remove accidentally committed whitespace.
llvm-svn: 209619
2014-05-26 09:40:40 +00:00
Tilmann Scheller 2d746bcda3 [AArch64] Add a regression test for the load store optimizer.
We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding.

As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation.

llvm-svn: 209618
2014-05-26 09:37:19 +00:00
Tim Northover 3b0846e8f7 AArch64/ARM64: move ARM64 into AArch64's place
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.

"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.

This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.

llvm-svn: 209577
2014-05-24 12:50:23 +00:00
Tim Northover cc08e1fe1b AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64.
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.

The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.

Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.

llvm-svn: 209576
2014-05-24 12:42:26 +00:00
Tim Northover 71d04225cf AArch64/ARM64: enable more AArch64 tests.
llvm-svn: 209408
2014-05-22 07:40:55 +00:00
Rafael Espindola 5a52b9f139 Revert "Implement global merge optimization for global variables."
This reverts commit r208934.

The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.

The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.

llvm-svn: 208978
2014-05-16 13:02:18 +00:00
Tim Northover 5896b066e6 TableGen: fix operand counting for aliases
TableGen has a fairly dubious heuristic to decide whether an alias should be
printed: does the alias have lest operands than the real instruction. This is
bad enough (particularly with no way to override it), but it should at least be
calculated consistently for both strings.

This patch implements that logic: first get the *correct* string for the
variant, in the same way as the Matcher, without guessing; then count the
number of whitespace chars.

There are basically 4 changes this brings about after the previous
commits; all of these appear to be good, so I have changed the tests:

+ ARM64: we print "neg X, Y" instead of "sub X, xzr, Y".
+ ARM64: we skip implicit "uxtx" and "uxtw" modifiers.
+ Sparc: we print "mov A, B" instead of "or %g0, A, B".
+ Sparc: we print "fcmpX A, B" instead of "fcmpX %fcc0, A, B"

llvm-svn: 208969
2014-05-16 09:42:04 +00:00
Jiangning Liu 932e1c3924 Implement global merge optimization for global variables.
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.

For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.

llvm-svn: 208934
2014-05-15 23:45:42 +00:00
Tim Northover 2509a3fc64 ARM64: print correct aliases for NEON mov & mvn instructions
In all cases, if a "mov" alias exists, it is the canonical form of the
instruction. Now that TableGen can support aliases containing syntax variants,
we can enable them and improve the quality of the asm output.

llvm-svn: 208874
2014-05-15 12:11:02 +00:00
Tim Northover d8d65a69cf TableGen/ARM64: print aliases even if they have syntax variants.
To get at least one use of the change (and some actual tests) in with its
commit, I've enabled the AArch64 & ARM64 NEON mov aliases.

llvm-svn: 208867
2014-05-15 11:16:32 +00:00
Jiangning Liu 09cc564310 [ARM64] Support aggressive fastcc/tailcallopt breaking ABI by popping out argument stack from callee.
llvm-svn: 208837
2014-05-15 01:33:17 +00:00
Tim Northover ee20caaf82 TableGen: use PrintMethods to print more aliases
llvm-svn: 208607
2014-05-12 18:04:06 +00:00
Tim Northover 88a51d983e AArch64/ARM64: optimise vector selects & enable test
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.

llvm-svn: 208210
2014-05-07 14:10:27 +00:00
James Molloy ccc7f982c1 [ARM64-BE] Make big endian (scalar) argument passing work correctly.
This completes the port of r204814 (cpirker "AArch64_BE function argument
passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
found during later development along the way. The biggest of these was
that the alignment fixup logic wasn't replicated into all the places it
should have been.

llvm-svn: 208192
2014-05-07 11:28:36 +00:00
Tim Northover df723343fa AArch64/ARM64: run test on ARM64 too.
llvm-svn: 208188
2014-05-07 10:47:04 +00:00
Tim Northover 76a94e6ead AArch64/ARM64: put annotation in test
It makes finding already covered tests much easier with "grep -L
arm64".

llvm-svn: 208187
2014-05-07 10:47:00 +00:00
Renato Golin c7aea40ec6 Implememting named register intrinsics
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).

So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.

llvm-svn: 208104
2014-05-06 16:51:25 +00:00
Rafael Espindola 52dc5d828f Special case aliases in GlobalValue::getAlignment.
An alias has the address of what it points to, so it also has the same
alignment.

This allows a few optimizations to see past aliases for free.

llvm-svn: 208103
2014-05-06 16:48:58 +00:00
Tim Northover 534acbdf73 AArch64/ARM64: print BFM instructions as BFI or BFXIL
The canonical form of the BFM instruction is always one of the more explicit
extract or insert operations, which makes reading output much easier.

llvm-svn: 207752
2014-05-01 12:29:38 +00:00
Tim Northover 20ad359b77 AArch64/ARM64: use HS instead of CS & LO instead of CC.
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.

llvm-svn: 207644
2014-04-30 13:14:03 +00:00
Tim Northover 970c4a8d35 ARM64: use hex immediates for movz/movk instructions
Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to
piece together an immediate in 16-bit chunks, hex is probably the most
appropriate format.

llvm-svn: 207635
2014-04-30 11:19:40 +00:00
Tim Northover cfd6e66544 ARM64: print canonical syntax for add/sub (imm) instructions.
Since these instructions only accept a 12-bit immediate, possibly shifted left
by 12, the canonical syntax used by the architecture reference manual is "#N {,
lsl #12 }". We should accept an immediate that has already been shifted, (e.g.

Also, print a comment giving the full addend since it can be helpful.

llvm-svn: 207633
2014-04-30 11:19:15 +00:00
James Molloy 7c39df37b2 [ARM64] Ensure arm64_be is dealt with when emitting debug info.
This is a partial port of r204816 (cpirker "Elf support for MC-JIT
runtime dynamic linker") from AArch64 to ARM64.

llvm-svn: 207625
2014-04-30 10:15:35 +00:00
Benjamin Kramer e1ab3f062e AArch64: Mark vector long multiplication as expand.
There are no patterns for this. This was already fixed for ARM64 but I forgot
to apply it to AArch64 too.

llvm-svn: 207515
2014-04-29 09:37:54 +00:00
Bradley Smith 672df15122 [ARM64] Print preferred aliases for SFBM/UBFM in InstPrinter
llvm-svn: 207219
2014-04-25 10:25:29 +00:00
Kevin Qin 022d395c9c [ARM64] Add RUN lines for "–target arm64 –mattr=-fp-armv8" on AArch64 no-fp test.
This patch is a supplement of implementing predicate of FP, enabling aarch64 backend
no-fp tests on arm64 target for verification. During this, one bug is exposed and
fixed by this patch.

llvm-svn: 207215
2014-04-25 09:44:20 +00:00
Tim Northover 6331d4b975 AArch64: print NEON lists with a space.
This matches ARM64 behaviour, which I think is clearer. It also puts all the
churn from that difference into one easily ignored commit.

llvm-svn: 207116
2014-04-24 14:06:20 +00:00
Tim Northover 9b594d1163 AArch64/ARM64: port bitfield test to ARM64.
llvm-svn: 207103
2014-04-24 12:11:56 +00:00
Tim Northover eb6611e727 AArch64/ARM64: implement BFI optimisation
ARM64 was not producing pure BFI instructions for bitfield insertion
operations, unlike AArch64. The approach had to be a little different (in
ISelDAGToDAG rather than ISelLowering), and the outcomes aren't identical but
hopefully this gives it similar power.

This should address PR19424.

llvm-svn: 207102
2014-04-24 12:11:53 +00:00
Tim Northover 1cb984fbcf AArch64/ARM64: port more tests
llvm-svn: 207101
2014-04-24 12:11:46 +00:00
Tim Northover 52d3283026 AArch64/ARM64: more testing from AArch64 to ARM64
llvm-svn: 206889
2014-04-22 12:45:47 +00:00
Tim Northover a962398a3f AArch64/ARM64: make use of ANDS and BICS instructions for comparisons.
llvm-svn: 206888
2014-04-22 12:45:42 +00:00
Tim Northover 31ebef86b8 AArch64/ARM64: add extra testing from AArch64 to ARM64
llvm-svn: 206887
2014-04-22 12:45:32 +00:00
Tim Northover 2b73e74238 AArch64/ARM64: enable various AArch64 tests on ARM64.
llvm-svn: 206877
2014-04-22 10:10:26 +00:00
Tim Northover 00b4ee848f AArch64/ARM64: add patterns for scalar_to_vector/extract pairs
llvm-svn: 206876
2014-04-22 10:10:18 +00:00
Tim Northover e74fb0d7b9 AArch64/ARM64: mark fmul intrinsic as commutative.
This gives DAG patterns matching indexed patterns where either side is an
indexed vector.

llvm-svn: 206875
2014-04-22 10:10:14 +00:00
Jiangning Liu 87486e0bac [AArch64] Enable global merge pass.
llvm-svn: 206861
2014-04-22 03:33:26 +00:00
Tim Northover ff046b64d9 AArch64/ARM64: add more NEON tests.
Mostly no testing this time, since they were just wrangling
target-specific intrinsics.

llvm-svn: 206613
2014-04-18 14:54:53 +00:00
Tim Northover be3941cc79 ARM64: add extra NEG pattern.
llvm-svn: 206609
2014-04-18 14:54:35 +00:00
Tim Northover 0dbdfb8522 AArch64/ARM64: port more AArch64 tests to ARM64.
llvm-svn: 206592
2014-04-18 13:16:55 +00:00
Tim Northover 01f315a556 AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE.
We couldn't cope if the first mask element was UNDEF before, which
isn't ideal.

llvm-svn: 206588
2014-04-18 12:50:58 +00:00
Tim Northover 66c36b814f AArch64/ARM64: port atomics test to ARM64.
Covers quite a few extra instructions (like any of the max/min ones
which were broken until recently on ARM64).

llvm-svn: 206575
2014-04-18 09:31:31 +00:00
Tim Northover a2c4c71c12 AArch64/ARM64: spot a greater variety of concat_vector operations.
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.

This also enables the AArch64 tests which inspired the previous
untested commits.

llvm-svn: 206574
2014-04-18 09:31:27 +00:00
Tim Northover 8b2fa3dfef AArch64/ARM64: emit all vector FP comparisons as such.
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.

More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.

llvm-svn: 206570
2014-04-18 09:31:07 +00:00
Tim Northover 0a44e66bb8 AArch64/ARM64: port BSL logic from AArch64 & enable test.
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.

Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.

llvm-svn: 206569
2014-04-18 09:31:01 +00:00
Tim Northover 547a4ae6fa AArch64/ARM64: copy byval implementation from AArch64.
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.

llvm-svn: 206568
2014-04-18 09:30:52 +00:00
Jiangning Liu 40d81e10c5 This is one of the optimizations ported from ARM64 to AArch64 to address the performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64.
Patched by Z.Zheng

llvm-svn: 206559
2014-04-18 05:58:09 +00:00
Jiangning Liu e56c30614f This commit enables unaligned memory accesses of vector types on AArch64 back end. This should boost vectorized code performance.
Patched by Z. Zheng

llvm-svn: 206557
2014-04-18 03:58:38 +00:00
Tim Northover cb37ab2d9c AArch64/ARM64: port some NEON tests to ARM64
These ones used completely different sets of intrinsics, so the only way to do
it is create a separate ARM64 copy and change them all.

Other than that, CodeGen was straightforward, no deficiencies detected here.

llvm-svn: 206392
2014-04-16 15:28:02 +00:00
Tim Northover 46ecdf5a0f AArch64/ARM64: add another set of tests from AArch64
Another batch with no code changes.

llvm-svn: 206381
2014-04-16 11:53:07 +00:00
Tim Northover 3ec1de7767 AArch64/ARM64: port across stub handling for ELF C++ exceptions.
The most important part here is that we should actuall emit the stubs we refer
to in the exception table, but as a side issue this uses more sensible & GCC
compatible representations for some of the bits of information.

llvm-svn: 206380
2014-04-16 11:52:55 +00:00
Tim Northover 18f68f6d1a ARM64: use 32-bit moves for constants where possible.
If we know that a particular 64-bit constant has all high bits zero, then we
can rely on the fact that 32-bit ARM64 instructions automatically zero out the
high bits of an x-register. This gives the expansion logic less constraints to
satisfy and so sometimes allows it to pick better sequences.

Came up while porting test/CodeGen/AArch64/movw-consts.ll: this will allow a
32-bit MOVN to be used in @test8 soon.

llvm-svn: 206379
2014-04-16 11:52:51 +00:00
Tim Northover 9cfb57dafa ARM64: use the integrated assembler on ELF.
llvm-svn: 206378
2014-04-16 11:52:40 +00:00
Tim Northover 863a789a99 DAGCombiner: don't optimise non-existant litpool load
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.

Picked up while merging some AArch64 tests.

llvm-svn: 206365
2014-04-16 09:03:09 +00:00
Quentin Colombet 72dad56c53 [ARM64] Set default CPU to generic instead of cyclone.
llvm-svn: 206313
2014-04-15 19:08:46 +00:00
Tim Northover bd668872c0 AArch64/ARM64: enable more AArch64 tests on ARM64.
No code changes for this bunch, just some test rejigs.

llvm-svn: 206291
2014-04-15 14:00:29 +00:00
Tim Northover ebb3123a5f AArch64/ARM64: add missing pattern for extending load.
llvm-svn: 206290
2014-04-15 14:00:19 +00:00
Tim Northover cbcb7a37f7 AArch64/ARM64: only mangle MOVZ/MOVN during encoding when needed
Sometimes we need emit the bits that would actually be a MOVN when producing a
relocated MOVZ instruction (don't ask). But not always, a check which ARM64 got
wrong until now.

llvm-svn: 206289
2014-04-15 14:00:15 +00:00
Tim Northover 6e27b8ded5 AArch64/ARM64: add support for large code-model jump tables.
I've left the MachO CodeGen as it is, there's a reasonable chance it should use
the GOT like ConstPools, but I'm not certain.

llvm-svn: 206288
2014-04-15 14:00:11 +00:00
Tim Northover 221b583951 AArch64/ARM64: add patterns for various commutations of FNMADD.
llvm-svn: 206287
2014-04-15 14:00:06 +00:00
Tim Northover b37cff1ae2 AArch64/ARM64: add half as a storage type on ARM64.
This brings it into line with the AArch64 behaviour and should open the way for
certain OpenCL features.

llvm-svn: 206286
2014-04-15 14:00:03 +00:00
Tim Northover 80a70a265a AArch64/ARM64: copy patterns for fixed-point conversions
Code is mostly copied directly across, with a slight extension of the
ISelDAGToDAG function so that it can cope with the floating-point constants
being behind a litpool.

llvm-svn: 206285
2014-04-15 13:59:57 +00:00
Tim Northover f70577b1cd ARM64: add constraints to various FastISel operations
llvm-svn: 206284
2014-04-15 13:59:53 +00:00
Tim Northover 27010074fb AArch64/ARM64: add more arm64 lines to AArch64 regression tests
llvm-svn: 206282
2014-04-15 13:59:44 +00:00
Tim Northover 20603726ce AArch64/ARM64: add dp tests from AArch64
llvm-svn: 206281
2014-04-15 13:59:40 +00:00
Tim Northover db2860f49e ARM64: specify full triple in tests to pacify Windows.
llvm-svn: 206175
2014-04-14 13:18:48 +00:00
Tim Northover a89617bd33 AArch64: add newline to end of test files.
Should be no other change.

llvm-svn: 206174
2014-04-14 13:18:40 +00:00
Tim Northover b6abe806c7 AArch64/ARM64: enable directcond.ll test on ARM64.
Code change is because optimizeCompareInstr didn't know how to pull the
condition code out of FCSEL instructions.

llvm-svn: 206171
2014-04-14 12:51:06 +00:00
Tim Northover 0d7bd4f444 ARM64: add patterns for csXYZ with reversed operands.
AArch64 tests for this, and it's obviously a good idea. Have to invert the
condition code, of course.

llvm-svn: 206170
2014-04-14 12:51:02 +00:00
Tim Northover c398cd53aa ARM64: enable more regression tests from AArch64
llvm-svn: 206169
2014-04-14 12:50:58 +00:00
Tim Northover 2f48303436 ARM64: add support for AArch64's addsub_ext.ll
There was one definite issue in ARM64 (the off-by-1 check for whether
a shift could be folded in) and one difference that is probably
correct: ARM64 didn't fold nodes with multiple uses into the
arithmetic operations unless optimising for code size.

llvm-svn: 206168
2014-04-14 12:50:50 +00:00
Tim Northover 23b1f08282 ARM64: optimise (cmp x, (sub 0, y)) to (cmn x, y).
This transformation is only valid when being used for an EQ or NE
comparison since the flags change otherwise.

llvm-svn: 206167
2014-04-14 12:50:47 +00:00
Tim Northover d1719a8f76 ARM64: start porting regression test suite from AArch64
llvm-svn: 206166
2014-04-14 12:50:41 +00:00
Chad Rosier 5f8d6a6c15 [AArch64] Implement the isZExtFree APIs.
llvm-svn: 205926
2014-04-09 20:51:21 +00:00
Chad Rosier 9ce19fb65c [AArch64] Implement the isTruncateFree API.
In AArch64 i64 to i32 truncate operation is a subregister access.

This allows more opportunities for LSR optmization to eliminate
variables of different types (i32 and i64).

llvm-svn: 205925
2014-04-09 20:43:40 +00:00
Logan Chien 30eb9f47c6 [AArch64] Lower SHL_PARTS, SRA_PARTS and SRL_PARTS
Lower SHL_PARTS, SRA_PARTS and SRL_PARTS to perform 128-bit integer shift

Patch by GuanHong Liu.

llvm-svn: 204940
2014-03-27 16:28:09 +00:00
Christian Pirker 99974c7242 AArch64_BE Elf support for MC-JIT runtime dynamic linker
llvm-svn: 204816
2014-03-26 14:57:32 +00:00
Christian Pirker 3aa0e6a1f9 AArch64_BE function argument passing for ARM ABI
llvm-svn: 204814
2014-03-26 14:51:22 +00:00
Manman Ren 78cf02a07b Register Allocator: check other options before using a CSR for the first time.
When register allocator's stage is RS_Spill, we choose spill over using the CSR
for the first time, if the spill cost is lower than CSRCost. 
When register allocator's stage is < RS_Split, we choose pre-splitting over
using the CSR for the first time, if the cost of splitting is lower than
CSRCost.

CSRCost is set with command-line option "regalloc-csr-first-time-cost". The
default value is 0 to generate the same codes as before this commit.

With a value of 15 (1 << 14 is the entry frequency), I measured performance
gain of 3% on 253.perlbmk and 1.7% on 197.parser, with instrumented PGO,
on an arm device.

rdar://16162005

llvm-svn: 204690
2014-03-25 00:16:25 +00:00
Chad Rosier b7747e31ef [AArch64] Add SchedRW lists to NEON instructions.
Previously, only regular AArch64 instructions were annotated with SchedRW lists.
This patch does the same for NEON enabling these instructions to be scheduled by
the MIScheduler. Additionally, store operations are now modeled and a few
SchedRW lists were updated for bug fixes (e.g. multiple def operands).

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 204505
2014-03-21 19:34:41 +00:00
Kevin Qin b2c78b07d6 [AArch64] Remove .data_region directive from AArch64.
.data_region is only used in Darwin, so it shouldn't be generated
for other OS. Currently AArch64 doesn't support darwin yet, so
I removed it from AArch64. When Darwin is supported someday, we can
add it back and associate it with Darwin.

llvm-svn: 204424
2014-03-21 02:12:48 +00:00
Matt Arsenault 985b9de485 Make DAGCombiner work on vector bitshifts with constant splat vectors.
llvm-svn: 204071
2014-03-17 18:58:01 +00:00
Tim Northover e94a518a22 IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Matt Arsenault 532db69984 Fix undefined behavior in vector shift tests.
These were all shifting the same amount as the bitwidth.

llvm-svn: 203519
2014-03-11 00:01:41 +00:00
Tim Northover 2a661f3f73 AArch64: fix LowerCONCAT_VECTORS for new CodeGen.
The function was making too many assumptions about its input:

1. The NEON_VDUP optimisation was far too aggressive, assuming (I
think) that the input would always be BUILD_VECTOR.

2. We were treating most unknown concats as legal (by returning Op
rather than SDValue()). I think only concats of pairs of vectors are
actually legal.

http://llvm.org/PR19094

llvm-svn: 203450
2014-03-10 09:34:07 +00:00
Rafael Espindola b1f25f1b93 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Chad Rosier 86a8f72041 [AArch64] This is a work in progress to provide a machine description
for the Cortex-A53 subtarget in the AArch64 backend.

This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.

Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 203125
2014-03-06 16:04:00 +00:00
Chad Rosier 70cb2311ab Revert "[AArch64] This is a work in progress to provide a machine description"
This reverts commit ff717c8fc786a0cfa1602982b91895fa09e514fc.

llvm-svn: 202773
2014-03-04 00:32:07 +00:00
Chad Rosier fe45290566 [AArch64] This is a work in progress to provide a machine description
for the Cortex-A53 subtarget in the AArch64 backend.

This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.

Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 202767
2014-03-03 23:32:47 +00:00
Tim Northover ed9c20681d AArch64: simplify tbl/tbx polymorphism
The table argument is always 128-bit (and interpreted as <16 x i8>) so the
extra specifier for it is just clutter.

No user-visible behaviour change, so no tests.

llvm-svn: 202258
2014-02-26 11:55:09 +00:00
Kevin Qin 07334d37de [AArch64] Add register constraints to avoid generating STLXR and STXR with unpredictable behavior.
llvm-svn: 201841
2014-02-21 07:45:48 +00:00
Oliver Stannard 7b2f2fba7f AArch64: __va_list.__stack must be 8-byte aligned
The va_start macro for AArch64 must set va_list.__stack to the address
following the last named argument on the stack, rounded up to an alignment
of 8 bytes.

llvm-svn: 201797
2014-02-20 17:19:26 +00:00
Ana Pazos 7c27a265dc [AArch64] Expanded sin, cos, pow with FP vector types inputs
llvm-svn: 201601
2014-02-18 20:31:05 +00:00
Jiangning Liu 742c588edc Fix a typo about lowering AArch64 va_copy.
llvm-svn: 201541
2014-02-18 02:37:42 +00:00
Nico Rieck 7647178738 Fix broken CHECK lines
llvm-svn: 201479
2014-02-16 07:31:05 +00:00
Kevin Qin edc95ee196 [AArch64 NEON] Fix a bug to avoid using floating type as condition type in lowering SELECT_CC.
llvm-svn: 201395
2014-02-14 09:41:15 +00:00
Hao Liu 7146ef8542 [AArch64]Fix the assertion failure caused by "v1i1 SETCC" DAG node.
As v1i1 is illegal, the type legalizer tries to scalarize such node. But if the type operands of SETCC is legal, the scalarization algorithm will cause an assertion failure.

llvm-svn: 201381
2014-02-14 02:21:56 +00:00
Rafael Espindola 432acf5cef Specify a triple. MachO AArch64 support is missing.
llvm-svn: 201335
2014-02-13 15:30:06 +00:00
Daniel Sanders 753e17629d Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.

Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
  (fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
  (should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
  (should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
  to be enabled regardless of default setting or -no-integrated-as.
  (should fix SystemZ buildbots)

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201333
2014-02-13 14:44:26 +00:00
NAKAMURA Takumi 6fe94621b9 llvm/test/CodeGen/AArch64/cpus.ll: Tweak to use -mtriple=aarch64-unknown-unknown, or this would crash for targeting pecoff like *-mingw32.
llvm-svn: 201315
2014-02-13 11:06:23 +00:00
Oliver Stannard 5bbb72f37e Add Cortex-A53 and Cortex-A57 cores to the AArch64 backend
llvm-svn: 201305
2014-02-13 09:46:11 +00:00
Hao Liu 7b6dfcf06a [AArch64]Fix the problems that can't select mul/add/sub of v1i8/v1i16/v1i32 types.
As this problems are similar to shl/sra/srl, also add patterns for shift nodes.

llvm-svn: 201298
2014-02-13 05:42:33 +00:00
Hao Liu 4f345f3c03 [AArch64]Add support for spilling FPR8/FPR16.
llvm-svn: 201287
2014-02-13 02:36:58 +00:00
Daniel Sanders abe212a3b8 Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
It introduced multiple test failures in the buildbots.

llvm-svn: 201241
2014-02-12 15:39:20 +00:00
Daniel Sanders a7d504cf58 Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201237
2014-02-12 14:44:54 +00:00
Hao Liu 6e73761dc8 [AArch64]Implement the copy of two FPR8 registers by using FMOVss of two FPR32 registers in copyPhysReg.
llvm-svn: 201061
2014-02-10 03:16:22 +00:00
Tim Northover fdbdb4b6d5 ARM & AArch64: merge NEON absolute compare intrinsics
There was an extremely confusing proliferation of LLVM intrinsics to implement
the vacge & vacgt instructions. This combines them all into two polymorphic
intrinsics, shared across both backends.

llvm-svn: 200768
2014-02-04 14:55:42 +00:00
Tim Northover 24979d8e10 AArch64 & ARM: refactor crypto intrinsics to take scalars
Some of the SHA instructions take a scalar i32 as one argument (largely because
they work on 160-bit hash fragments). This wasn't reflected in the IR
previously, with ARM and AArch64 choosing different types (<4 x i32> and <1 x
i32> respectively) which was ugly.

This makes all the affected intrinsics take a uniform "i32", allowing them to
become non-polymorphic at the same time.

llvm-svn: 200706
2014-02-03 17:27:49 +00:00
Chad Rosier fe5ab2f5ba [AArch64] Custom lower concat_vector patterns with v4i16, v4i32, v8i8, v8i16, v16i8 types.
llvm-svn: 200491
2014-01-30 21:46:54 +00:00
Kevin Qin 92d64d2d56 [AArch64 NEON] Lower SELECT_CC with vector operand.
When the scalar compare is between floating point and operands are
vector, we custom lower SELECT_CC to use NEON SIMD compare for
generating less instructions.

llvm-svn: 200365
2014-01-29 01:57:30 +00:00
Kevin Qin 4a183d7094 [AArch64 NEON] Try to generate CONCAT_VECTOR when lowering BUILD_VECTOR or SHUFFLE_VECTOR.
Replace r199791.

llvm-svn: 200180
2014-01-27 02:53:54 +00:00
Kevin Qin 9eeedfbaa6 Revert r199791.
It's old version which has some bugs. I'll commit lattest patch soon.

llvm-svn: 200179
2014-01-27 02:53:41 +00:00
Jiangning Liu fb3c17b6c9 Improve pattern match from v1i8 to v1i32 for AArch64 Neon.
llvm-svn: 200119
2014-01-26 04:55:53 +00:00
Jiangning Liu 6398d839c6 Implement pattern match from v1xx to v1xx for AArch64 Neon.
llvm-svn: 200113
2014-01-26 03:27:40 +00:00
Kevin Qin 18662f4b7c [AArch64 NEON] Add patterns for concat_vector on v2i32.
llvm-svn: 200111
2014-01-26 02:46:15 +00:00
Kevin Qin a4068c4243 [AArch64 NEON] Add test case for vector FP_ROUND.
llvm-svn: 200110
2014-01-26 02:23:33 +00:00
Ana Pazos cd3b9f763e [AArch64] Removed unused i8 type from FPR8 register class.
The i8 type is not registered with any register class.
This causes a segmentation fault in MachineLICM::getRegisterClassIDAndCost.

The code selects the first type associated with register class FPR8,
which happens to be i8.
It uses this type (i8) to get the representative class pointer, which is 0.
It then uses this pointer to access a field, resulting in segmentation fault.

Since i8 type is not being used for printing any neon instruction
we can safely remove it.

llvm-svn: 200046
2014-01-24 22:36:53 +00:00
Kevin Qin 21cd2152d3 [AArch64 NEON] Fix a bug in implementing register copy bwtween FPR16.
llvm-svn: 199978
2014-01-24 07:53:04 +00:00
Ana Pazos 5d31f6945b [AArch64] Added vselect patterns with float and double types
llvm-svn: 199925
2014-01-23 19:18:57 +00:00
Hao Liu b920682e4a [AArch64]Add CHECK for two test cases testing scalar_to_vector committed in r199461.
llvm-svn: 199861
2014-01-23 02:09:30 +00:00
Kevin Qin ce0190c6d5 [AArch64 NEON] Try to generate CONCAT_VECTOR when lowering BUILD_VECTOR or SHUFFLE_VECTOR.
llvm-svn: 199791
2014-01-22 06:11:03 +00:00
Kevin Qin 6d379abd8f [AArch64 NEON] Fix a bug caused by undef lane when generating VEXT.
It was commited as r199628 but reverted in r199628 as causing
regression test failed. It's because of old vervsion of patch
I used to commit. Sorry for mistake.

llvm-svn: 199704
2014-01-21 01:48:52 +00:00
Chandler Carruth f835fc6f4f Revert r199628: "[AArch64 NEON] Fix a bug caused by undef lane when generating VEXT."
This test fails the newly added regression tests.

llvm-svn: 199631
2014-01-20 08:18:01 +00:00
Kevin Qin ff42e06ef4 [AArch64 NEON] Fix a bug caused by undef lane when generating VEXT.
llvm-svn: 199628
2014-01-20 07:32:26 +00:00
Kevin Qin e0faea11b1 [AArch64 NEON] Expand vector for UDIV/SDIV/UREM/SREM/FREM as neon doesn't support these operations.
llvm-svn: 199485
2014-01-17 09:54:30 +00:00
Hao Liu 17457a2ee2 [AArch64]Fix the problem can't select f16_to_f32 and f32_to_f16.
Also add copy support for FPR16.
Also add a missing test case file belongs to commit r197361.

llvm-svn: 199463
2014-01-17 06:23:30 +00:00
Kevin Qin 212d9b4a56 [AArch64 NEON] Custom lower conversion between vector integer and vector floating point if element bit-width doesn't match.
llvm-svn: 199462
2014-01-17 05:52:35 +00:00
Hao Liu 18d92262c5 [AArch64]Fix the problem can't select concat_vectors of two v1i32 types.
Also fix the problem can't select scalar_to_vector from f32 to v2f32/v4f32.

llvm-svn: 199461
2014-01-17 05:44:46 +00:00
Jiangning Liu 0a791c348b For AArch64, lowering sext_inreg and generate optimized code by using SXTL.
llvm-svn: 199296
2014-01-15 05:08:01 +00:00
Tim Northover 6e219cd588 AArch64: don't try to handle [SU]MUL_LOHI nodes
We should set them to expand for now since there are no patterns
dealing with them. Actually, there are no instructions either so I
doubt they'll ever be acceptable.

llvm-svn: 199265
2014-01-14 22:53:22 +00:00
Rafael Espindola 08ff298d51 Revert "[AArch64] Added vselect patterns with float and double types"
This reverts commit r199242.

It is causing CodeGen/AArch64/neon-bsl.ll to fail.

llvm-svn: 199248
2014-01-14 19:24:08 +00:00
Ana Pazos 787f540daa [AArch64] Added vselect patterns with float and double types
llvm-svn: 199242
2014-01-14 18:45:48 +00:00
Andrea Di Biagio 9bc0415c1f [AArch64] Fix assertion failure caused by an invalid comparison between APInt values.
APInt only knows how to compare values with the same BitWidth and asserts
in all other cases.

With this fix, function PerformORCombine does not use the APInt equality
operator if the APInt values returned by 'isConstantSplat' differ in BitWidth.
In that case they are different and no comparison is needed.

llvm-svn: 199119
2014-01-13 16:51:00 +00:00
Kevin Qin cfef55d6d4 [AArch64 NEON] Add missing patterns for bitcast from or to v1f64
llvm-svn: 199070
2014-01-13 01:58:38 +00:00
Kevin Qin 21e8f1c4eb [AArch64 NEON] Add more scenarios to use perm instructions when lowering shuffle_vector
This patch covered 2 more scenarios:

1.  Two operands of shuffle_vector are the same, like
%shuffle.i = shufflevector <8 x i8> %a, <8 x i8> %a, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>

2. One of operands is undef, like
%shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>

After this patch, perm instructions will have chance to be emitted instead of lots of INS.

llvm-svn: 199069
2014-01-13 01:56:29 +00:00
Nico Rieck b5262d6d8f Fix non-deterministic SDNodeOrder-dependent codegen
Reset SelectionDAGBuilder's SDNodeOrder to ensure deterministic code
generation.

llvm-svn: 199050
2014-01-12 14:09:17 +00:00
Kristof Beyls 58306ad903 Make sure -use-init-array has intended effect on all AArch64 ELF targets, not just linux.
llvm-svn: 198937
2014-01-10 13:41:49 +00:00
Andrea Di Biagio 23df4e4a2d Teach the DAGCombiner how to fold 'vselect' dag nodes according
to the following two rules:
  1) fold (vselect (build_vector AllOnes), A, B) -> A
  2) fold (vselect (build_vector AllZeros), A, B) -> B

llvm-svn: 198777
2014-01-08 18:33:04 +00:00
Kevin Qin 44946439e1 [AArch64 NEON] Fix generating incorrect value type of NEON_VDUPLANE
when lower build_vector if result value type mismatch with operand
value type.

llvm-svn: 198743
2014-01-08 08:06:14 +00:00
Hao Liu 7d11d99d20 [AArch64]Add support to spill/fill D tuples such as DPair/DTriple/DQuad. There is no test cases for D tuple as the original test cases are too large. As the spill/fill of the D tuple is similar to the Q tuple, the correctness can be guaranteed.
llvm-svn: 198684
2014-01-07 10:50:43 +00:00
Hao Liu 27d88376bc [AArch64]Add support to copy D tuples such as DPair/DTriple/DQuad and Q tuples such as QPair/QTriple/QQuad. There is no test case for D tuple as the original test cases are too large. As the copy of the D tuple is similar to the Q tuple, the correctness can be guaranteed.
llvm-svn: 198682
2014-01-07 10:00:03 +00:00
Kevin Qin cfa41a2569 [AArch64 NEON] Fixed incorrect immediate used in BIC instruction.
llvm-svn: 198675
2014-01-07 05:10:47 +00:00
Kevin Qin 5cd73c9e0a [AArch64 NEON] Fix invalid constant used in vselect condition.
There is a wrong assumption that the vector element type and the
type of each ConstantSDNode in the build_vector were the same.
However, when promoting the integer operand of a legally typed
build_vector, the operand type and the vector element type do not
need to be the same
(See method 'DAGTypeLegalizer::PromoteIntOp_BUILD_VECTOR' in
LegalizeIntegerTypes.cpp).

  in AArch64 backend, the following dag sequence:

  C0: i1 = Constant<0>
  C1: i1 = Constant<-1>
  V: v8i1 = BUILD_VECTOR C1, C1, C0, C0, C0, C0, C0, C0

  is type-legalized into:

  NewC0: i32 = Constant<0>
  NewC1: i32 = Constant<1>
  V: v8i8 = BUILD_VECTOR NewC1, NewC1, NewC0, NewC0, NewC0, NewC0, NewC0, NewC0

Forcing a getZeroExtend to VTBits to ensure that the new constant
is correctly.

llvm-svn: 198582
2014-01-06 02:26:10 +00:00
Jiangning Liu a0acf70af1 For AArch64 Neon, simplify scalar dup by lane0 for fp.
llvm-svn: 198194
2013-12-30 02:44:35 +00:00
Hao Liu fe3bfc8c41 [AArch64]Add code to spill/fill Q register tuples such as QPair/QTriple/QQuad.
llvm-svn: 198193
2013-12-30 02:38:12 +00:00
Hao Liu b591f835d6 [AArch64]Can't select shift left 0 of type v1i64
llvm-svn: 198192
2013-12-30 02:12:46 +00:00
Kevin Qin ede9ce1933 Fix a bug in DAGcombiner about zero-extend after setcc.
For AArch64 backend, if DAGCombiner see "sext(setcc)", it will
combine them together to a single setcc with extended value type.
Then if it see "zext(setcc)", it assumes setcc is Vxi1, and try to
create "(and (vsetcc), (1, 1, ...)". While setcc isn't Vxi1,
DAGcombiner will create wrong node and get wrong code emitted.

llvm-svn: 198190
2013-12-30 02:05:13 +00:00
Hao Liu 74107fe526 [AArch64]Fix the problem that can't select mul of v1i64/v2i64 types.
E.g. Can't select such IR:
     %tmp = mul <2 x i64> %a, %b

llvm-svn: 198188
2013-12-30 01:38:41 +00:00
Hao Liu 83799741fb [AArch64]Fix a problem that the register order of fmls/fmla by element is incorrect.
E.g. the codegen result is 
     fmls v1.2s, v0.2s, v2.s[3]
which is expected to be
     fmls v0.2s, v1.2s, v2.s[3]

llvm-svn: 198001
2013-12-25 07:12:34 +00:00
Jiangning Liu dd1afd5338 Add missing pattern matches to support ACLE intrinsics of AArch64 NEON.
llvm-svn: 197993
2013-12-25 01:22:51 +00:00
Hao Liu ce7a12be8f [AArch64]Add patterns to match normal shift nodes: shl, sra and srl.
llvm-svn: 197969
2013-12-24 09:00:21 +00:00
Kevin Qin 82bd84aadf [AArch64 NEON] Fix a bug when lowering BUILD_VECTOR.
DAG.getVectorShuffle() doesn't always return a vector_shuffle node.
If mask is the exact sequence of it's operand(For example, operand_0
is v8i8, and  the mask is 0, 1, 2, 3, 4, 5, 6, 7), it will directly
return that operand. So a check is added here.

llvm-svn: 197967
2013-12-24 08:16:06 +00:00
Kevin Qin cd5f3153f5 [AArch64 NEON] Fix a pattern match failure with NEON_VDUP.
This failure caused by improper condition when lowering shuffle_vector
to scalar_to_vector. After this patch NEON_VDUP with v1i64 will not
be generated.

llvm-svn: 197966
2013-12-24 08:11:47 +00:00
Ana Pazos bc2996b30f [AArch64] Check fmul node single use in fused multiply patterns
Check for single use of fmul node in fused multiply patterns
to allow generation of fused multiply add/sub instructions.
Otherwise fmul operation ends up being repeated more than
once which does not help peformance on targets with
only one MAC unit, as for example cortex-a53.

llvm-svn: 197929
2013-12-24 00:47:29 +00:00
Ana Pazos 3ca23915cd [AArch64 NEON] Fixed fused multiply negate add/sub patterns
The correct pattern matching should be:

- fnmadd is (-Ra) + (-Rn)*Rm  which should be matched as:

  fma (fneg node:$Rn),  node:$Rm, (fneg node:$Ra) and as

  (f32 (fsub (f32 (fneg FPR32:$Ra)), (f32 (fmul FPR32:$Rn, FPR32:$Rm))))

- fnmsub is (-Ra) + Rn*Rm which should be matched as

  fma node:$Rn,  node:$Rm, (fneg node:$Ra) and as

  (f32 (fsub (f32 (fmul FPR32:$Rn, FPR32:$Rm)), FPR32:$Ra))))

llvm-svn: 197928
2013-12-24 00:40:10 +00:00
Hao Liu 408c8b0866 [AArch64]The compare to zero intrinsics should be implemented by 'icmp/fcmp' and 'sext' not 'zext'. Modify the test cases.
llvm-svn: 197897
2013-12-23 02:42:10 +00:00
Kevin Qin 53eaea0104 [AArch64 NEON]Implment loading vector constant form constant pool.
llvm-svn: 197551
2013-12-18 06:26:04 +00:00
Hao Liu 774cabb538 [AArch64]Fix the pattern match failure for v1i8/v1i16/v1i32 types.
Currently we have such types as legal vector types. The DAG combiner may generate some DAG nodes having such types but we don't have patterns to match them.
E.g. a load i32 and a bitcast i32 to v1i32 will be combined into a load v1i32:
     bitcast (load i32) to v1i32 -> load v1i32.
So this patch fixes such problems for load/dup instructions.
If v1i8/v1i16/v1i32 are not legal any more, the code in this patch can be deleted. So I also add some FIXME.

llvm-svn: 197361
2013-12-16 02:51:28 +00:00
Chad Rosier 4055f42d22 [AArch64] Removed unnecessary copy patterns with v1fx types.
- Copy patterns with float/double types are enough.
- Fix typos in test case names that were using v1fx.
- There is no ACLE intrinsic that uses v1f32 type.  And there is no conflict of
  neon and non-neon ovelapped operations with this type, so there is no need to
  support operations with this type.
- Remove v1f32 from FPR32 register and disallow v1f32 as a legal type for
  operations.

Patch by Ana Pazos!

llvm-svn: 197159
2013-12-12 15:46:29 +00:00
Hao Liu 46a10eec28 [AArch64]Fix the problem that AArch64 backend fails to select scalar_to_vector of vector types having more than one element.
llvm-svn: 197135
2013-12-12 07:36:26 +00:00
Kevin Qin e4ddaa1bd5 Fix Incorrect CHECK message [0-31]+ in test case.
In regular expression, [0-31]+ equals to [0-3]+, not the number from
0 to 31. So change it to [0-9]+.

llvm-svn: 197113
2013-12-12 02:19:13 +00:00
Quentin Colombet 18b779e3f4 Fix an over-constrained assertion in MachineFunction::addLiveIn.
The assertion was checking that the virtual register VReg used to represent the
physical register PReg uses the same register class as the one passed to
MachineFunction::addLiveIn.
This is over-constraining because it is sufficient to check that the register
class of VReg (VRegRC) is a subclass of the register class of PReg (PRegRC) and
that VRegRC contains PReg.
Indeed, if VReg gets constrained because of some operation constraints
between two calls of MachineFunction::addLiveIn, the original assertion
cannot match.

This fixes <rdar://problem/15633429>. 

llvm-svn: 197097
2013-12-12 00:15:47 +00:00
Chad Rosier 446d8ea0fb [AArch64] Refactor NEON floating-point Max/Min/Maxnm/Minnm across vector AArch64
intrinsics to use f32 types, rather than their vector equivalents.

llvm-svn: 197090
2013-12-11 23:21:25 +00:00
Chad Rosier 088f93d4b5 [AArch64] Add NEON scalar floating-point compare LLVM AArch64 intrinsics that
use f32/f64 types, rather than their vector equivalents.

llvm-svn: 197068
2013-12-11 21:03:46 +00:00
Chad Rosier 473a01e1c9 [AArch64] Refactor the NEON scalar floating-point reciprocal step and
floating-point reciprocal square root step LLVM AArch64 intrinsics to
use f32/f64 types, rather than their vector equivalents.

llvm-svn: 197067
2013-12-11 21:03:43 +00:00
Chad Rosier 7098fcc062 [AArch64] Refactor the NEON scalar floating-point reciprocal estimate, floating-
point reciprocal exponent, and floating-point reciprocal square root estimate
LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector
equivalents.

llvm-svn: 197066
2013-12-11 21:03:40 +00:00
Chad Rosier f70af21651 [AArch64] Refactor the NEON floating-point absolute difference LLVM AArch64
intrinsic to use f32/f64 types, rather than their vector equivalents.

llvm-svn: 196965
2013-12-10 21:33:59 +00:00
Chad Rosier 07cc3f9100 [AArch64] Refactor the NEON signed/unsigned floating-point convert to fixed-point
LLVM AArch64 intrinsics to use f32/f64, rather than their vector equivalents.

llvm-svn: 196964
2013-12-10 21:33:56 +00:00
Chad Rosier 98b8baa35c [AArch64] Overload NEON signed/unsigned floating-point convert to fixed-point
and fixed-point convert to floating-point LLVM AArch64 intrinsics.

llvm-svn: 196963
2013-12-10 21:33:53 +00:00
Chad Rosier cc34d187b8 [AArch64] Overload NEON signed/unsigned integer convert to floating-point
LLVM AArch64 intrinsics.

llvm-svn: 196962
2013-12-10 21:33:50 +00:00
Chad Rosier 7a9bba442f [AArch64] Refactor the Neon vector/scalar floating-point convert intrinsics so
that they use float/double rather than the vector equivalents when appropriate.

llvm-svn: 196930
2013-12-10 16:11:39 +00:00
Chad Rosier fcc4c366d1 [AArch64] Refactor the Neon vector/scalar floating-point convert implementation.
Specifically, reuse the ARM intrinsics when possible.

llvm-svn: 196926
2013-12-10 15:35:33 +00:00
Kevin Qin 04396d1e69 [AArch64 NEON] Support poly128_t and implement relevant intrinsic.
llvm-svn: 196887
2013-12-10 06:48:35 +00:00
Chad Rosier 5c8bf9c3db [AArch64] Refactor the NEON scalar reduce pairwise intrinsics, so that they use
float/double rather than the vector equivalents when appropriate.

llvm-svn: 196833
2013-12-09 22:47:38 +00:00
Chad Rosier 3b0b3ee71e [AArch64] Refactor NEON scalar reduce pairwise front-end codegen to remove
unnecessary patterns in tablegen.

llvm-svn: 196832
2013-12-09 22:47:34 +00:00
Chad Rosier 397ff3945c [AArch64] Remove q and non-q intrinsic definitions in the NEON scalar reduce
pairwise implementation, using an overloaded definition instead.

llvm-svn: 196831
2013-12-09 22:47:31 +00:00
Ana Pazos bde2828ae0 Fix pattern match for movi with 0D result
Patch by Jiangning Liu.

With some test case changes:
- intrinsic test added to the existing /test/CodeGen/AArch64/neon-aba-abd.ll.
- New test cases to cover movi 1D scenario without using the intrinsic in
test/CodeGen/AArch64/neon-mov.ll.

llvm-svn: 196806
2013-12-09 19:29:14 +00:00
Hao Liu 96a587a9f7 [AArch64]Add missing pair intrinsics such as:
int32_t vminv_s32(int32x2_t a)
which should be compiled into SMINP Vd.2S,Vn.2S,Vm.2S

llvm-svn: 196749
2013-12-09 03:51:42 +00:00
Hao Liu 868caea6d1 [AArch64]Pattern match failures for truncate store and extend load
llvm-svn: 196748
2013-12-09 03:34:08 +00:00
Jiangning Liu 65d8e3422a For AArch64, add missing register cost calculation for big value types like v4i64 and v8i64.
llvm-svn: 196456
2013-12-05 02:12:01 +00:00
Kevin Qin afd095de8b [AArch64 Neon] Add ACLE intrinsic vceqz_f64.
llvm-svn: 196362
2013-12-04 08:02:34 +00:00
Kevin Qin f9832e8de7 [AArch64 NEON] Add missing compare intrinsics.
llvm-svn: 196360
2013-12-04 07:53:28 +00:00
Hao Liu dca64f4a20 [AArch64]Add missing floating point convert, round and misc intrinsics.
E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn

llvm-svn: 196210
2013-12-03 06:06:55 +00:00
Hao Liu c250cbc095 AArch64: add missing ACLE intrinsics mapping to general arithmetic operation from VFP instructions.
E.g. float64x1_t vadd_f64(float64x1_t a, float64x1_t b) -> FADD Dd, Dn, Dm.

llvm-svn: 196208
2013-12-03 05:58:30 +00:00
Hao Liu 21a461353a AArch64: Add missing scalar pair intrinsics.
E.g. "float32_t vaddv_f32(float32x2_t a)" to be matched into "faddp s0, v1.2s".

llvm-svn: 196198
2013-12-03 03:39:47 +00:00
Jiangning Liu 3a541d46a1 Add some missing pattern matches for AArch64 Neon intrinsics like vuqadd_s64 and friends.
llvm-svn: 196192
2013-12-03 01:33:52 +00:00
Jiangning Liu 94a7bb2130 Add some missing pattern matches for AArch64 Neon intrinsics like vmull_high_n_s16 and friends.
llvm-svn: 196190
2013-12-03 01:29:32 +00:00
Chad Rosier 3106de3f9d [AArch64] Implemented vcopy_lane patterns using scalar DUP instruction.
Patch by Ana Pazos!

llvm-svn: 196151
2013-12-02 21:05:16 +00:00
Hao Liu ba38eee8ac AArch64: The pattern match should check the range of the immediate value.
Or we can generate some illegal instructions.
E.g. shrn2 v0.4s, v1.2d, #35. The legal range should be in [1, 16].

llvm-svn: 195941
2013-11-29 02:11:22 +00:00
Jiangning Liu f7b4c7c2ce Add missing test case for bsl_f64 support of AArch64 NEON.
llvm-svn: 195939
2013-11-29 01:38:08 +00:00
Jiangning Liu 97aa8cf8b7 Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics.
llvm-svn: 195843
2013-11-27 14:02:25 +00:00
Chad Rosier 75290c6307 [AArch64] Add support for NEON scalar floating-point absolute difference.
llvm-svn: 195803
2013-11-27 01:45:58 +00:00
Chad Rosier 9653d5c989 [AArch64] Add support for NEON scalar floating-point to integer convert
instructions.

llvm-svn: 195788
2013-11-26 22:17:37 +00:00
Kevin Qin 599c47d0de Refactored the implementation of AArch64 NEON instruction ZIP, UZP
and TRN.
Fix a bug when mixed use of vget_high_u8() and vuzp_u8().

llvm-svn: 195716
2013-11-26 03:26:47 +00:00
Kevin Qin 33ca18fdcf [AArch64]Implement 128 bit register copy with NEON.
llvm-svn: 195713
2013-11-26 02:33:42 +00:00
Hao Liu 25aed9bb5b Fix the bugs about AArch64 Load/Store vector types and bitcast between i64 and vector types.
e.g. "%tmp = load <2 x i64>* %ptr" can't be selected. 
     "%tmp = bitcast i64 %in to <2 x i32>" can't be selected.

llvm-svn: 195424
2013-11-22 08:47:22 +00:00
Jiangning Liu a91633a435 For AArch64 back-end instruction selection, lower Neon_Lowxxx with EXTRCT_SUBREG.
llvm-svn: 195408
2013-11-22 02:45:13 +00:00
Ana Pazos 9ac2fc85d2 Implemented Neon scalar vdup_lane intrinsics.
Fixed scalar dup alias and added test case.

llvm-svn: 195330
2013-11-21 08:16:15 +00:00
Ana Pazos fbc1adbaa7 Implemented Neon scalar by element intrinsics.
Intrinsics implemented: vqdmull_lane, vqdmulh_lane, vqrdmulh_lane,
vqdmlal_lane, vqdmlsl_lane scalar Neon intrinsics.

llvm-svn: 195327
2013-11-21 07:37:04 +00:00
Hao Liu 16edc4675c Implement AArch64 neon instructions class SIMD lsone and SIMD lone-post.
llvm-svn: 195078
2013-11-19 02:17:05 +00:00
Jiangning Liu 0c0c1e8598 Implement AArch64 SISD intrinsics for vget_high and vget_low.
llvm-svn: 195074
2013-11-19 01:46:48 +00:00
Jiangning Liu e329114ae5 Add predicate for AArch64 crypto instructions.
llvm-svn: 195071
2013-11-19 01:38:31 +00:00
Hao Liu 5a4e4e107d Implement the newly added ACLE functions for ld1/st1 with 2/3/4 vectors.
The functions are like: vst1_s8_x2 ...

llvm-svn: 194990
2013-11-18 06:31:53 +00:00
Ana Pazos d035209bd7 Implemented aarch64 Neon scalar vmulx_lane intrinsics
Implemented aarch64 Neon scalar vfma_lane intrinsics
Implemented aarch64 Neon scalar vfms_lane intrinsics

Implemented legacy vmul_n_f64, vmul_lane_f64, vmul_laneq_f64
intrinsics (v1f64 parameter type) using Neon scalar instructions.

Implemented legacy vfma_lane_f64, vfms_lane_f64,
vfma_laneq_f64, vfms_laneq_f64 intrinsics (v1f64 parameter type)
using Neon scalar instructions.

llvm-svn: 194888
2013-11-15 23:32:10 +00:00
Chad Rosier 0c57c3402e [AArch64] Fix the scalar NEON ACLE functions so that they return float/double
rather than the vector equivalent.

llvm-svn: 194853
2013-11-15 21:28:10 +00:00
Kevin Qin 6e0547dfc9 Add test case for AArch64 NEON instruction set misc.
llvm-svn: 194673
2013-11-14 06:45:17 +00:00
Kevin Qin aec95baf1a Implement aarch64 neon instruction class SIMD misc.
llvm-svn: 194656
2013-11-14 02:44:13 +00:00
Jiangning Liu bb60ccf355 Implement AArch64 NEON instruction set AdvSIMD (table).
llvm-svn: 194648
2013-11-14 01:57:32 +00:00
Chad Rosier d3ae5f895e [AArch64] Add support for legacy AArch32 NEON scalar shift by immediate
instructions.  This patch does not include the shift right and accumulate
instructions.  A number of non-overloaded intrinsics have been remove in favor
of their overloaded counterparts.

llvm-svn: 194598
2013-11-13 20:05:37 +00:00
Chad Rosier d3684a0566 [AArch64] The shift right/left and insert immediate builtins expect 3
source operands, a vector, an element to insert, and a shift amount.

llvm-svn: 194406
2013-11-11 19:11:11 +00:00
Chad Rosier 35575e737c [AArch64] Add support for NEON scalar floating-point convert to fixed-point instructions.
llvm-svn: 194394
2013-11-11 18:04:07 +00:00
Jiangning Liu f4226f1d7b Implement AArch64 Neon instruction set Perm.
llvm-svn: 194123
2013-11-06 03:35:27 +00:00
Jiangning Liu a50e22ca4f Implement AArch64 Neon instruction set Bitwise Extract.
llvm-svn: 194118
2013-11-06 02:25:49 +00:00
Jiangning Liu d7c52676f6 Implement AArch64 Neon Crypto instruction classes AES, SHA, and 3 SHA.
llvm-svn: 194085
2013-11-05 17:42:05 +00:00
Hao Liu d6b40b51c7 Implement AArch64 post-index vector load/store multiple N-element structure class SIMD(lselem-post).
Including following 14 instructions:
4 ld1 insts: post-index load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: post-index load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: post-index store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: post-index store multiple N-element structure from sequential N registers (N = 2,3,4).

llvm-svn: 194043
2013-11-05 03:39:32 +00:00
Kevin Qin 97f6aaa8ad Implemented aarch64 neon intrinsic vcopy_lane with float type.
llvm-svn: 194041
2013-11-05 02:03:59 +00:00
Tim Northover ace0bd4d33 AArch64: use default asm operand printing when modifier inapplicable
If an inline assembly operand has multiple constraints (e.g. "Ir" for immediate
or register) and an operand modifier (E.g. "w" for "print register as wN") then
we need to decide behaviour when the modifier doesn't apply to the constraint.

Previousely produced some combination of an assertion failure and a fatal
error. GCC's behaviour appears to be to ignore the modifier and print the
operand in the default way. This patch should implement that.

llvm-svn: 194024
2013-11-04 23:04:07 +00:00
Chad Rosier 74b65cd811 [AArch64] Add support for NEON scalar fixed-point convert to floating-point instructions.
llvm-svn: 193816
2013-10-31 22:36:59 +00:00
Chad Rosier 20e1f20d69 [AArch64] Add support for NEON scalar shift immediate instructions.
llvm-svn: 193790
2013-10-31 19:28:44 +00:00
Amara Emerson f80f95fcc7 [AArch64] Make the use of FP instructions optional, but enabled by default.
This adds a new subtarget feature called FPARMv8 (implied by NEON), and
predicates the support of the FP instructions and registers on this feature.

llvm-svn: 193739
2013-10-31 09:32:11 +00:00
Chad Rosier be020d0309 [AArch64] Add support for NEON scalar floating-point compare instructions.
llvm-svn: 193691
2013-10-30 15:19:37 +00:00
Weiming Zhao acf48d75e5 add test cases for frameaddr and returnaddr for aarch64
llvm-svn: 193626
2013-10-29 17:01:29 +00:00
Tim Northover d29ddf6713 AArch64: add 'a' inline asm operand modifier
This is used in the Linux kernel, and effectively just means "print an
address".

llvm-svn: 193593
2013-10-29 08:22:33 +00:00
Rafael Espindola 57ec995c37 Convert another llc -filetype=obj test.
llvm-svn: 193538
2013-10-28 21:06:12 +00:00
Rafael Espindola 3a5eecb57c Convert another llc -filetype=obj test.
llvm-svn: 193537
2013-10-28 20:59:41 +00:00
Rafael Espindola 3f018baac0 Convert another llc -filetype=obj test.
llvm-svn: 193536
2013-10-28 20:54:33 +00:00
Rafael Espindola 889a180e5a Convert a llc -filetype=obj test into a llvm-mc test.
llvm-svn: 193534
2013-10-28 20:40:20 +00:00
Amara Emerson c5cae0f20c [AArch64] Fix NZCV reg live-in bug in F128CSEL codegen.
When generating the IfTrue basic block during the F128CSEL pseudo-instruction
handling, the NZCV live-in for the newly created BB wasn't being added. This
caused a fault during MI-sched/live range calculation when the predecessor
for the fall-through BB didn't have a live-in for phys-reg as expected.

llvm-svn: 193316
2013-10-24 08:28:24 +00:00
Chad Rosier e012cb3783 [AArch64] Add the constraint to NEON scalar mla/mls instructions.
llvm-svn: 193117
2013-10-21 20:11:47 +00:00
Chad Rosier fe2f58c8a1 [AArch64] Add support for NEON scalar extract narrow instructions.
llvm-svn: 192970
2013-10-18 14:03:24 +00:00
Chad Rosier 37d29173aa [AArch64] Add support for NEON scalar three register different instruction
class.  The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.

llvm-svn: 192908
2013-10-17 18:12:29 +00:00
Chad Rosier 846a72539c [AArch64] Add support for NEON scalar negate instruction.
llvm-svn: 192843
2013-10-16 21:04:39 +00:00
Chad Rosier 175601d997 [AArch64] Add support for NEON scalar absolute value instruction.
llvm-svn: 192842
2013-10-16 21:04:34 +00:00
Chad Rosier 178b1cefc7 [AArch64] Add support for NEON scalar signed saturating accumulated of unsigned
value and unsigned saturating accumulate of signed value instructions.

llvm-svn: 192800
2013-10-16 16:09:02 +00:00
Chad Rosier 9d51708677 [AArch64] Add support for NEON scalar signed saturating absolute value and
scalar signed saturating negate instructions.

llvm-svn: 192733
2013-10-15 21:18:44 +00:00
Chad Rosier d1f40d760a [AArch64] Add support for NEON scalar integer compare instructions.
llvm-svn: 192596
2013-10-14 14:37:20 +00:00
Kevin Qin a89e7a0e1c Implement aarch64 neon instruction set AdvSIMD (copy).
llvm-svn: 192410
2013-10-11 02:33:55 +00:00
Hao Liu 99eac7ee44 Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem).
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).

llvm-svn: 192361
2013-10-10 17:00:52 +00:00
Rafael Espindola 9558af461d Revert "Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem). Including following 14 instructions: 4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers. ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4). 4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers. st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4)."
This reverts commit r192352. It broke the build.

llvm-svn: 192354
2013-10-10 15:15:17 +00:00
Hao Liu 9123ad8ab9 Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem).
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).

llvm-svn: 192352
2013-10-10 15:01:24 +00:00
Tim Northover 1fdb076a31 AArch64: enable MISched by default.
Substantial SelectionDAG scheduling is going away soon, and is
interfering with Hao's attempts to implement LDn/STn instructions, so
I say we make the leap first.

There were a few reorderings (inevitably) which broke some tests. I
tried to replace them with CHECK-DAG variants mostly, but some too
complex for that to be useful and I just reordered them.

llvm-svn: 192282
2013-10-09 07:53:57 +00:00
Tim Northover 74cf0bd77d AArch64: migrate ADRP relaxation test to be llvm-mc only.
llvm-svn: 192281
2013-10-09 07:53:49 +00:00
Chad Rosier 9849cc6696 [AArch64] Add support for NEON scalar floating-point reciprocal estimate,
reciprocal exponent, and reciprocal square root estimate instructions.

llvm-svn: 192242
2013-10-08 22:09:04 +00:00
Chad Rosier f7ed96ef76 [AArch64] Add support for NEON scalar signed/unsigned integer to floating-point
convert instructions.

llvm-svn: 192231
2013-10-08 20:43:30 +00:00
Chad Rosier b6ceeb9126 [AArch64] Add support for NEON scalar arithmetic instructions:
SQDMULH, SQRDMULH, FMULX, FRECPS, and FRSQRTS.

llvm-svn: 192107
2013-10-07 16:36:15 +00:00
Jiangning Liu ad242fbb71 Implement aarch64 neon instruction set AdvSIMD (Across).
llvm-svn: 192028
2013-10-05 08:22:10 +00:00
Jiangning Liu ac5fd7e5d3 Implement aarch64 neon instruction set AdvSIMD (3V elem).
llvm-svn: 191944
2013-10-04 09:20:44 +00:00
NAKAMURA Takumi b4efd90d2f llvm/test/CodeGen/AArch64/neon-scalar-reduce-pairwise.ll: Use -mtriple here, or aach64-pecoff might be misassumed on win32 hosts.
llvm-svn: 191275
2013-09-24 04:14:29 +00:00
Jiangning Liu 63dc840fc5 Initial support for Neon scalar instructions.
Patch by Ana Pazos.

1.Added support for v1ix and v1fx types.
2.Added Scalar Pairwise Reduce instructions.
3.Added initial implementation of Scalar Arithmetic instructions.

llvm-svn: 191263
2013-09-24 02:47:27 +00:00
Kevin Qin 36399e6b68 Implement 3 AArch64 neon instructions : umov smov ins.
llvm-svn: 190839
2013-09-17 02:21:02 +00:00
Jiangning Liu 2878dc8fe7 Implement aarch64 neon instruction set AdvSIMD (3V Diff), covering the following 26 instructions,
SADDL, UADDL, SADDW, UADDW, SSUBL, USUBL, SSUBW, USUBW, ADDHN, RADDHN, SABAL, UABAL, SUBHN, RSUBHN, SABDL, UABDL, SMLAL, UMLAL, SMLSL, UMLSL, SQDMLAL, SQDMLSL, SMULL, UMULL, SQDMULL, PMULL

llvm-svn: 190288
2013-09-09 02:20:27 +00:00
Hao Liu d4aede098f Inplement aarch64 neon instructions in AdvSIMD(shift). About 24 shift instructions:
sshr,ushr,ssra,usra,srshr,urshr,srsra,ursra,sri,shl,sli,sqshlu,sqshl,uqshl,shrn,sqrshrun,sqshrn,uqshr,sqrshrn,uqrshrn,sshll,ushll
 and 4 convert instructions:
      scvtf,ucvtf,fcvtzs,fcvtzu

llvm-svn: 189925
2013-09-04 09:28:24 +00:00
Hao Liu 546bcd2f50 A minor change for an obvous problem caused by r188451:
def imm0_63 : Operand<i32>, ImmLeaf<i32, [{ return Imm >= 0 && Imm < 63;}]>{
As it seems Imm <63 should be Imm <= 63. ImmLeaf is used in pattern match, but there is already a function check the shift amount range, so just remove ImmLeaf. Also add a test to check 63.

llvm-svn: 188911
2013-08-21 17:47:53 +00:00
Daniel Dunbar 9efbedfd35 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

llvm-svn: 188513
2013-08-16 00:37:11 +00:00
Hao Liu cd8b02dce3 Clang and AArch64 backend patches to support shll/shl and vmovl instructions and ACLE functions
llvm-svn: 188451
2013-08-15 08:26:11 +00:00
Stephen Lin 5532f9a9c3 CHECK-LABEL-ify tests
llvm-svn: 188087
2013-08-09 17:50:15 +00:00
Tim Northover 40e9efd725 AArch64: add initial NEON support
Patch by Ana Pazos.

- Completed implementation of instruction formats:
AdvSIMD three same
AdvSIMD modified immediate
AdvSIMD scalar pairwise

- Completed implementation of instruction classes
(some of the instructions in these classes
belong to yet unfinished instruction formats):
Vector Arithmetic
Vector Immediate
Vector Pairwise Arithmetic

- Initial implementation of instruction formats:
AdvSIMD scalar two-reg misc
AdvSIMD scalar three same

- Intial implementation of instruction class:
Scalar Arithmetic

- Initial clang changes to support arm v8 intrinsics.
Note: no clang changes for scalar intrinsics function name mangling yet.

- Comprehensive test cases for added instructions
To verify auto codegen, encoding, decoding, diagnosis, intrinsics.

llvm-svn: 187567
2013-08-01 09:20:35 +00:00
Tim Northover 8f7613ae03 AArch64: add llc-based tests for previous commit.
Better to have tests run even on non-AArch64 platforms.

llvm-svn: 187128
2013-07-25 16:23:55 +00:00
Stephen Lin d24ab20e9b Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
This update was done with the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
      done
      sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
      sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
      sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
      sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186280
2013-07-14 06:24:09 +00:00
Stephen Lin f799e3f944 Convert CodeGen/*/*.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion.
This was done with the following sed invocation to catch label lines demarking function boundaries:
    sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.

llvm-svn: 186258
2013-07-13 20:38:47 +00:00
Stephen Lin 764d8d3d6f Start using CHECK-LABEL in some tests.
llvm-svn: 186163
2013-07-12 14:54:12 +00:00
Stephen Lin 73de7bf5de AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.

llvm-svn: 185956
2013-07-09 18:16:56 +00:00
Tim Northover 8625fd8cad AArch64: correct CodeGen of MOVZ/MOVK combinations.
According to the AArch64 ELF specification (4.6.8), it's the
assembler's responsibility to make sure the shift amount is correct in
relocated MOVZ/MOVK instructions.

This wasn't being obeyed by either the MCJIT CodeGen or RuntimeDyldELF
(which happened to work out well for JIT tests). This commit should
make us compliant in this area.

llvm-svn: 185360
2013-07-01 19:23:10 +00:00
Tim Northover 1806f938b5 AArch64: remove accidental test output file.
llvm-svn: 184236
2013-06-18 21:16:53 +00:00
Rafael Espindola 4f60a38f18 Change how we iterate over relocations on ELF.
For COFF and MachO, sections semantically have relocations that apply to them.
That is not the case on ELF.

In relocatable objects (.o), a section with relocations in ELF has offsets to
another section where the relocations should be applied.

In dynamic objects and executables, relocations don't have an offset, they have
a virtual address. The section sh_info may or may not point to another section,
but that is not actually used for resolving the relocations.

This patch exposes that in the ObjectFile API. It has the following advantages:

* Most (all?) clients can handle this more efficiently. They will normally walk
all relocations, so doing an effort to iterate in a particular order doesn't
save time.

* llvm-readobj now prints relocations in the same way the native readelf does.

* probably most important, relocations that don't point to any section are now
visible. This is the case of relocations in the rela.dyn section. See the
updated relocation-executable.test for example.

llvm-svn: 182908
2013-05-30 03:05:14 +00:00
Tim Northover b65f6b0820 Teach ReMaterialization to be more cunning about subregisters
This allows rematerialization during register coalescing to handle
more cases involving operations like SUBREG_TO_REG which might need to
be rematerialized using sub-register indices.

For example, code like:
    v1(GPR64):sub_32 = MOVZ something
    v2(GPR64) = COPY v1(GPR64)
should be convertable to:
    v2(GPR64):sub_32 = MOVZ something

but previously we just gave up in places like this

llvm-svn: 182872
2013-05-29 19:32:06 +00:00
Andrew Trick e2431c64bc Track IR ordering of SelectionDAG nodes 3/4.
Remove the old IR ordering mechanism and switch to new one.  Fix unit
test failures.

llvm-svn: 182704
2013-05-25 03:08:10 +00:00
Rafael Espindola da5d100005 More test coverage for addFrameMove.
llvm-svn: 182051
2013-05-16 20:50:56 +00:00
Rafael Espindola c6b7383bda Add more test coverage for addFrameMove.
llvm-svn: 182017
2013-05-16 15:18:50 +00:00
Tim Northover 885698a25c AArch64: support literal pool access in large memory model.
llvm-svn: 181120
2013-05-04 16:54:07 +00:00
Tim Northover 8ff187df5f AArch64: support large code model for jump-tables
llvm-svn: 181119
2013-05-04 16:54:00 +00:00
Tim Northover 9fc1cddb21 AArch64: implement support for blockaddress in large code model
llvm-svn: 181118
2013-05-04 16:53:53 +00:00
Tim Northover 2dbef3452c AArch64: implement large code model access to global variables.
The MOVZ/MOVK instruction sequence may not be the most efficient (a
literal-pool load could be better) but adding that would require
reinstating the ConstantIslands pass.

For now the sequence is correct, and that's enough. Beware, as of
commit GNU ld does not appear to support the relocations needed for
this. Its primary purpose (for now) will be to support JITed code,
since in that case there is no guarantee of where your code will end
up in memory relative to external symbols it references.

llvm-svn: 181117
2013-05-04 16:53:46 +00:00
Nico Rieck ba848e3bca Replace coff-/elf-dump with llvm-readobj
llvm-svn: 179361
2013-04-12 04:06:46 +00:00
Tim Northover 15410e98d3 AArch64: remove barriers from AArch64 atomic operations.
I've managed to convince myself that AArch64's acquire/release
instructions are sufficient to guarantee C++11's required semantics,
even in the sequentially-consistent case.

llvm-svn: 179005
2013-04-08 08:40:41 +00:00
Hal Finkel 4e05788cc3 Update PEI's virtual-register-based scavenging to support multiple simultaneous mappings
The previous algorithm could not deal properly with scavenging multiple virtual
registers because it kept only one live virtual -> physical mapping (and
iterated through operands in order). Now we don't maintain a current mapping,
but rather use replaceRegWith to completely remove the virtual register as
soon as the mapping is established.

In order to allow the register scavenger to return a physical register killed
by an instruction for definition by that same instruction, we now call
RS->forward(I) prior to eliminating virtual registers defined in I. This
requires a minor update to forward to ignore virtual registers.

These new features will be tested in forthcoming commits.

llvm-svn: 178058
2013-03-26 18:56:54 +00:00
Benjamin Kramer 01b75cc0f2 Test case hygiene.
llvm-svn: 176772
2013-03-09 18:25:40 +00:00
Tim Northover c8f1a5de9f AArch64: specify full triple in test as only Linux works for now.
llvm-svn: 176692
2013-03-08 15:27:30 +00:00
Tim Northover 95f4892d4c AArch64: expand sincos operations, we don't support them.
Patch based on Mans Rullgard's.

llvm-svn: 176688
2013-03-08 13:55:07 +00:00
Tim Northover c3c5c0971d AArch64: be more careful resorting to inefficient addressing for weak vars.
If an otherwise weak var is actually defined in this unit, it can't be
undefined at runtime so we can use normal global variable sequences (ADRP/ADD)
to access it.

llvm-svn: 176259
2013-02-28 14:36:31 +00:00
Tim Northover b9d4fd210b AArch64: don't drop GlobalAddress offset when handling extern_weak decls.
llvm-svn: 176258
2013-02-28 14:36:24 +00:00
Tim Northover 9fafdf6d5a AArch64: Use cbnz instead of cmp/b.ne pair for atomic operations.
llvm-svn: 176253
2013-02-28 13:52:07 +00:00
Tim Northover 3533ad6bbd AArch64: remove ConstantIsland pass & put literals in separate section.
This implements the review suggestion to simplify the AArch64 backend. If we
later discover that we *really* need the extra complexity of the
ConstantIslands pass for performance reasons it can be resurrected.

llvm-svn: 175258
2013-02-15 09:33:43 +00:00
Tim Northover 5466e36fb5 AArch64: refactor frame handling to use movz/movk for overlarge offsets.
In the near future litpools will be in a different section, which means that
any access to them is at least two instructions. This makes the case for a
movz/movk pair (if total offset <= 32-bits) even more compelling.

llvm-svn: 175257
2013-02-15 09:33:26 +00:00
Tim Northover 228d9d3aa2 Implement external weak (ELF) symbols on AArch64
Weakly defined symbols should evaluate to 0 if they're undefined at
link-time. This is impossible to do with the usual address generation
patterns, so we should use a literal pool entry to materlialise the
address.

llvm-svn: 174518
2013-02-06 16:43:33 +00:00
Owen Anderson de89ecf1fc Reapply r174343, with a fix for a scary DAG combine bug where it failed to differentiate between the alignment of the
base point of a load, and the overall alignment of the load.  This caused infinite loops in DAG combine with the
original application of this patch.

ORIGINAL COMMIT LOG:
When the target-independent DAGCombiner inferred a higher alignment for a load,
it would replace the load with one with the higher alignment.  However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.

This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.

llvm-svn: 174431
2013-02-05 19:24:39 +00:00
NAKAMURA Takumi 3753b28cd2 Revert r174343, "When the target-independent DAGCombiner inferred a higher alignment for a load,"
It caused hangups in compiling clang/lib/Parse/ParseDecl.cpp and clang/lib/Driver/Tools.cpp in stage2 on some hosts.

llvm-svn: 174374
2013-02-05 14:44:16 +00:00
Owen Anderson a47fdbb032 When the target-independent DAGCombiner inferred a higher alignment for a load,
it would replace the load with one with the higher alignment.  However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.

This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.

llvm-svn: 174343
2013-02-05 06:25:30 +00:00
Tim Northover e3d4236402 Add explicit triples to AArch64 tests
Only Linux is supported at the moment, and other platforms quickly fault. As a
result these tests would fail on non-Linux hosts. It may be worth making the
tests more generic again as more platforms are supported.

llvm-svn: 174170
2013-02-01 11:40:47 +00:00
Tim Northover e0e3aefdd3 Add AArch64 as an experimental target.
This patch adds support for AArch64 (ARM's 64-bit architecture) to
LLVM in the "experimental" category. Currently, it won't be built
unless requested explicitly.

This initial commit should have support for:
    + Assembly of all scalar (i.e. non-NEON, non-Crypto) instructions
      (except the late addition CRC instructions).
    + CodeGen features required for C++03 and C99.
    + Compilation for the "small" memory model: code+static data <
      4GB.
    + Absolute and position-independent code.
    + GNU-style (i.e. "__thread") TLS.
    + Debugging information.

The principal omission, currently, is performance tuning.

This patch excludes the NEON support also reviewed due to an outbreak of
batshit insanity in our legal department. That will be committed soon bringing
the changes to precisely what has been approved.

Further reviews would be gratefully received.

llvm-svn: 174054
2013-01-31 12:12:40 +00:00