Commit Graph

8885 Commits

Author SHA1 Message Date
Davide Italiano a22ddddfea [Target] Rename X86/ARM Assembly printer to reflect reality.
This shows up a lot profiling LTO testcases with -time-passes, so
better have a non confusing name.

llvm-svn: 286488
2016-11-10 18:39:31 +00:00
Oliver Stannard 18ca2adf2d [ARM] Thumb2 LDR (literal) should accept PC as the destination
The version of this instruction with the .w suffix already correctly accepts
this, but the alias without the .w did not.

Differential Revision: https://reviews.llvm.org/D26499

llvm-svn: 286446
2016-11-10 13:20:41 +00:00
James Molloy b03e0879fc [Thumb1] Move padding earlier when synthesizing TBBs off of the PC
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.

If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.

In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.

llvm-svn: 286107
2016-11-07 13:38:21 +00:00
Saleem Abdulrasool 804e12eeb5 ARM: lower fpowi appropriately for Windows ARM
This handles the last case of the builtin function calls that we would
generate code which differed from Microsoft's ABI.  Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.

Addresses PR30825!

llvm-svn: 286082
2016-11-06 19:46:54 +00:00
Weiming Zhao 962eaaea9c [Cortex-M0] Atomic lowering
Summary: ARMv6m supports dmb etc fench instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls.

Reviewers: rengolin, efriedma, t.p.northover, jmolloy

Subscribers: efriedma, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26120

llvm-svn: 285969
2016-11-03 21:49:08 +00:00
Chandler Carruth 5589aa60c7 Remove a redundant condition found by PVS-Studio.
Filed http://llvm.org/PR30897 to teach Clang to warn on this kind of
stuff.

llvm-svn: 285945
2016-11-03 17:42:02 +00:00
James Molloy e7d97368f2 Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"
This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 .

llvm-svn: 285912
2016-11-03 14:08:01 +00:00
James Molloy b60d8b1987 [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.

llvm-svn: 285893
2016-11-03 10:18:20 +00:00
Nirav Dave 0a392a8e7f [ARM][MC] Cleanup ARM Target Assembly Parser
Summary:
Correctly parse end-of-statement tokens and handle preprocessor
end-of-line comments in ARM assembly processor.

Reviewers: rnk, majnemer

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26152

llvm-svn: 285830
2016-11-02 16:22:51 +00:00
Alex Bradbury 58eba09949 [TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h
As it stands, the OperandMatchResultTy is only included in the generated
header if there is custom operand parsing. However, almost all backends
make use of MatchOperand_Success and friends from OperandMatchResultTy for
e.g. parseRegister. This is a pain when starting an AsmParser for a new
backend that doesn't yet have custom operand parsing. Move the enum to
MCTargetAsmParser.h.

This patch is a prerequisite for D23563

Differential Revision: https://reviews.llvm.org/D23496

llvm-svn: 285705
2016-11-01 16:32:05 +00:00
James Molloy 70a3d6df52 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 285690
2016-11-01 13:37:41 +00:00
Saleem Abdulrasool e1aa782bd0 CodeGen: further loosen -O0 CG for WoA division
Generate the slowest possible codepath for noopt CodeGen.  Even trying to be
clever with the negated jump can cause out-of-range jumps.  Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible.  This re-enables the
previous optimization as well as hopefully gives us working code in all cases.

Addresses PR30356!

llvm-svn: 285649
2016-10-31 22:12:37 +00:00
Saleem Abdulrasool 075d2e3c59 ARM: ensure that the Windows DBZ check is in range
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:

    cmp r?, #0
    cbz .Ltrap
    b .Lbody
  .Lbody:
    ...
  .Ltrap:
    udf #249 @ __brkdiv0

This works great most of the time.  However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).

Since this is a matter of correctness, possibly pay a small penalty instead.  We
now form this slightly differently:

    cbnz .Lbody
    udf #249 @ __brkdiv0
  .Lbody:
    ...

The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).

The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.

Addresses PR30532!

llvm-svn: 285312
2016-10-27 16:59:22 +00:00
Sam Parker e7d9505c08 [ARM] Predicate UMAAL selection on hasDSP.
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.

Patch by Vadzim Dambrouski.

Differential Revision: https://reviews.llvm.org/D25890

llvm-svn: 285278
2016-10-27 09:47:10 +00:00
Tim Northover a9cc385664 ARM: don't rely on push/pop reglists being in order when folding SP adjust.
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.

Should fix an issue where we were turning something like:

    push {r8, r4, r7, lr}
    sub sp, #24

into nonsense like:

    push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}

llvm-svn: 285232
2016-10-26 20:01:00 +00:00
Matthias Braun 8b38ffaa98 CodeGen/Passes: Pass MachineFunction as functor arg; NFC
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.

llvm-svn: 285039
2016-10-24 23:23:02 +00:00
Eli Friedman b37864b58d Revert r284580+r284917. ("Synthesize TBB/TBH instructions")
The optimization has correctness issues, so reverting for now to fix tests
on thumb1 targets.

llvm-svn: 284993
2016-10-24 17:20:50 +00:00
James Molloy 2bae8640d7 [ARM] Fix crash in ConstantIslands
tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator.

llvm-svn: 284917
2016-10-22 09:58:37 +00:00
Benjamin Kramer 2a8bef8769 Do a sweep over move ctors and remove those that are identical to the default.
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.

No functionality change intended.

llvm-svn: 284721
2016-10-20 12:20:28 +00:00
Sjoerd Meijer 2fc4cb6f72 Reapply r284571 (with the new tests fixed).
llvm-svn: 284588
2016-10-19 13:43:02 +00:00
James Molloy fbfd173447 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 284580
2016-10-19 12:06:49 +00:00
Sjoerd Meijer 3f5111d363 Revert of r284571 because of failing tests.
llvm-svn: 284572
2016-10-19 07:45:48 +00:00
Sjoerd Meijer a318779263 Checking FP function attribute values and adding more build attribute tests.
This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).

Differential Revision: https://reviews.llvm.org/D25625

llvm-svn: 284571
2016-10-19 07:25:06 +00:00
Eli Friedman c0a717ba5b Improve ARM lowering for "icmp <2 x i64> eq".
The custom lowering is pretty straightforward: basically, just AND
together the two halves of a <4 x i32> compare.

Differential Revision: https://reviews.llvm.org/D25713

llvm-svn: 284536
2016-10-18 21:03:40 +00:00
Javed Absar e7c338081a [ARM] Assign cost of scaling for Cortex-R52
This patch assigns cost of the scaling used in addressing for Cortex-R52.

On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.

Differential Revision: http://reviews.llvm.org/D25670

Reviewer: jmolloy
llvm-svn: 284460
2016-10-18 09:08:54 +00:00
Dean Michael Berris 156f6cafc2 [XRay] Support for for tail calls for ARM no-Thumb
This patch adds simplified support for tail calls on ARM with XRay instrumentation.

Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14  -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program

    #include <cstdio>
    #include <cassert>
    #include <xray/xray_interface.h>

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
      std::printf("In fC()\n");
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
      std::printf("In fB()\n");
      fC();
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
      std::printf("In fA()\n");
      fB();
    }

    // Avoid infinite recursion in case the logging function is instrumented (so calls logging
    //   function again).
    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
    {
      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
    }

    int main(int argc, char* argv[]) {
      __xray_set_handler(simplyPrint);

      printf("Patching...\n");
      __xray_patch();
      fA();

      printf("Unpatching...\n");
      __xray_unpatch();
      fA();

      return 0;
    }

gives the following output:

    Patching...
    XRay: functionId=3 type=0.
    In fA()
    XRay: functionId=3 type=1.
    XRay: functionId=2 type=0.
    In fB()
    XRay: functionId=2 type=1.
    XRay: functionId=1 type=0.
    XRay: functionId=1 type=1.
    In fC()
    Unpatching...
    In fA()
    In fB()
    In fC()

So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().

Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).

Differential Revision: https://reviews.llvm.org/D25030

llvm-svn: 284456
2016-10-18 05:54:15 +00:00
Davide Italiano e9cdb24f67 [ArmFastISel] Kill dead code. NFCI.
llvm-svn: 284320
2016-10-16 01:09:39 +00:00
Eric Christopher 445c952bd0 Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to help
readability a bit.

llvm-svn: 284202
2016-10-14 05:47:37 +00:00
Javed Absar 85874a9360 [ARM]: Assign cost of scaling used in addressing mode for ARM cores
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.

For instance:

LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]

Above, (1) takes less cycles than (2).

By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.

Differential Revision: http://reviews.llvm.org/D24857

Reviewers: jmolloy, rengolin
llvm-svn: 284127
2016-10-13 14:57:43 +00:00
Reid Kleckner bdfc05ff93 Re-land "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
Reverts r283938 to reinstate r283867 with a fix.

The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.

llvm-svn: 283942
2016-10-11 21:14:03 +00:00
Reid Kleckner f4876beb2b Revert "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
This reverts r283867.

This appears to be an infinite loop:

    while (HiRegToSave != AllHighRegs.end() && CopyReg != AllCopyRegs.end()) {
      if (HiRegsToSave.count(*HiRegToSave)) {
        ...

        CopyReg = findNextOrderedReg(++CopyReg, CopyRegs, AllCopyRegs.end());
        HiRegToSave =
            findNextOrderedReg(++HiRegToSave, HiRegsToSave, AllHighRegs.end());
      }
    }

llvm-svn: 283938
2016-10-11 20:54:41 +00:00
NAKAMURA Takumi e9587bd771 ARMMachineFunctionInfo.cpp: Add an initializer of ARMFunctionInfo::ReturnRegsCount in the explicit ctor.
It caused crash since r283867.

llvm-svn: 283909
2016-10-11 17:38:30 +00:00
NAKAMURA Takumi e61de02016 Reformat.
llvm-svn: 283908
2016-10-11 17:38:25 +00:00
Daniel Jasper b617286089 Silence unused warning in non-assert builds.
llvm-svn: 283899
2016-10-11 16:22:36 +00:00
Oliver Stannard d2083fb356 [Thumb] Save/restore high registers in Thumb1 pro/epilogues
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.

This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.

In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.

We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.

There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.

In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.

Differential Revision: https://reviews.llvm.org/D24228

llvm-svn: 283867
2016-10-11 10:12:25 +00:00
Oliver Stannard 50a74393c2 [ARM] Fix registers clobbered by SjLj EH on soft-float targets
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.

Differential Revision: https://reviews.llvm.org/D25180

llvm-svn: 283866
2016-10-11 10:06:59 +00:00
Peter Collingbourne 0da86301ad Revert r283690, "MC: Remove unused entities."
llvm-svn: 283814
2016-10-10 22:49:37 +00:00
Alexandros Lamprineas 20e9ddba73 [ARM] Fix invalid VLDM/VSTM access when targeting Big Endian with NEON
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.

The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.

Differential Revision: https://reviews.llvm.org/D25281

llvm-svn: 283763
2016-10-10 16:01:54 +00:00
Mehdi Amini f42454b94b Move the global variables representing each Target behind accessor function
This avoids "static initialization order fiasco"

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702
2016-10-09 23:00:34 +00:00
Peter Collingbourne cc723cccab MC: Remove unused entities.
llvm-svn: 283691
2016-10-09 04:39:13 +00:00
Peter Collingbourne 5c924d7117 Target: Remove unused entities.
llvm-svn: 283690
2016-10-09 04:38:57 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Javed Absar 9797989ca7 [ARM]: add missing switch case for cortex-r52
Adds a missing switch case for handling cortex-r52
in init-subtarget-features.

llvm-svn: 283551
2016-10-07 13:41:55 +00:00
Martin Storsjo 04864f45b2 [ARM] Reapply: Use __rt_div functions for divrem on Windows
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.

This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D25332

llvm-svn: 283550
2016-10-07 13:28:53 +00:00
Javed Absar fb4b6e8db9 [ARM]: Add Cortex-R52 target to LLVM
This patch adds Cortex-R52, the new ARM real-time processor, to LLVM. 
Cortex-R52 implements the ARMv8-R architecture.

llvm-svn: 283542
2016-10-07 12:06:40 +00:00
Oliver Stannard 4df1cc0b00 [ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.

Differential Revision: https://reviews.llvm.org/D24462

llvm-svn: 283530
2016-10-07 08:48:24 +00:00
Mehdi Amini 68c6c8cd78 Use StringRef in ARMELFStreamer (NFC)
llvm-svn: 283529
2016-10-07 08:48:07 +00:00
Mehdi Amini a0016ec95f Use StringReg in TargetParser APIs (NFC)
llvm-svn: 283527
2016-10-07 08:37:29 +00:00
Diana Picus 6341e46cd1 Revert "[ARM] Use __rt_div functions for divrem on Windows"
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'

It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.

llvm-svn: 283442
2016-10-06 11:24:29 +00:00
James Molloy 6215fad0e9 [ARM] Constant pool promotion - fix alignment calculation
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.

Testcase added.

llvm-svn: 283423
2016-10-06 07:56:00 +00:00
Martin Storsjo f997759aef [ARM] Use __rt_div functions for divrem on Windows
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D24076

llvm-svn: 283383
2016-10-05 21:08:02 +00:00
James Molloy b7de497cb9 [Thumb] Don't try and emit LDRH/LDRB from the constant pool
This is not a valid encoding - these instructions cannot do PC-relative addressing.

The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.

llvm-svn: 283323
2016-10-05 14:52:13 +00:00
Mehdi Amini 5b00770c35 Use StringRef in ARMConstantPool APIs (NFC)
llvm-svn: 283293
2016-10-05 01:41:06 +00:00
Sjoerd Meijer 535529b41c Consistent fp denormal mode names. NFC.
This fixes the inconsistency of the fp denormal option names: in LLVM this was
DenormalType, but in Clang this is DenormalMode which seems better.

Differential Revision: https://reviews.llvm.org/D24906

llvm-svn: 283192
2016-10-04 08:03:36 +00:00
Sjoerd Meijer 4dbe73c1ed [ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.

Patch mostly by Pablo Barrio.

Differential Revision: https://reviews.llvm.org/D25077

llvm-svn: 283098
2016-10-03 10:12:32 +00:00
Mehdi Amini 48878ae579 Use StringRef in Datalayout API (NFC)
llvm-svn: 283013
2016-10-01 05:57:55 +00:00
Mehdi Amini 217b246484 Revert "Use StringRef in Datalayout API (NFC)"
This reverts commit r283009. Bots are broken.

llvm-svn: 283011
2016-10-01 05:12:48 +00:00
Mehdi Amini 29baf9c0e1 Use StringRef in Datalayout API (NFC)
llvm-svn: 283009
2016-10-01 04:17:59 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
James Molloy 9abb2fa5bb [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282387
2016-09-26 07:26:24 +00:00
James Molloy 85124c76fc Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r282241. It caused http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19882.

llvm-svn: 282249
2016-09-23 13:35:43 +00:00
James Molloy 1ce54d6be2 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282241
2016-09-23 12:15:58 +00:00
Nico Weber 903859c0e4 Revert r281715, it caused PR30475
llvm-svn: 282076
2016-09-21 15:33:24 +00:00
Oliver Stannard e1f6dc59ce [Thumb] Set correct initial mapping symbol for big-endian thumb
The initial mapping symbol state is set from the triple, but we only checked
for the little-endian thumb triple, so could end up with an ARM mapping symbol
for big-endian thumb.

Differential Revision: https://reviews.llvm.org/D24553

llvm-svn: 281894
2016-09-19 09:21:45 +00:00
Tim Northover eaee28b5ca ARM: check alignment before transforming ldr -> ldm (or similar).
ldm and stm instructions always require 4-byte alignment on the pointer, but we
weren't checking this before trying to reduce code-size by replacing a
post-indexed load/store with them. Unfortunately, we were also dropping this
incormation in DAG ISel too, but that's easy enough to fix.

llvm-svn: 281893
2016-09-19 09:11:09 +00:00
Dean Michael Berris 4640154446 [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 281878
2016-09-19 00:54:35 +00:00
Nirav Dave 2364748a49 Defer asm errors to post-statement failure
Recommitting after fixing AsmParser initialization and X86 inline asm
error cleanup.

Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.

As part of this many minor cleanups to the Parser:

* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
  and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
  now fixed.

These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.

Reviewers: rnk, majnemer

Subscribers: aemerson, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D24047

llvm-svn: 281762
2016-09-16 18:30:20 +00:00
Sjoerd Meijer 227825346e Reverting r281719, this is causing buildbot failures and timeouts again.
llvm-svn: 281722
2016-09-16 13:16:52 +00:00
Sjoerd Meijer 23385c87a4 This is an attempt to reapply r280808: [ARM] Lower UDIV+UREM to UDIV+MLS
(and the same for SREM)

This was causing buildbot failures earlier (time outs in the LNT suite).
However, we haven't been able to reproduce this and are suspecting this
was caused by another (reverted) patch.

llvm-svn: 281719
2016-09-16 12:10:09 +00:00
James Molloy 0dc4708fca [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 281715
2016-09-16 10:17:04 +00:00
Eric Christopher 4367c7fb9a Move the Mangler from the AsmPrinter down to TLOF and clean up the
TLOF API accordingly.

llvm-svn: 281708
2016-09-16 07:33:15 +00:00
Evgeniy Stepanov a0601a40f7 Revert "[ARM] Promote small global constants to constant pools"
This reverts r281604, which adds text relocations to ARM binaries.

llvm-svn: 281645
2016-09-15 19:13:32 +00:00
James Molloy fe7fd879d7 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

llvm-svn: 281604
2016-09-15 12:30:27 +00:00
Matt Arsenault 1b9fc8ed65 Finish renaming remaining analyzeBranch functions
llvm-svn: 281535
2016-09-14 20:43:16 +00:00
Evgeniy Stepanov e97d3b90b9 Revert "[ARM] Promote small global constants to constant pools"
Breaks Android tests by introducing text relocations to ARM binaries.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/25362/steps/run%20asan%20lit%20tests%20%5Barm%2Fbullhead-userdebug%2FMTC20F%5D/logs/stdio

llvm-svn: 281526
2016-09-14 20:02:30 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Matt Arsenault a2b036e88b AArch64: Use TTI branch functions in branch relaxation
The main change is to return the code size from
InsertBranch/RemoveBranch.

Patch mostly by Tim Northover

llvm-svn: 281505
2016-09-14 17:23:48 +00:00
Sanjay Patel 284582b6d4 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2 ; NFCI
llvm-svn: 281498
2016-09-14 16:54:10 +00:00
Sanjay Patel 1ed771f5d7 getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
James Molloy 13065b00ba [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

llvm-svn: 281484
2016-09-14 14:47:27 +00:00
James Molloy 9790d8f81d Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"
This reverts commit r281323. It caused chromium test failures and a selfhost failure.

llvm-svn: 281451
2016-09-14 09:45:28 +00:00
Sjoerd Meijer 724023a1ec This reapplies r281304. The issue was that I had missed
to copy the new isAdd field in the tablegen data structure.

llvm-svn: 281447
2016-09-14 08:20:03 +00:00
Nico Weber e204c48d16 Revert r281336 (and r281337), it caused PR30372.
llvm-svn: 281361
2016-09-13 18:17:00 +00:00
Nirav Dave 9fa8af2180 Defer asm errors to post-statement failure
Recommitting after fixing AsmParser Initialization.

Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.

As part of this many minor cleanups to the Parser:

* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
  and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
  now fixed.

These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.

Reviewers: rnk, majnemer

Subscribers: aemerson, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D24047

llvm-svn: 281336
2016-09-13 13:55:06 +00:00
James Molloy 043d613791 Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r281314. Speculatively revert as it's possible this caused linker errors: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19656

llvm-svn: 281327
2016-09-13 12:45:51 +00:00
Pablo Barrio bb6984d401 [ARM] Add ".code 32" to functions in the ARM instruction set
Before, only Thumb functions were marked as ".code 16". These
".code x" directives are effective until the next directive of its
kind is encountered. Therefore, in code with interleaved ARM and
Thumb functions, it was possible to declare a function as ARM and
end up with a Thumb function after assembly. A test has been added.

An existing test has also been fixed to take this change into
account.

Reviewers: aschwaighofer, t.p.northover, jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D24337

llvm-svn: 281324
2016-09-13 12:18:15 +00:00
James Molloy d246c598de [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.

llvm-svn: 281323
2016-09-13 12:12:32 +00:00
Peter Smith 85bbda191d [ARM] Support ldr.w in pseudo instruction ldr rd,=immediate
The changes made in r269352, r269353 and r269354 to support the 
transformation of the ldr rd,=immediate to mov introduced a regression
from 3.8 (ldr.w rd, =immediate) not supported.

This change puts support back in for ldr.w by means of a t2InstAlias for
the .w form. The .w is ignored in ARM state and propagated to the ldr in
Thumb2.

llvm-svn: 281319
2016-09-13 11:15:51 +00:00
James Molloy 3e4bc66134 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

llvm-svn: 281314
2016-09-13 10:28:11 +00:00
Sjoerd Meijer 520a18df9c Revert of r281304 as it is causing build bot failures in hexagon
hwloop regression tests. These tests pass locally; will be investigating
where these differences come from.

llvm-svn: 281306
2016-09-13 08:51:59 +00:00
Sjoerd Meijer 05453991fe This adds a new field isAdd to MCInstrDesc. The ARM and Hexagon instruction
descriptions now tag add instructions, and the Hexagon backend is using this to
identify loop induction statements.

Patch by Sam Parker and Sjoerd Meijer.

Differential Revision: https://reviews.llvm.org/D23601

llvm-svn: 281304
2016-09-13 08:08:06 +00:00
Eric Christopher 04c7db31e8 Temporarily Revert "[MC] Defer asm errors to post-statement failure" as it's causing errors on the sanitizer bots.
This reverts commit r281249.

llvm-svn: 281280
2016-09-13 00:19:29 +00:00
Nico Weber 7c31d0ebc0 Revert r281215, it caused PR30358.
llvm-svn: 281263
2016-09-12 21:40:50 +00:00
Nirav Dave c0c0f7a196 [MC] Defer asm errors to post-statement failure
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.

As part of this many minor cleanups to the Parser:

* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
  and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
  now fixed.

These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.

Reviewers: rnk, majnemer

Subscribers: aemerson, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D24047

llvm-svn: 281249
2016-09-12 20:03:02 +00:00
James Molloy 3d06ff22b7 Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r281213. It made a bot go bang: http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/14625

llvm-svn: 281228
2016-09-12 16:18:23 +00:00
James Molloy 1e1b56bd48 [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.

llvm-svn: 281215
2016-09-12 14:30:48 +00:00
James Molloy 8f82d45ff4 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

llvm-svn: 281213
2016-09-12 13:42:16 +00:00
Duncan P. N. Exon Smith 1872096f1e CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI
Now that MachineBasicBlock::reverse_instr_iterator knows when it's at
the end (since r281168 and r281170), implement
MachineBasicBlock::reverse_iterator directly on top of an
ilist::reverse_iterator by adding an IsReverse template parameter to
MachineInstrBundleIterator.  This replaces another hard-to-reason-about
use of std::reverse_iterator on list iterators, matching the changes for
ilist::reverse_iterator from r280032 (see the "out of scope" section at
the end of that commit message).  MachineBasicBlock::reverse_iterator
now has a handle to the current node and has obvious invalidation
semantics.

r280032 has a more detailed explanation of how list-style reverse
iterators (invalidated when the pointed-at node is deleted) are
different from vector-style reverse iterators like std::reverse_iterator
(invalidated on every operation).  A great motivating example is this
commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.

Note: If your out-of-tree backend deletes instructions while iterating
on a MachineBasicBlock::reverse_iterator or converts between
MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator,
you'll need to update your code in similar ways to r280032.  The
following table might help:

                  [Old]              ==>             [New]
        delete &*RI, RE = end()                   delete &*RI++
        RI->erase(), RE = end()                   RI++->erase()
      reverse_iterator(I)                 std::prev(I).getReverse()
      reverse_iterator(I)                          ++I.getReverse()
    --reverse_iterator(I)                            I.getReverse()
      reverse_iterator(std::next(I))                 I.getReverse()
                RI.base()                std::prev(RI).getReverse()
                RI.base()                         ++RI.getReverse()
              --RI.base()                           RI.getReverse()
     std::next(RI).base()                           RI.getReverse()

(For more details, have a look at r280032.)

llvm-svn: 281172
2016-09-11 18:51:28 +00:00