Commit Graph

1257 Commits

Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
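
A hedged sketch of the resulting hook shape on x86 (illustrative only; the CoreCLR classification call and register choices are assumptions drawn from the summary, not quoted from the patch):

    // Sketch: the hook is virtual and parameterized by personality, so
    // the CLR personality can report RDX/EDX instead of RAX/EAX.
    unsigned X86TargetLowering::getExceptionPointerRegister(
        const Constant *PersonalityFn) const {
      return classifyEHPersonality(PersonalityFn) == EHPersonality::CoreCLR
                 ? X86::RDX   // CLR personality routine
                 : X86::RAX;  // everything else
    }
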
James Molloy bef6e43107 [ARM] Compute known bits for ARMISD::CMOV
We can conservatively know that CMOV's known bits are the intersection of known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities.

I tried hard to create a testcase for this and failed: we would have to sufficiently confuse DAG.computeKnownBits, which saw through all the cheap tricks I used while narrowing my larger testcase down :(

This code is actually exercised in CodeGen/ARM/bfi.ll, there's just no functional difference because DAG.computeKnownBits gets the right answer in that case.

llvm-svn: 252168
2015-11-05 15:21:58 +00:00
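
The intersection rule above can be stated in a few lines of standalone C++ (an illustration, not the upstream code):

    #include <cstdint>

    struct Known { uint32_t Zero = 0, One = 0; };  // bitmasks of known bits

    // A CMOV may yield either operand, so a bit is known only if it is
    // known the same way on both paths.
    Known knownBitsForCMOV(const Known &A, const Known &B) {
      return {A.Zero & B.Zero, A.One & B.One};
    }
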
James Molloy e7d679cf4c [ARM] Combine CMOV into BFI where possible
If we have a CMOV, OR and AND combination such as:
  if (x & CN)
    y |= CM;

And:
  * CN is a single bit;
  * All bits covered by CM are known zero in y;

Then we can convert this to a sequence of BFI instructions. This is always a win if CM is a single bit, is never worse than the TST & OR sequence if CM is two bits, and for Thumb is no worse if CM is three bits (due to the extra IT instruction).

llvm-svn: 252057
2015-11-04 16:55:07 +00:00
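
For the single-bit case, the equivalence the combine exploits is easy to see in standalone C++ (illustrative; here CN = 1 << n and CM = 1 << m):

    #include <cstdint>

    // if (x & (1u << n)) y |= (1u << m);
    // Branchless form below: a BFI can insert bit n of x at position m
    // of y, which is valid because bit m of y is known zero.
    uint32_t orAndToBfi(uint32_t x, uint32_t y, unsigned n, unsigned m) {
      return y | (((x >> n) & 1u) << m);
    }
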
Tim Northover f8e47e4868 ARM: add support for WatchOS's compact unwind information.
llvm-svn: 251573
2015-10-28 22:56:36 +00:00
Tim Northover 8b40366b54 ARM: teach backend about WatchOS and TvOS libcalls.
The most substantial changes are again for watchOS: libcalls are hard-float if
needed and sincos has a different calling convention.

llvm-svn: 251571
2015-10-28 22:51:16 +00:00
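
A rough sketch of what the hard-float part implies (the predicate name and the loop are assumptions about the patch's shape, not quotes from it):

    // Force the AAPCS-VFP (hard-float) convention on runtime libcalls
    // when targeting watchOS.
    if (Subtarget->isTargetWatchOS()) {  // assumed predicate name
      for (int LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC)
        setLibcallCallingConv(static_cast<RTLIB::Libcall>(LC),
                              CallingConv::ARM_AAPCS_VFP);
    }
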
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
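
Per lane, "expand" rewrites the rotate in terms of operations that are legal for vectors; a standalone illustration:

    #include <cstdint>

    // ROTL expansion: shifts plus OR; the guard avoids the undefined
    // shift-by-32 when the rotate count is zero.
    uint32_t rotl32(uint32_t x, unsigned n) {
      n &= 31;
      return n ? (x << n) | (x >> (32 - n)) : x;
    }
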
Peter Collingbourne 99fac80db2 ARM/ELF: Restore original (pre-r251322) logic for deciding whether to use GOT.
Unbreaks linking with gold, which cannot resolve direct relocations referring
to global symbols.

llvm-svn: 251342
2015-10-26 20:46:44 +00:00
Peter Collingbourne 97aae40880 ARM/ELF: Better codegen for global variable addresses.
In PIC mode we were previously computing global variable addresses (or GOT
entry addresses) by adding the PC, the PC-relative GOT displacement and
the GOT-relative symbol/GOT entry displacement. Because the latter two
displacements are fixed, we ended up performing one more addition than
necessary.

This change causes us to compute addresses using a single PC-relative
displacement, resulting in a shorter code sequence. This reduces code size
by about 4% in a recent build of Chromium for Android.

As a result of this change we no longer need to compute the GOT base address
in the ARM backend, which allows us to remove the Global Base Reg pass and
SDAG lowering for the GOT.

We also now no longer use the GOT when addressing a symbol which is known
to be defined in the same linkage unit. Specifically, the symbol must have
either hidden visibility or a strong definition in the current module in
order not to use the GOT.

This is a change from the previous behaviour where we would use the GOT to
address externally visible symbols defined in the same module. I think the
only cases where this could matter are cases involving symbol interposition,
but we don't really support that well anyway.

Differential Revision: http://reviews.llvm.org/D13650

llvm-svn: 251322
2015-10-26 18:23:16 +00:00
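
A hedged sketch of the single-displacement form (the GOT_PREL modifier spelling and the surrounding call are assumptions about the patch's shape):

    // One constant-pool entry carries a pc-relative offset to the GOT
    // entry itself, so a single load after the pc-add yields the address.
    ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(
        GV, ARMPCLabelIndex, ARMCP::CPValue, PCAdj, ARMCP::GOT_PREL,
        /*AddCurrentAddress=*/false);
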
Craig Topper 8fe40e0ed5 Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. This removes the need to pass a hardcoded size in many places. NFC
llvm-svn: 251032
2015-10-22 17:05:00 +00:00
Artyom Skrobov 7fd67e25aa Adding support for TargetLoweringBase::LibCall
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:

- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
  computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall

This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.

The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.

The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.

Reviewers: t.p.northover, rengolin

Subscribers: t.p.northover, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13862

llvm-svn: 250826
2015-10-20 13:14:52 +00:00
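
A hedged example of how a backend might use the new action (shape assumed, not quoted from the patch):

    // A target with no hardware divide asks for a plain division
    // libcall directly instead of the old DIVREM-first heuristic.
    if (!Subtarget->hasDivide()) {  // assumed predicate name
      setOperationAction(ISD::SDIV, MVT::i32, LibCall);
      setOperationAction(ISD::UDIV, MVT::i32, LibCall);
    }
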
Duncan P. N. Exon Smith 9f9559e807 ARM: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250759
2015-10-19 23:25:57 +00:00
Craig Topper 2626094fa1 Make a bunch of static arrays const.
llvm-svn: 250642
2015-10-18 05:15:34 +00:00
Chad Rosier 169865ffda [ARM] Promote helper function to SelectionDAG.
I'll be using the function in a similar combine for AArch64.  The helper was
also improved to handle undef values.

Part of http://reviews.llvm.org/D13442

llvm-svn: 249572
2015-10-07 17:28:58 +00:00
Oliver Stannard d3d114ba54 [ARM] Use correct half-precision functions in EABI mode
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.

llvm-svn: 249565
2015-10-07 16:58:49 +00:00
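
The resulting name selection amounts to something like this sketch (the libcall names follow the __aeabi/__gnu prefixes from the message; the surrounding code is assumed):

    if (Subtarget->isTargetAEABI()) {
      setLibcallName(RTLIB::FPROUND_F32_F16, "__aeabi_f2h");
      setLibcallName(RTLIB::FPEXT_F16_F32, "__aeabi_h2f");
    } else {  // gnueabi / gnueabihf: libgcc spellings
      setLibcallName(RTLIB::FPROUND_F32_F16, "__gnu_f2h_ieee");
      setLibcallName(RTLIB::FPEXT_F16_F32, "__gnu_h2f_ieee");
    }
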
Chad Rosier 17436bf64e [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes.
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 249560
2015-10-07 16:15:40 +00:00
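
The guard presumably amounts to a lane-count check (sketch; names assumed):

    // The fixed-point vcvt forms only exist for 2 and 4 lanes; give up
    // on wider types such as v8i32 instead of crashing later.
    if (N->getValueType(0).getVectorNumElements() > 4)
      return SDValue();
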
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
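
The fix is an early exit of roughly this shape (sketch):

    // Interleaved-access lowering produces NEON vldN/vstN, so bail when
    // NEON is absent and leave the load/store to normal lowering.
    if (!Subtarget->hasNEON())
      return false;
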
Chad Rosier db71abf2d4 [ARM] Push more complex check down to reduce compile time. NFC.
llvm-svn: 249547
2015-10-07 13:40:44 +00:00
Chad Rosier dca46b426f [ARM] Minor refactoring. NFC.
llvm-svn: 249465
2015-10-06 20:58:42 +00:00
Chad Rosier aed910b7d7 [ARM] Minor refactoring. NFC.
llvm-svn: 249464
2015-10-06 20:51:26 +00:00
Chad Rosier 9df4aff86d [ARM] Minor refactoring. NFC.
llvm-svn: 249463
2015-10-06 20:45:45 +00:00
Chad Rosier a087fd21da [ARM] Minor refactoring to improve readability. NFC.
llvm-svn: 249454
2015-10-06 20:23:42 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets, which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
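
The post-RA step described above, in outline (a sketch under assumed names; the real expansion also has to handle Thumb encodings and write-back):

    // Collect the scratch registers the allocator picked for the MEMCPY
    // pseudo, then sort them: LDM/STM register lists must be ascending.
    SmallVector<unsigned, 6> Regs;
    for (const MachineOperand &MO : MI.operands())
      if (MO.isReg() && MO.isDef())
        Regs.push_back(MO.getReg());
    std::sort(Regs.begin(), Regs.end());
    // ...then emit one LDM from the source and one STM to the target.
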
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Artyom Skrobov ad8a0638f7 [ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall()
supportsTailCall() has two callers. Both of them double-check isThumb1Only(),
and refuse to proceed with tail-calling in that case.
Therefore, it makes sense to move this check to
ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized,
and to eliminate the extra checks at the call sites.

Following a review comment, added an "assert(supportsTailCall())"
in IsEligibleForTailCall.

NFC.

llvm-svn: 248703
2015-09-28 09:44:11 +00:00
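
The moved check is tiny (sketch of the assumed shape):

    // In ARMSubtarget::initSubtargetFeatures: fold the Thumb1
    // restriction into the feature bit once, instead of re-checking it
    // at every call site.
    SupportsTailCall &= !isThumb1Only();  // combined with existing logic
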
Ahmed Bougacha e81610fabb [ARM] Don't generate clrex for pre-v7 targets.
Since r248294, we emit clrex, but it doesn't exist on v6.

llvm-svn: 248640
2015-09-26 00:14:02 +00:00
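
The fix presumably gates the emission (sketch; the helper name is hypothetical):

    // Per the commit above, clrex is unavailable before v7, so only
    // release the monitor when the instruction exists.
    if (Subtarget->hasV7Ops())
      emitClrex(Builder);  // hypothetical wrapper for the clrex call
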
Saleem Abdulrasool 8e99f50768 ARM: make -Asserts,-Werror=unused-variable build happy
The value was only used in an assertion.  Sink the variable usage into the
assertion.

llvm-svn: 248562
2015-09-25 05:41:02 +00:00
Saleem Abdulrasool fe83b50289 ARM: address WoA division limitation
We now emit the compiler-generated divide-by-zero check that the MSVC routines
require.  We construct a pseudo-instruction for the DBZ check, as the operation
requires splitting up the BB.  For the 64-bit operations, we need to
custom-expand the node, since we must insert the DBZ check and then emit the
libcall to the appropriate name.  Because this is target-specific, it seemed
better to reproduce the expansion from the target-agnostic type legalization
here than to sink this code there just to avoid the duplication.  The division
library calls now match MSVC semantics.

llvm-svn: 248561
2015-09-25 05:15:46 +00:00
Artyom Skrobov cf296444ab [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
tricky, hand-rolled condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME attached.

This patch changes the handling of +t2dsp to be in line with other
architecture extensions.

Following a revert of r248152 and new review comments, this patch also includes
renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its
possible external usage.

Differential Revision: http://reviews.llvm.org/D12937

llvm-svn: 248519
2015-09-24 17:31:16 +00:00
Ahmed Bougacha 81616a72ea [ARM] Emit clrex in the expanded cmpxchg fail block.
ARM counterpart to r248291:

In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248294
2015-09-22 17:22:58 +00:00
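
In IR-building terms, the failure-block change looks roughly like this (sketch; assumes an IRBuilder positioned in the failure block and M pointing at the current Module):

    // No strex will follow the initial ldrex on this path, so release
    // the exclusive monitor explicitly.
    Builder.CreateCall(Intrinsic::getDeclaration(M, Intrinsic::arm_clrex));
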
Jeroen Ketema 41681a5329 [ARM] Do not scale vext with a factor
The vext pseudo-instruction takes the number of elements that need to be
extracted, not the number of bytes. Hence, use the number of elements
directly instead of scaling them with a factor.

Reviewers: Silviu Baranga, James Molloy
(not reflected in the differential revision)

Differential Revision: http://reviews.llvm.org/D12974

llvm-svn: 248208
2015-09-21 20:28:04 +00:00
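
The before/after is a one-liner (sketch; FirstElt and EltBytes are hypothetical local names):

    unsigned Imm = FirstElt;               // correct: VEXT counts elements
    // unsigned Imm = FirstElt * EltBytes; // the old bug: scaled to bytes
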
Saleem Abdulrasool 4966f58ac2 ARM: cleanup formatting
clang-format a line which was poorly formatted.  NFC.

llvm-svn: 248110
2015-09-20 03:19:09 +00:00
Sanjay Patel a260701bbb propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
(D11807). I then ran all regression tests and the test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
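
Propagation at a node-creation site looks roughly like this (sketch; the flag set is abbreviated):

    // Copy fast-math flags from the IR instruction onto the DAG node so
    // that CSE compares flags against flags instead of missing a match.
    SDNodeFlags Flags;
    Flags.setNoNaNs(FMF.noNaNs());
    Flags.setNoInfs(FMF.noInfs());
    Flags.setUnsafeAlgebra(FMF.unsafeAlgebra());
    SDValue Sum = DAG.getNode(ISD::FADD, DL, VT, LHS, RHS, &Flags);
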
Ahmed Bougacha 5246867384 [CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit.
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).

Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)

Differential Revision: http://reviews.llvm.org/D12557

llvm-svn: 247429
2015-09-11 17:08:28 +00:00
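
A target's overrides then read like this (sketch; the hook names are from the message, the bodies are illustrative):

    bool shouldExpandAtomicCmpXchgInIR(AtomicCmpXchgInst *AI) const override {
      return true;  // expand cmpxchg to ll/sc
    }
    TargetLoweringBase::AtomicExpansionKind
    shouldExpandAtomicLoadInIR(LoadInst *LI) const override {
      return AtomicExpansionKind::LLSC;  // expand atomic loads via ll/sc
    }
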
Ahmed Bougacha 9d677131c4 [CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind.
This lets us generalize its usage to the other atomic instructions.

llvm-svn: 247428
2015-09-11 17:08:17 +00:00
James Molloy 8c995a93ce [ARM] Do not use vtrn for vectorshuffle if the order is reversed
The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.

Patch by Jeroen Ketema!

llvm-svn: 247254
2015-09-10 08:42:28 +00:00
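
The tightened check, in outline (sketch; undef-element handling omitted):

    // A transpose mask must take elements from both halves in order:
    // <0,4,2,6> or <1,5,3,7> for four elements, never a reversed pair.
    bool isVTRNMaskSketch(ArrayRef<int> M, unsigned NumElts,
                          unsigned &WhichResult) {
      for (unsigned W = 0; W < 2; ++W) {  // the two results of vtrn
        bool Match = true;
        for (unsigned i = 0; i < NumElts && Match; i += 2)
          Match = (unsigned)M[i] == i + W &&
                  (unsigned)M[i + 1] == i + W + NumElts;
        if (Match) { WhichResult = W; return true; }
      }
      return false;
    }
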
Ahmed Bougacha 699a9dd7c3 [ARM] Don't abort on variable-idx extractelt in ReconstructShuffle.
The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only
takes constant indices, but it also accepts variable indices.
Bail out for those: we can't use them, as the shuffles we want to
reconstruct do require constant masks.

llvm-svn: 246594
2015-09-01 21:56:00 +00:00
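
The bail-out is a single guard (sketch; V names the extractelement node):

    // Shuffle reconstruction needs constant lane indices for the mask.
    if (!isa<ConstantSDNode>(V.getOperand(1)))
      return SDValue();  // variable index: give up
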
Ahmed Bougacha f9c19da03a [CodeGen] Support (and default to) expanding READCYCLECOUNTER to 0.
For targets that don't support it, this lets us respect the
LangRef instead of failing to select.

Note that we don't need to change the 32-bit x86/PPC lowerings (to
account for the result type/# difference) because they're both
custom and bypass type legalization.

llvm-svn: 246258
2015-08-28 01:49:59 +00:00
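
The default expansion then produces (sketch):

    // No cycle counter available: return zero and thread the chain.
    Results.push_back(DAG.getConstant(0, DL, MVT::i64));  // counter value
    Results.push_back(N->getOperand(0));                  // incoming chain
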
Reid Kleckner 0e2882345d [WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies.  The next step
is to change PEI so that we get funclet prologues and epilogues.

llvm-svn: 246235
2015-08-27 23:27:47 +00:00
Scott Douglass bdef60462d [ARM] Use AEABI helpers for i64 div and rem
Differential Revision: http://reviews.llvm.org/D12232

llvm-svn: 245830
2015-08-24 09:17:18 +00:00
Scott Douglass d2974a6afa [ARM] Refactor LowerDivRem before adding LowerREM (nfc)
Differential Revision: http://reviews.llvm.org/D12230

llvm-svn: 245829
2015-08-24 09:17:11 +00:00
James Molloy bf17009a97 [ARM] Don't try and custom lower a vNi64 SETCC.
It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes.

Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009.

llvm-svn: 245577
2015-08-20 16:33:44 +00:00
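
The early bail-out has roughly this shape (sketch):

    // The result type can be legal while the operand type is not, so
    // check the operands before building custom nodes.
    if (Op.getOperand(0).getValueType().getVectorElementType() == MVT::i64)
      return SDValue();  // let default expansion handle it
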
Silviu Baranga ad1b19fcb7 [ARM] Add instruction selection patterns for vmin/vmax
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generate the code. This adds the vmin/vmax
patterns and a test to check that we are now generating vmin/vmax
instructions.

Reviewers: rengolin, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12105

llvm-svn: 245439
2015-08-19 14:11:27 +00:00
James Molloy 974838f294 [ARM] Fix crash when targeting CPU without NEON
We emulate a scalar vmin/vmax with NEON instructions as they don't exist in the VFP ISA. So only mark these as legal when NEON is available.

Found here: https://code.google.com/p/chromium/issues/detail?id=521671

llvm-svn: 245231
2015-08-17 19:37:12 +00:00
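
The fix amounts to gating the legality declarations (sketch):

    // f32 vmin/vmax are emulated with NEON v2f32 instructions, so only
    // declare them Legal when NEON is actually present.
    if (Subtarget->hasNEON()) {
      setOperationAction(ISD::FMINNAN, MVT::f32, Legal);
      setOperationAction(ISD::FMAXNAN, MVT::f32, Legal);
    }
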
James Molloy c617be559a Rip out hand-rolled matching code for VMIN, VMAX, VMINNM and VMAXNM
This is no longer needed - SDAGBuilder will do this for us.

llvm-svn: 245197
2015-08-17 07:13:15 +00:00
James Molloy 31117875c2 [ARM] FMINNAN/FMAXNAN of f64 are not legal.
This was my error. We've got f32 marked as legal because those operations are simulated using a v2f32 instruction, but there's no equivalent for f64.

This will get test coverage imminently when D12015 lands.

llvm-svn: 244916
2015-08-13 17:28:26 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
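
Call sites then take roughly this shape (sketch; the getter and operand list are assumptions about the new interface):

    // The MachinePointerInfo getters now receive the function whose
    // manager owns the pseudo source values.
    MachineMemOperand *MMO = MF.getMachineMemOperand(
        MachinePointerInfo::getFixedStack(MF, FrameIndex),
        MachineMemOperand::MOLoad, Size, Align);
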
James Molloy d616c642bb [ARM] Match fminnan/fmaxnan for vector vmin/vmax instead of an intrinsic
Lower Intrinsic::arm_neon_vmins/vmaxs to fminnan/fmaxnan and match that instead. This is important because SDAG will soon be able to select FMINNAN itself, so we need a unified lowering path for intrinsics and SDAG.

NFCI.

llvm-svn: 244593
2015-08-11 12:06:28 +00:00
James Molloy ee868b2a3e [ARM] Match fminnum/fmaxnum for vector vminnm/vmaxnm instead of an intrinsic
Lower the intrinsic to a FMINNUM/FMAXNUM node and select that instead. This is important because soon SDAG will be able to select FMINNUM/FMAXNUM itself, so we need an integrated lowering path between SDAG and intrinsics.

NFCI.

llvm-svn: 244592
2015-08-11 12:06:25 +00:00
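
The two commits above share one shape; a sketch of the lowering switch (illustrative):

    // Rewrite the intrinsic to the generic node and let normal
    // selection match it; no custom matching code needed.
    case Intrinsic::arm_neon_vmins:
      return DAG.getNode(ISD::FMINNAN, SDLoc(Op), Op.getValueType(),
                         Op.getOperand(1), Op.getOperand(2));
    case Intrinsic::arm_neon_vminnm:
      return DAG.getNode(ISD::FMINNUM, SDLoc(Op), Op.getValueType(),
                         Op.getOperand(1), Op.getOperand(2));
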
James Molloy ea3a687a33 [ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM
NFCI. This replaces another custom ISDNode with a generic equivalent.

llvm-svn: 244591
2015-08-11 12:06:22 +00:00
James Molloy db8ee4b5a9 [ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN.
NFCI. This removes a custom ISDNode.

llvm-svn: 244590
2015-08-11 12:06:15 +00:00