Commit Graph

1383 Commits

Author SHA1 Message Date
Rafael Espindola 3888bdb022 Refactor duplicated code. NFC.
llvm-svn: 272901
2016-06-16 15:22:01 +00:00
Benjamin Kramer bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Benjamin Kramer 46e38f3678 Avoid copies of std::strings and APInt/APFloats where we only read from it
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.

llvm-svn: 272126
2016-06-08 10:01:20 +00:00
Saleem Abdulrasool 532dcbc2c5 ARM: correct TLS access on WoA
TLS access requires an offset from the TLS index.  The index itself is the
section-relative distance of the symbol.  For ARM, the relevant relocation
(IMAGE_REL_ARM_SECREL) is applied as a constant.  This means that the value may
not be an immediate and must be lowered into a constant pool.  This offset will
not be base relocated.  We were previously emitting the actual address of the
symbol which would be base relocated and would therefore be the vaue offset by
the ImageBase + TLS Offset.

llvm-svn: 271974
2016-06-07 03:15:07 +00:00
Sjoerd Meijer d906bf1369 RAS extensions are part of ARMv8.2-A. This change enables them by introducing a
new instruction to ARM and AArch64 targets and several system registers.

Patch by: Roger Ferrer Ibanez and Oliver Stannard

Differential Revision: http://reviews.llvm.org/D20282

llvm-svn: 271670
2016-06-03 14:03:27 +00:00
Rafael Espindola 41410cc812 Avoid a load for local functions.
llvm-svn: 271437
2016-06-01 21:57:11 +00:00
Rafael Espindola 7ad97b2fe4 Add a use of shouldAssumeDSOLocal to ARM.
Now this code path knows about position independent executables.

llvm-svn: 271290
2016-05-31 15:31:55 +00:00
Tim Northover b5ece527a1 ARM: stop emitting blx instructions for most calls on MachO.
I'm really not sure why we were in the first place, it's the linker's job to
convert between BL/BLX as necessary. Even worse, using BLX left Thumb calls
that could be locally resolved completely unencodable since all offsets to BLX
are multiples of 4.

rdar://26182344

llvm-svn: 269101
2016-05-10 19:17:47 +00:00
Craig Topper 33772c5375 [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior.
llvm-svn: 267853
2016-04-28 03:34:31 +00:00
Ahmed Bougacha 128f8732a5 [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.
Differential Revision: http://reviews.llvm.org/D17176

llvm-svn: 267606
2016-04-26 21:15:30 +00:00
Craig Topper d8d6be4f99 [ARM] Expand vector ctlz_zero_undef so it becomes ctlz.
The default is Legal, which results in 'Cannot select' errors.

llvm-svn: 267521
2016-04-26 05:04:37 +00:00
Craig Topper edb4a6ba98 [ARM] Expand v1i64 and v2i64 ctlz.
The default is legal, which results in 'Cannot select' errors.

llvm-svn: 267520
2016-04-26 05:04:33 +00:00
Silviu Baranga 82d04260b7 [ARM] Add support for the X asm constraint
Summary:
This patch adds support for the X asm constraint.

To do this, we lower the constraint to either a "w" or "r" constraint
depending on the operand type (both constraints are supported on ARM).

Fixes PR26493

Reviewers: t.p.northover, echristo, rengolin

Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19061

llvm-svn: 267411
2016-04-25 14:29:18 +00:00
Saleem Abdulrasool 9611518646 ARM: fix __chkstk Frame Setup on WoA
This corrects the MI annotations for the stack adjustment following the __chkstk
invocation.  We were marking the original SP usage as a Def rather than Kill.
The (new) assigned value is the definition, the original reference is killed.

Adjust the ISelLowering to mark Kills and FrameSetup as well.

This partially resolves PR27480.

llvm-svn: 267361
2016-04-24 20:12:48 +00:00
Tim Northover 1ee27c74cb ARM: fix assertion failure on -O0 cmpxchg.
Because lowering of CMP_SWAP_64 occurs during type legalization, there can be
i64 types produced by more than just a BUILD_PAIR or similar. My initial tests
used just incoming function args.

llvm-svn: 266828
2016-04-19 22:25:02 +00:00
Marcin Koscielnicki 3fdc257d6a [AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic.
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer.  I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.

This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.

Differential Revision: http://reviews.llvm.org/D19098

llvm-svn: 266818
2016-04-19 20:51:05 +00:00
Tim Northover b629c77692 ARM: use a pseudo-instruction for cmpxchg at -O0.
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.

Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.

Should fix the 32-bit part of PR25526.

llvm-svn: 266679
2016-04-18 21:48:55 +00:00
Matthias Braun 46b0f03e12 TargetLowering: Factor out common code for tail call eligibility checking; NFC
llvm-svn: 266270
2016-04-14 01:10:42 +00:00
Matthias Braun 707e02c273 ARM: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D18901

llvm-svn: 266253
2016-04-13 21:43:25 +00:00
JF Bastien 800f87a871 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Manman Ren 802cd6f9d7 Swift Calling Convention: swiftcc for ARM.
Differential Revision: http://reviews.llvm.org/D18769

llvm-svn: 265482
2016-04-05 22:44:44 +00:00
Matthias Braun 870c34f0cf ARM, AArch64, X86: Check preserved registers for tail calls.
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.

This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.

Related to rdar://24207743

Differential Revision: http://reviews.llvm.org/D18680

llvm-svn: 265329
2016-04-04 18:56:13 +00:00
James Y Knight e6a4646372 Remove useless check for ThreadModel==Single in ARMISelLowering. NFC.
ThreadModel::Single is already handled already by ARMPassConfig adding
LowerAtomicPass to the pass list, which lowers all atomics to non-atomic
ops and deletes fences.

So by the time we get to ISel, there's no atomic fences left, so they
don't need special handling.

llvm-svn: 265178
2016-04-01 19:33:19 +00:00
Benjamin Kramer 569efd2cfd [ARM] Expand v1i64 and v2i64 ctpop.
The default is legal, which results in 'Cannot select' errors. This is
triggered during selfhost due to a recent cost model change.

llvm-svn: 265040
2016-03-31 19:42:04 +00:00
Matthias Braun 8d41436004 CodeGen: Factor out code for tail call result compatibility check; NFC
llvm-svn: 264959
2016-03-30 22:46:04 +00:00
Saleem Abdulrasool 750a90df6a ARM: maintain BB ordering when expanding WIN__DBZCHK
It is possible to have a fallthrough MBB prior to MBB placement.  The original
addition of the BB would result in reordering the BB as not preceding the
successor.  Because of the fallthrough nature of the BB, we could end up
executing incorrect code or even a constant pool island!  Insert the spliced BB
into the same location to avoid that.

Thanks to Tim Northover for invaluable hints and Fiora for the discussion on
what may have been occurring!

llvm-svn: 264454
2016-03-25 19:48:06 +00:00
Saleem Abdulrasool 0dab98d926 ARM: fix optimised division on WoA
We did not have an explicit branch to the continuation BB.  When the check was
hoisted, this could permit control follow to fall through into the division
trap.  Add the explicit branch to the continuation basic block to ensure that
code execution is correct.

llvm-svn: 264370
2016-03-25 00:34:11 +00:00
Peter Collingbourne 86b9fbe980 ARM: Better codegen for 64-bit compares.
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.

Before:

	push	{r7, lr}
	cmp	r0, r2
	mov.w	r0, #0
	mov.w	r12, #0
	it	hs
	movhs	r0, #1
	cmp	r1, r3
	it	ge
	movge.w	r12, #1
	it	eq
	moveq	r12, r0
	cmp.w	r12, #0
	bne	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

After:

	push	{r7, lr}
	subs	r0, r0, r2
	sbcs.w	r0, r1, r3
	bge	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

Saves around 80KB in Chromium's libchrome.so.

Some notes on this patch:

- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
  introduced (nothing else needs them). However, they are necessary in
  order to avoid poor codegen, and they seem similar to existing combines
  in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
  (brcond Compare)).

- No support for Thumb-1. This is in principle possible, but we'd need
  to implement ARMISD::SUBE for Thumb-1.

Differential Revision: http://reviews.llvm.org/D15256

llvm-svn: 263962
2016-03-21 18:00:02 +00:00
Manman Ren 4865d89653 [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.

llvm-svn: 263856
2016-03-18 23:41:51 +00:00
Tim Northover 498c56c240 ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.
llvm-svn: 263741
2016-03-17 20:10:28 +00:00
Saleem Abdulrasool 071a099102 ARM: Revert SVN r253865, 254158, fix windows division
The two changes together weakened the test and caused a regression with division
handling in MSVC mode.  They were applied to avoid an assertion being triggered
in the block frequency analysis.  However, the underlying problem was simply
being masked rather than solved properly.  Address the actual underlying problem
and revert the changes.  Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.

The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK).  We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them.  This would result in assertions being triggered
in the block frequency analysis pass.

Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code.  Update this with new tests that actually ensure that we do not
regress on the code generation.

llvm-svn: 263714
2016-03-17 14:10:49 +00:00
James Y Knight f44fc5219f Tweak some atomics functions in preparation for larger changes; NFC.
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
  '__sync' libcalls and '__atomic' libcalls, and this function is for
  the '__sync' ones.

- getInsertFencesForAtomic() has been replaced with
  shouldInsertFencesForAtomic(Instruction), so that the decision can be
  made per-instruction. This functionality will be used soon.

- emitLeadingFence/emitTrailingFence are no longer called if
  shouldInsertFencesForAtomic returns false, and thus don't need to
  check the condition themselves.

llvm-svn: 263665
2016-03-16 22:12:04 +00:00
Sanjay Patel 7506852709 [DAG] use !isUndef() ; NFCI
llvm-svn: 263453
2016-03-14 18:09:43 +00:00
Sanjay Patel 5719584129 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Roman Levenstein 2792b3f02f Add support for a preserve_most calling convention to the AArch64 backend.
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.

There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.

Differential Revision: http://reviews.llvm.org/D18016

llvm-svn: 263092
2016-03-10 04:35:09 +00:00
Renato Golin 175c6d6d95 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262738
2016-03-04 19:19:36 +00:00
Renato Golin 3d78271eac Revert "[ARM] Merging 64-bit divmod lib calls into one"
This reverts commit r262507, which broke some ARM buildbots.

llvm-svn: 262594
2016-03-03 08:57:44 +00:00
Renato Golin 93e42d9934 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262507
2016-03-02 19:35:45 +00:00
Junmo Park 453f4aa4dd [ARM] fix initialization of PredictableSelectIsExpensive
Summary:
If we want classify OoO or not, using getSchedModel().isOutOfOrder()
could be more proper way than using Subtarget->isLikeA9().

Reviewers: jmolloy, rengolin

Differential Revision: http://reviews.llvm.org/D17433

llvm-svn: 261623
2016-02-23 09:56:58 +00:00
Junmo Park 1108ab059c Minor code cleanups. NFC.
llvm-svn: 261294
2016-02-19 01:46:04 +00:00
Ahmed Bougacha 93cff7fb82 [CodeGen] Document and use getConstant's splat-building feature. NFC.
Differential Revision: http://reviews.llvm.org/D17229

llvm-svn: 260901
2016-02-15 18:07:29 +00:00
Ahmed Bougacha f8dfb47c02 [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Saleem Abdulrasool f36005a358 ARM: support TLS for WoA
Add support for TLS access for Windows on ARM.  This generates a similar access
to MSVC for ARM.

The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call.  The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.

llvm-svn: 259676
2016-02-03 18:21:59 +00:00
Renato Golin 6027dd38ef [ARM] Move GNUEABI divmod to __aeabi_divmod*
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.

Fixes PR26450.

llvm-svn: 259657
2016-02-03 16:10:54 +00:00
Matthias Braun b30f2f5141 Avoid overly large SmallPtrSet/SmallSet
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.

llvm-svn: 259283
2016-01-30 01:24:31 +00:00
Tim Northover 042a6c1fe1 ARMv7k: base ABI decision on v7k Arch rather than watchos OS.
Various bits we want to use the new ABI actually compile with "-arch armv7k
-miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how
slices work.

llvm-svn: 258975
2016-01-27 19:32:29 +00:00
Manman Ren e5f807f928 CXX_FAST_TLS calling convention: fix issue on ARM.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

PR26136

llvm-svn: 257930
2016-01-15 20:24:11 +00:00
Bradley Smith 433c22e35c [ARM] Add ARMv8-A semaphore/atomic instructions to ARMv8-M Baseline/Mainline
llvm-svn: 257882
2016-01-15 10:26:51 +00:00
Bradley Smith 519563e371 [ARM] Add SDIV/UDIV instructions to ARMv8-M Baseline
llvm-svn: 257880
2016-01-15 10:25:35 +00:00
Manman Ren 5e9e65e705 CXX_FAST_TLS calling convention: performance improvement for ARM.
This is the same change on ARM as r255821 on AArch64.
rdar://9001553

llvm-svn: 257424
2016-01-12 00:47:18 +00:00
Manman Ren 1602605bf8 CXX_FAST_TLS calling convention: Add support for ARM on Darwin.
rdar://9001553

llvm-svn: 257417
2016-01-11 23:50:43 +00:00
Weiming Zhao 4b3b13d3bc RBIT Instruction only available for ARMv6t2 and above.
Summary:
r255334 matches bit-reverse pattern in InstCombine and generates calls to Instrinsic::bitreverse.

RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.

Patch by Z. Zheng <zhaoshiz@codeaurora.org>

Reviewers: apazos, jmolloy, weimingz

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15932

llvm-svn: 257188
2016-01-08 18:43:41 +00:00
Eric Christopher b793230797 Add some testing for thumb1 and thumb2 inline asm immediate constraints
and fix a couple of bugs on inspection.

Also fixes PR26061.

llvm-svn: 257122
2016-01-08 00:34:44 +00:00
Tim Northover bd41cf880c ARM: support TLS accesses on Darwin platforms
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.

llvm-svn: 257039
2016-01-07 09:03:03 +00:00
Chad Rosier 353d71914a Remove extra whitespace. NFC.
llvm-svn: 256173
2015-12-21 18:08:05 +00:00
Cong Hou c106989fd5 Normalize MBB's successors' probabilities in several locations.
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).


Differential revision: http://reviews.llvm.org/D15259

llvm-svn: 255455
2015-12-13 09:26:17 +00:00
Hal Finkel cd8664c3c2 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Ahmed Bougacha 97564c3a1b [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

llvm-svn: 255089
2015-12-09 01:19:50 +00:00
Quentin Colombet 901f036353 [ARM] When a bitcast is about to be turned into a VMOVDRR, try to combine it
with its source instead of forcing the values on GPRs.

This improves the lowering of vector code when such bitcasts happen in the
middle of vector computations.

rdar://problem/23691584 

llvm-svn: 254684
2015-12-04 01:53:14 +00:00
Tim Northover f520eff782 AArch64: use ldxp/stxp pair to implement 128-bit atomic loads.
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.

llvm-svn: 254524
2015-12-02 18:12:57 +00:00
Cong Hou d97c100dc4 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
(This is the second attempt to submit this patch. The first caused two assertion
 failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)

The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254377
2015-12-01 05:29:22 +00:00
Hans Wennborg 1dbaf67537 Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces."
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."

Asserts were firing in Chromium builds. See PR25687.

llvm-svn: 254366
2015-12-01 03:49:42 +00:00
Cong Hou fa1917c673 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254348
2015-12-01 00:02:51 +00:00
Martell Malone d12292480a ARM: address WOA unsigned division overflow crash
Building on r253865 the crash is not limited to signed overflows.

Disable custom handling of unsigned 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit unsigned integer overflow.

llvm-svn: 254158
2015-11-26 15:34:03 +00:00
Artyom Skrobov 314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Martell Malone a6b867eb0d ARM: address WoA division overflow crash
Disable custom handling of signed 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit integer overflow crashes.

llvm-svn: 253865
2015-11-23 13:11:39 +00:00
James Molloy 2018091e87 Properly check if a CMPZ node is in fact comparing against zero
This was left implicit and never ever checked, which means we could have a CMPZ against some non-zero value and we were carrying on with BFI conversion regardless.

Caught by Oliver Stannard using csmith; regression test added.

llvm-svn: 253195
2015-11-16 10:49:25 +00:00
James Molloy b564098c62 [ARM] Replace ARMISD::RBIT with ISD::BITREVERSE
ISD::BITREVERSE matches "rbit" completely, so remove ARMISD::RBIT and mark ISD::BITREVERSE as legal, adding a test for lowering.

llvm-svn: 253047
2015-11-13 16:05:22 +00:00
James Molloy 8e99e97f2a [ARM] CMOV->BFI combining: handle both senses of CMPZ
I completely misunderstood what ARMISD::CMPZ means. It's not "compare equal to zero", it's "compare, only setting the zero/Z flag". It can either be equal-to-zero or not-equal-to-zero, and we weren't checking what sense it was.

If it's equal-to-zero, we can swap the operands around and pretend like it is not-equal-to-zero, which is both a bug fix and lets us handle more cases.

llvm-svn: 252891
2015-11-12 13:49:17 +00:00
Diego Novillo 0767ae5896 Properly fix unused variable in disable-assert builds.
I missed the side-effects of ParseBFI in my previous attempt (r252748).
Thanks dblaikie for the suggestion of adding a void use of the unused
variable instead.

llvm-svn: 252751
2015-11-11 16:39:22 +00:00
Diego Novillo 29f88a2460 Remove unused variable in disable-assert builds. NFC.
llvm-svn: 252748
2015-11-11 16:14:52 +00:00
James Molloy ce12c92f66 [ARM] Combine BFIs together
If we have a chain of BFIs, we may be able to combine several together into one merged BFI. We can do this if the "from" bits from one BFI OR'd with the "from" bits from the other BFI form a contiguous range, and the same with the "to" bits.

llvm-svn: 252740
2015-11-11 15:40:40 +00:00
Sanjay Patel af1b48bfdc [ARM] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2
implementation.

The net result of allowing this speculation for the regression tests in this patch is
that we get this code:

ctlz:               
  clz  r0, r0
  bx  lr
cttz:              
  rbit  r0, r0
  clz  r0, r0
  bx  lr

Instead of:

ctlz:    
  cmp  r0, #0
  moveq  r0, #32
  clzne  r0, r0
  bx  lr
cttz:     
  cmp   r0, #0
  moveq  r0, #32
  rbitne  r0, r0
  clzne  r0, r0
  bx  lr

This will help solve a general speculation/despeculation problem noted in PR24818:
https://llvm.org/bugs/show_bug.cgi?id=24818

Differential Revision: http://reviews.llvm.org/D14469

llvm-svn: 252639
2015-11-10 19:24:31 +00:00
James Molloy 9d55f19cfa Reapply "[ARM] Combine CMOV into BFI where possible"
Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh!

Original commit message:

If we have a CMOV, OR and AND combination such as:
  if (x & CN)
      y |= CM;

And:
  * CN is a single bit;
    * All bits covered by CM are known zero in y;

Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).

llvm-svn: 252606
2015-11-10 14:22:05 +00:00
Renato Golin 6d435f12f0 [EABI] Add LLVM support for -meabi flag
"GCC requires the freestanding environment provide memcpy, memmove, memset
and memcmp": https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Standards.html

Hence in GNUEABI targets LLVM should not convert 'memops' to their equivalent
'__aeabi_memops'. This convertion violates GCC contract.

The -meabi flag controls whether or not LLVM will modify 'memops' in GNUEABI
targets.

Without -meabi: use the triple default EABI.
With -meabi=default: use the triple default EABI.
With -meabi=gnu: use 'memops'.
With -meabi=4 or -meabi=5: use '__aeabi_memops'.
With -meabi set to an unknown value: same as -meabi=default.

Patch by Vinicius Tinti.

llvm-svn: 252462
2015-11-09 12:40:30 +00:00
Renato Golin 1d8a2c952f Revert "[ARM] Combine CMOV into BFI where possible"
This reverts commit r252057, as it broke ARM self-hosting buildbots, probably
due to a code-gen fault.

llvm-svn: 252460
2015-11-09 12:19:10 +00:00
Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
James Molloy bef6e43107 [ARM] Compute known bits for ARMISD::CMOV
We can conservatively know that CMOV's known bits are the intersection of known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities.

I tried hard to create a testcase for this and failed - we have to sufficiently confuse DAG.computeKnownBits which can see through all the cheap tricks I tried to narrow my larger testcase down :(

This code is actually exercised in CodeGen/ARM/bfi.ll, there's just no functional difference because DAG.computeKnownBits gets the right answer in that case.

llvm-svn: 252168
2015-11-05 15:21:58 +00:00
James Molloy e7d679cf4c [ARM] Combine CMOV into BFI where possible
If we have a CMOV, OR and AND combination such as:
  if (x & CN)
    y |= CM;

And:
  * CN is a single bit;
  * All bits covered by CM are known zero in y;

Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).

llvm-svn: 252057
2015-11-04 16:55:07 +00:00
Tim Northover f8e47e4868 ARM: add support for WatchOS's compact unwind information.
llvm-svn: 251573
2015-10-28 22:56:36 +00:00
Tim Northover 8b40366b54 ARM: teach backend about WatchOS and TvOS libcalls.
The most substantial changes are again for watchOS: libcalls are hard-float if
needed and sincos has a different calling convention.

llvm-svn: 251571
2015-10-28 22:51:16 +00:00
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
Peter Collingbourne 99fac80db2 ARM/ELF: Restore original (pre-r251322) logic for deciding whether to use GOT.
Unbreaks linking with gold, which cannot resolve direct relocations referring
to global symbols.

llvm-svn: 251342
2015-10-26 20:46:44 +00:00
Peter Collingbourne 97aae40880 ARM/ELF: Better codegen for global variable addresses.
In PIC mode we were previously computing global variable addresses (or GOT
entry addresses) by adding the PC, the PC-relative GOT displacement and
the GOT-relative symbol/GOT entry displacement. Because the latter two
displacements are fixed, we ended up performing one more addition than
necessary.

This change causes us to compute addresses using a single PC-relative
displacement, resulting in a shorter code sequence. This reduces code size
by about 4% in a recent build of Chromium for Android.

As a result of this change we no longer need to compute the GOT base address
in the ARM backend, which allows us to remove the Global Base Reg pass and
SDAG lowering for the GOT.

We also now no longer use the GOT when addressing a symbol which is known
to be defined in the same linkage unit. Specifically, the symbol must have
either hidden visibility or a strong definition in the current module in
order to not use the the GOT.

This is a change from the previous behaviour where we would use the GOT to
address externally visible symbols defined in the same module. I think the
only cases where this could matter are cases involving symbol interposition,
but we don't really support that well anyway.

Differential Revision: http://reviews.llvm.org/D13650

llvm-svn: 251322
2015-10-26 18:23:16 +00:00
Craig Topper 8fe40e0ed5 Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. This removes the need to pass a hardcoded size in many places. NFC
llvm-svn: 251032
2015-10-22 17:05:00 +00:00
Artyom Skrobov 7fd67e25aa Adding support for TargetLoweringBase::LibCall
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:

- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
  computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall

This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.

The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.

The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.

Reviewers: t.p.northover, rengolin

Subscribers: t.p.northover, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13862

llvm-svn: 250826
2015-10-20 13:14:52 +00:00
Duncan P. N. Exon Smith 9f9559e807 ARM: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250759
2015-10-19 23:25:57 +00:00
Craig Topper 2626094fa1 Make a bunch of static arrays const.
llvm-svn: 250642
2015-10-18 05:15:34 +00:00
Chad Rosier 169865ffda [ARM] Promote helper function to SelectionDAG.
I'll be using the function in a similar combine for AArch64.  The helper was
also improved to handle undef values.

Part of http://reviews.llvm.org/D13442

llvm-svn: 249572
2015-10-07 17:28:58 +00:00
Oliver Stannard d3d114ba54 [ARM] Use correct half-precision functions in EABI mode
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.

llvm-svn: 249565
2015-10-07 16:58:49 +00:00
Chad Rosier 17436bf64e [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes.
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 249560
2015-10-07 16:15:40 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
Chad Rosier db71abf2d4 [ARM] Push more complex check down to reduce compile time. NFC.
llvm-svn: 249547
2015-10-07 13:40:44 +00:00
Chad Rosier dca46b426f [ARM] Minor refactoring. NFC.
llvm-svn: 249465
2015-10-06 20:58:42 +00:00
Chad Rosier aed910b7d7 [ARM] Minor refactoring. NFC.
llvm-svn: 249464
2015-10-06 20:51:26 +00:00
Chad Rosier 9df4aff86d [ARM] Minor refactoring. NFC.
llvm-svn: 249463
2015-10-06 20:45:45 +00:00
Chad Rosier a087fd21da [ARM] Minor refactoring to improve readability. NFC.
llvm-svn: 249454
2015-10-06 20:23:42 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Artyom Skrobov ad8a0638f7 [ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall()
supportsTailCall() has two callers. Both of them double-check isThumb1Only(),
and refuse to proceed with tail-calling in that case.
Therefore, it makes sense to move this check to
ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized;
and to eliminate the extra checks at the call sites.

Following a review comment, added an "assert(supportsTailCall())"
in IsEligibleForTailCall.

NFC.

llvm-svn: 248703
2015-09-28 09:44:11 +00:00