Commit Graph

444 Commits

Author SHA1 Message Date
Roger Ferrer Ibanez 4f92b4162f [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 313009
2017-09-12 07:40:09 +00:00
Hans Wennborg 075e5a2e2b Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY"
It caused PR34564.

> This is a preparatory step for D34515 and also is being recommitted as its
> first version caused PR34045.
>
> This change:
>  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
>  - lowering is done by first converting the boolean value into the carry flag
>    using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
>    using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
>    operations does the actual addition.
>  - for subtraction, given that ISD::SUBCARRY second result is actually a
>    borrow, we need to invert the value of the second operand and result before
>    and after using ARMISD::SUBE. We need to invert the carry result of
>    ARMISD::SUBE to preserve the semantics.
>  - given that the generic combiner may lower ISD::ADDCARRY and
>    ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
>    as well otherwise i64 operations now would require branches. This implies
>    updating the corresponding test for unsigned.
>  - add new combiner to remove the redundant conversions from/to carry flags
>    to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
>  - fixes PR34045
>
> Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312980
2017-09-11 23:52:02 +00:00
Roger Ferrer Ibanez 12b20f2307 [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312898
2017-09-11 07:38:05 +00:00
Diana Picus 1d101d76c0 Move static helper into ARMTargetLowering. NFC
This exposes the isReadOnly(GlobalValue *) in the ARMTargetLowering so
we can make use of it in GlobalISel as well.

llvm-svn: 312320
2017-09-01 10:44:48 +00:00
Evgeny Astigeevich 540a39adf7 [ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes
ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes
for the Thumb1 target. This causes generation of redundant code and affects
performance.

This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106

Differential Revision: https://reviews.llvm.org/D36467

llvm-svn: 311649
2017-08-24 10:00:25 +00:00
Craig Topper 2251ef95a3 [X86][ARM][TargetLowering] Add SrcVT to isExtractSubvectorCheap
Summary:
Without the SrcVT its hard to know what is really being asked for. For example if your target has 128, 256, and 512 bit vectors. Maybe extracting 128 from 256 is cheap, but maybe extracting 128 from 512 is not.

For x86 we do support extracting a quarter of a 512-bit register. But for i1 vectors we don't have isel patterns for extracting arbitrary pieces. So we need this to have a correct implementation of isExtractSubvectorCheap for mask vectors.

Reviewers: RKSimon, zvi, efriedma

Reviewed By: RKSimon

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D36649

llvm-svn: 310793
2017-08-13 17:29:07 +00:00
Nico Weber bfde70b097 Revert r309923, it caused PR34045.
llvm-svn: 309950
2017-08-03 15:41:26 +00:00
Roger Ferrer Ibanez 854980341b [ARM] Use ADDCARRY / SUBCARRY
This patch:

- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
  using (_, C) <- (ARMISD::ADDC R, -1) and converted back to an integer value
  using (R, _) <- (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
  operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY second result is actually a
  borrow, we need to invert the value of the second operand and result before
  and after using ARMISD::SUBE. We need to invert the carry result of
  ARMISD::SUBE to preserve the semantics.
- given that the generic combiner may lower ISD::ADDCARRY and
  ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering
  as well otherwise i64 operations now would require branches. This implies
  updating the corresponding test for unsigned.
- add new combiner to remove the redundant conversions from/to carry flags
  to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) -> C

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 309923
2017-08-03 07:45:10 +00:00
Zvi Rackover 1b73682243 TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI.
Changing mask argument type from const SmallVectorImpl<int>& to
ArrayRef<int>.

This came up in D35700 where a mask is received as an ArrayRef<int> and
we want to pass it to TargetLowering::isShuffleMaskLegal().
Also saves a few lines of code.

llvm-svn: 309085
2017-07-26 08:06:58 +00:00
Jonas Paulsson 024e319489 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Nirav Dave 4dcad5dc6b Add DAG argument to canMergeStoresTo NFC.
llvm-svn: 307583
2017-07-10 20:25:54 +00:00
Alexandros Lamprineas 2b2b420563 [ARM] Support constant pools in data when generating execute-only code.
Resubmission of r305387, which was reverted at r305390. The Address
Sanitizer caught a stack-use-after-scope of a Twine variable. This
is now fixed by passing the Twine directly as a function parameter.

The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.

Differential Revision: https://reviews.llvm.org/D33773

llvm-svn: 305776
2017-06-20 07:20:52 +00:00
Alexandros Lamprineas 1c15ee2631 Revert "[ARM] Support constant pools in data when generating execute-only code."
This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0.

I've upset a buildbot which runs the address sanitizer:
ERROR: AddressSanitizer: stack-use-after-scope
lib/Target/ARM/ARMISelLowering.cpp:2690
That Twine variable is used illegally.

llvm-svn: 305390
2017-06-14 15:00:08 +00:00
Alexandros Lamprineas c582d6e133 [ARM] Support constant pools in data when generating execute-only code.
The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.

Differential Revision: https://reviews.llvm.org/D33773

llvm-svn: 305387
2017-06-14 13:22:41 +00:00
Nirav Dave 6c910c0dd8 [DAG] Add AddressSpace parameter to canMergeStoresTo. NFC.
llvm-svn: 303673
2017-05-23 18:53:02 +00:00
Tim Shen 04de70d3a7 [Atomic] Remove IsStore/IsLoad in the interface, and pass the instruction instead. NFC.
Now both emitLeadingFence and emitTrailingFence take the instruction
itself, instead of taking IsLoad/IsStore pairs.
Instruction::mayReadFromMemory and Instrucion::mayWriteToMemory are used
for determining those two booleans.

The instruction argument is also useful for later D32763, in
emitTrailingFence. For emitLeadingFence, it seems to have cleaner
interface with the proposed change.

Differential Revision: https://reviews.llvm.org/D32762

llvm-svn: 302539
2017-05-09 15:27:17 +00:00
Matthias Braun 4682ac6c83 ARM: Compute MaxCallFrame size early
This exposes a method in MachineFrameInfo that calculates
MaxCallFrameSize and calls it after instruction selection in the ARM
target.

This avoids
ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame()
giving different answers in early/late phases of codegen.

The testcase shows a particular nasty example result of that where we
would fail to properly align an alloca.

Differential Revision: https://reviews.llvm.org/D32622

llvm-svn: 302303
2017-05-05 22:04:05 +00:00
Sam Parker df337704f0 [ARM] ACLE Chapter 9 intrinsics
Added the integer data processing intrinsics from ACLE v2.1 Chapter 9
but I have missed out the saturation_occurred intrinsics for now. For
the instructions that read and write the GE bits, a chain is included
and the only instruction that reads these flags (sel) is only
selectable via the implemented intrinsic.

Differential Revision: https://reviews.llvm.org/D32281

llvm-svn: 302126
2017-05-04 07:31:28 +00:00
Craig Topper d0af7e8ab8 [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.

This is largely a mechanical transformation from KnownZero to Known.Zero.

Differential Revision: https://reviews.llvm.org/D32569

llvm-svn: 301620
2017-04-28 05:31:46 +00:00
Matt Arsenault 3138075dd4 DAG: Make mayBeEmittedAsTailCall parameter const
llvm-svn: 300603
2017-04-18 21:16:46 +00:00
Matthew Simpson 1468d3e04e [ARM/AArch64] Ensure valid vector element types for interleaved accesses
This patch refactors and strengthens the type checks performed for interleaved
accesses. The primary functional change is to ensure that the interleaved
accesses have valid element types. The added test cases previously failed
because the element type is f128.

Differential Revision: https://reviews.llvm.org/D31817

llvm-svn: 299864
2017-04-10 18:34:37 +00:00
Sanjay Patel b2f1621bb1 [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401)
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

It's likely that other targets will want to enable this hook for scalar transforms, 
and there are probably other patterns that can use bitwise logic to reduce comparisons.

Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).

Differential Revision: https://reviews.llvm.org/D31483

llvm-svn: 299542
2017-04-05 14:09:39 +00:00
Simon Pilgrim 37b536e4b3 [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

llvm-svn: 299201
2017-03-31 11:24:16 +00:00
Sam Parker 654cb8263a [ARM] Enable SMLAL[B|T] isel
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.

Differential Revision: https://reviews.llvm.org/D30044

llvm-svn: 297809
2017-03-15 08:27:11 +00:00
Sam Parker 916b1ba617 [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.

Differential Revision: https://reviews.llvm.org/D30708

llvm-svn: 297716
2017-03-14 09:13:22 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
James Molloy d508789668 [ARM] Use VCMP, not VCMPE, for floating point equality comparisons
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the sideeffect of setting the cumulative Invalid
bit in FPSCR if any of the operands are QNaN.

It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:

  The relational and equality operators support the usual mathematical
  relationships between numeric values. For any ordered pair of numeric
  values exactly one of relationships the less, greater, equal and is true.
  Relational operators may raise the floating-point exception when argument
  values are NaNs.

The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).

Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.

llvm-svn: 294945
2017-02-13 12:32:47 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Saleem Abdulrasool 6ef45916c6 ARM: match GCC's behaviour for builtins
GCC changes the CC between the user-code and the builtins based on the
value of `-target` rather than `-mfloat-abi`.  When a HF target is used,
the VFP variant of the AAPCS CC is used.  Otherwise, the AAPCS variant
is used.  In all cases, the AEABI functions use the AAPCS CC.  Adjust
the calling convention based on the target.

Resolves PR30543!

llvm-svn: 291909
2017-01-13 16:25:33 +00:00
Eugene Zelenko c4ad1ce068 [Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291641
2017-01-11 01:45:03 +00:00
Eli Friedman d03df8145f [ARM] Implement isExtractSubvectorCheap.
See https://reviews.llvm.org/D6678 for the history of
isExtractSubvectorCheap. Essentially the same considerations apply
to ARM.

This temporarily breaks the formation of vpadd/vpaddl in certain cases;
AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles.
See https://reviews.llvm.org/D27779 for followup fix.

Differential Revision: https://reviews.llvm.org/D27774

llvm-svn: 290198
2016-12-20 20:05:07 +00:00
Eli Friedman f624ec27b7 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289972
2016-12-16 18:44:08 +00:00
Diana Picus 2af9c389bf [ARM] Expose methods to get the CCAssignFn. NFCI
Add two public methods to ARMTargetLowering: CCAssignFnForCall and
CCAssignFnForReturn, which are just calling the already existing private method
CCAssignFnForNode. These will come in handy for GlobalISel on ARM.

We also replace all calls to CCAssignFnForNode in ARMISelLowering.cpp, because
the new methods are friendlier to the reader.

llvm-svn: 289932
2016-12-16 10:35:20 +00:00
Nico Weber 11c2e6ee3a Revert 279703, it caused PR31404.
llvm-svn: 289923
2016-12-16 04:51:25 +00:00
Eli Friedman 10576e73c9 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289703
2016-12-14 20:25:26 +00:00
Javed Absar 85874a9360 [ARM]: Assign cost of scaling used in addressing mode for ARM cores
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.

For instance:

LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]

Above, (1) takes less cycles than (2).

By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.

Differential Revision: http://reviews.llvm.org/D24857

Reviewers: jmolloy, rengolin
llvm-svn: 284127
2016-10-13 14:57:43 +00:00
Saleem Abdulrasool 92e33a3ebc ARM: move the builtins libcall CC setup
Move the target specific setup into the target specific lowering setup.  As
pointed out by Anton, the initial change was moving this too high up the stack
resulting in a violation of the layering (the target generic code path setup
target specific bits).  Sink this into the ARM specific setup.  NFC.

llvm-svn: 281088
2016-09-09 20:11:31 +00:00
Diana Picus 774d157a5d [ARM] Honour ABI for rem under -O0 for EABI, GNUEABI, Android and Musl
At higher optimization levels, we generate the libcall for DIVREM_Ix, which is
fine: aeabi_{u|i}divmod. At -O0 we generate the one for REM_Ix, which is the
default {u}mod{q|h|s|d}i3.

This commit makes sure that we don't generate REM_Ix calls for ABIs that
don't support them (i.e. where we need to use DIVREM_Ix instead). This is
achieved by bailing out of FastISel, which can't handle non-double multi-reg
returns, and letting the legalization infrastructure expand the REM_Ix calls.

It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some
Windows checks to it to make sure we don't break things for it.

Fixes PR27068

Differential Revision: https://reviews.llvm.org/D21926

llvm-svn: 275773
2016-07-18 06:48:25 +00:00
Duncan P. N. Exon Smith e4f5e4f4d1 CodeGen: Use MachineInstr& in TargetLowering, NFC
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr.  In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

llvm-svn: 274287
2016-06-30 22:52:52 +00:00
Rafael Espindola ae0d866f56 Refactor a duplicated predicate. NFC.
llvm-svn: 273826
2016-06-26 22:13:55 +00:00
Pablo Barrio 7a64346533 [ARM] Lower (select_cc k k (select_cc ~k ~k x)) into (SSAT l_k x)
Summary:
SSAT saturates an integer, making sure that its value lies within
an interval [-k, k]. Since the constant is given to SSAT as the
number of bytes set to one, k + 1 must be a power of 2, otherwise
the optimization is not possible. Also, the select_cc must use <
and > respectively so that they define an interval.

Reviewers: mcrosier, jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D21372

llvm-svn: 273581
2016-06-23 16:53:49 +00:00
Sam Parker d616cf07b2 [ARM] Enable isel of UMAAL
TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL
dags into UMAAL. Selection is split into the two phases because it
is easier to match the two patterns at those different times.

Differential Revision: http://http://reviews.llvm.org/D21461

llvm-svn: 273165
2016-06-20 16:47:09 +00:00
Rafael Espindola 0f89833c31 Add a isPositionIndependent predicate.
Reduces a bit of code duplication and clarify where we are interested
just on position independence and no the location of the symbol.

llvm-svn: 273164
2016-06-20 16:43:17 +00:00
Benjamin Kramer bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Tim Northover b5ece527a1 ARM: stop emitting blx instructions for most calls on MachO.
I'm really not sure why we were in the first place, it's the linker's job to
convert between BL/BLX as necessary. Even worse, using BLX left Thumb calls
that could be locally resolved completely unencodable since all offsets to BLX
are multiples of 4.

rdar://26182344

llvm-svn: 269101
2016-05-10 19:17:47 +00:00
Silviu Baranga 82d04260b7 [ARM] Add support for the X asm constraint
Summary:
This patch adds support for the X asm constraint.

To do this, we lower the constraint to either a "w" or "r" constraint
depending on the operand type (both constraints are supported on ARM).

Fixes PR26493

Reviewers: t.p.northover, echristo, rengolin

Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19061

llvm-svn: 267411
2016-04-25 14:29:18 +00:00
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Peter Collingbourne 86b9fbe980 ARM: Better codegen for 64-bit compares.
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.

Before:

	push	{r7, lr}
	cmp	r0, r2
	mov.w	r0, #0
	mov.w	r12, #0
	it	hs
	movhs	r0, #1
	cmp	r1, r3
	it	ge
	movge.w	r12, #1
	it	eq
	moveq	r12, r0
	cmp.w	r12, #0
	bne	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

After:

	push	{r7, lr}
	subs	r0, r0, r2
	sbcs.w	r0, r1, r3
	bge	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

Saves around 80KB in Chromium's libchrome.so.

Some notes on this patch:

- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
  introduced (nothing else needs them). However, they are necessary in
  order to avoid poor codegen, and they seem similar to existing combines
  in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
  (brcond Compare)).

- No support for Thumb-1. This is in principle possible, but we'd need
  to implement ARMISD::SUBE for Thumb-1.

Differential Revision: http://reviews.llvm.org/D15256

llvm-svn: 263962
2016-03-21 18:00:02 +00:00
James Y Knight f44fc5219f Tweak some atomics functions in preparation for larger changes; NFC.
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
  '__sync' libcalls and '__atomic' libcalls, and this function is for
  the '__sync' ones.

- getInsertFencesForAtomic() has been replaced with
  shouldInsertFencesForAtomic(Instruction), so that the decision can be
  made per-instruction. This functionality will be used soon.

- emitLeadingFence/emitTrailingFence are no longer called if
  shouldInsertFencesForAtomic returns false, and thus don't need to
  check the condition themselves.

llvm-svn: 263665
2016-03-16 22:12:04 +00:00
Saleem Abdulrasool f36005a358 ARM: support TLS for WoA
Add support for TLS access for Windows on ARM.  This generates a similar access
to MSVC for ARM.

The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call.  The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.

llvm-svn: 259676
2016-02-03 18:21:59 +00:00
Manman Ren 5e9e65e705 CXX_FAST_TLS calling convention: performance improvement for ARM.
This is the same change on ARM as r255821 on AArch64.
rdar://9001553

llvm-svn: 257424
2016-01-12 00:47:18 +00:00
Tim Northover bd41cf880c ARM: support TLS accesses on Darwin platforms
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.

llvm-svn: 257039
2016-01-07 09:03:03 +00:00
Martell Malone d12292480a ARM: address WOA unsigned division overflow crash
Building on r253865 the crash is not limited to signed overflows.

Disable custom handling of unsigned 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit unsigned integer overflow.

llvm-svn: 254158
2015-11-26 15:34:03 +00:00
Martell Malone a6b867eb0d ARM: address WoA division overflow crash
Disable custom handling of signed 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit integer overflow crashes.

llvm-svn: 253865
2015-11-23 13:11:39 +00:00
James Molloy b564098c62 [ARM] Replace ARMISD::RBIT with ISD::BITREVERSE
ISD::BITREVERSE matches "rbit" completely, so remove ARMISD::RBIT and mark ISD::BITREVERSE as legal, adding a test for lowering.

llvm-svn: 253047
2015-11-13 16:05:22 +00:00
Sanjay Patel af1b48bfdc [ARM] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2
implementation.

The net result of allowing this speculation for the regression tests in this patch is
that we get this code:

ctlz:               
  clz  r0, r0
  bx  lr
cttz:              
  rbit  r0, r0
  clz  r0, r0
  bx  lr

Instead of:

ctlz:    
  cmp  r0, #0
  moveq  r0, #32
  clzne  r0, r0
  bx  lr
cttz:     
  cmp   r0, #0
  moveq  r0, #32
  rbitne  r0, r0
  clzne  r0, r0
  bx  lr

This will help solve a general speculation/despeculation problem noted in PR24818:
https://llvm.org/bugs/show_bug.cgi?id=24818

Differential Revision: http://reviews.llvm.org/D14469

llvm-svn: 252639
2015-11-10 19:24:31 +00:00
James Molloy 9d55f19cfa Reapply "[ARM] Combine CMOV into BFI where possible"
Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh!

Original commit message:

If we have a CMOV, OR and AND combination such as:
  if (x & CN)
      y |= CM;

And:
  * CN is a single bit;
    * All bits covered by CM are known zero in y;

Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).

llvm-svn: 252606
2015-11-10 14:22:05 +00:00
Renato Golin 1d8a2c952f Revert "[ARM] Combine CMOV into BFI where possible"
This reverts commit r252057, as it broke ARM self-hosting buildbots, probably
due to a code-gen fault.

llvm-svn: 252460
2015-11-09 12:19:10 +00:00
Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
James Molloy e7d679cf4c [ARM] Combine CMOV into BFI where possible
If we have a CMOV, OR and AND combination such as:
  if (x & CN)
    y |= CM;

And:
  * CN is a single bit;
  * All bits covered by CM are known zero in y;

Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).

llvm-svn: 252057
2015-11-04 16:55:07 +00:00
Peter Collingbourne 97aae40880 ARM/ELF: Better codegen for global variable addresses.
In PIC mode we were previously computing global variable addresses (or GOT
entry addresses) by adding the PC, the PC-relative GOT displacement and
the GOT-relative symbol/GOT entry displacement. Because the latter two
displacements are fixed, we ended up performing one more addition than
necessary.

This change causes us to compute addresses using a single PC-relative
displacement, resulting in a shorter code sequence. This reduces code size
by about 4% in a recent build of Chromium for Android.

As a result of this change we no longer need to compute the GOT base address
in the ARM backend, which allows us to remove the Global Base Reg pass and
SDAG lowering for the GOT.

We also now no longer use the GOT when addressing a symbol which is known
to be defined in the same linkage unit. Specifically, the symbol must have
either hidden visibility or a strong definition in the current module in
order to not use the the GOT.

This is a change from the previous behaviour where we would use the GOT to
address externally visible symbols defined in the same module. I think the
only cases where this could matter are cases involving symbol interposition,
but we don't really support that well anyway.

Differential Revision: http://reviews.llvm.org/D13650

llvm-svn: 251322
2015-10-26 18:23:16 +00:00
James Molloy 72222f5dca [ARM] Handle the inline asm constraint type 'o'
This means "memory with offset" and requires very little plumbing to get working. This fixes PR25317.

llvm-svn: 251280
2015-10-26 10:04:52 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Saleem Abdulrasool fe83b50289 ARM: address WoA division limitation
We now emit the compiler generated divide by zero check that was needed for the
MSVC routines.  We construct a psuedo-instruction for the DBZ check as the
operation requires splitting up the BB.  For the 64-bit operations, we need to
custom expand the node as we need to insert the DBZ check and then emit the
libcall to the appropriate name.  Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization rather than sink this there to avoid the duplication.  The division
library calls now match MSVC semantically.

llvm-svn: 248561
2015-09-25 05:15:46 +00:00
Ahmed Bougacha 81616a72ea [ARM] Emit clrex in the expanded cmpxchg fail block.
ARM counterpart to r248291:

In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248294
2015-09-22 17:22:58 +00:00
Ahmed Bougacha 5246867384 [CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit.
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).

Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)

Differential Revision: http://reviews.llvm.org/D12557

llvm-svn: 247429
2015-09-11 17:08:28 +00:00
Ahmed Bougacha 9d677131c4 [CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind.
This lets us generalize its usage to the other atomic instructions.

llvm-svn: 247428
2015-09-11 17:08:17 +00:00
Scott Douglass bdef60462d [ARM] Use AEABI helpers for i64 div and rem
Differential Revision: http://reviews.llvm.org/D12232

llvm-svn: 245830
2015-08-24 09:17:18 +00:00
James Molloy ea3a687a33 [ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM
NFCI. This replaces another custom ISDNode with a generic equivalent.

llvm-svn: 244591
2015-08-11 12:06:22 +00:00
James Molloy db8ee4b5a9 [ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN.
NFCI. This removes a custom ISDNode.

llvm-svn: 244590
2015-08-11 12:06:15 +00:00
Matthias Braun 3cd00c1739 Fix __builtin_setjmp in combination with sjlj exception handling.
llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling
style but is also used in clang to implement __builtin_setjmp.  The ARM
backend needs to output additional dispatch tables for the SjLj
exception handling style, these tables however can't be emitted if
llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual
landing pad blocks exist.

To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is
introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj
exception handling lowering, so we can differentiate between the case
where we actually need to setup a dispatch table and the case where we
just need the __builtin_setjmp semantic.

Differential Revision: http://reviews.llvm.org/D9313

llvm-svn: 242481
2015-07-16 22:34:16 +00:00
Pat Gavlin a717f255b6 Allow {e,r}bp as the target of {read,write}_register.
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.

Differential Revision: http://reviews.llvm.org/D10977

llvm-svn: 241827
2015-07-09 17:40:29 +00:00
Mehdi Amini 0cdec1e2ab Make isLegalAddressingMode() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini 44ede33a69 Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Benjamin Kramer 9bfb627a0e [TargetLowering] StringRefize asm constraint getters.
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.

llvm-svn: 241411
2015-07-05 19:29:18 +00:00
Hao Liu 2cd34bb585 [ARM] Lower interleaved memory accesses to vldN/vstN intrinsics.
This patch also adds a function to calculate the cost of interleaved memory accesses.

E.g. Lower an interleaved load:
        %wide.vec = load <8 x i32>, <8 x i32>* %ptr, align 4
        %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>
        %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>
     into:
        %vld2 = { <4 x i32>, <4 x i32> } call llvm.arm.neon.vld2(%ptr, 4)
        %vec0 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 0
        %vec1 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 1

E.g. Lower an interleaved store:
        %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
        store <12 x i32> %i.vec, <12 x i32>* %ptr, align 4
     into:
        %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3>
        %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7>
        %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11>
        call void llvm.arm.neon.vst3(%ptr, %sub.v0, %sub.v1, %sub.v2, 4)

Differential Revision: http://reviews.llvm.org/D10533

llvm-svn: 240755
2015-06-26 02:45:36 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Peter Collingbourne 6679fc1a79 Revert r238473, "Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM."
as it caused miscompilations and assertion failures (PR23768,
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150601/280380.html).

llvm-svn: 239169
2015-06-05 18:01:28 +00:00
Daniel Sanders 1f58ef71ea [arm] Distinguish the /U[qytnms]/, 'Uv', 'Q', and 'm' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.

Of these, /U[qytnms]/ do not have backend tests but are accepted by clang.

No functional change intended.

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D8203

llvm-svn: 238921
2015-06-03 12:33:56 +00:00
Matt Arsenault bd7d80a4a6 Add address space argument to isLegalAddressingMode
This is important because of different addressing modes
depending on the address space for GPU targets.

This only adds the argument, and does not update
any of the uses to provide the correct address space.

llvm-svn: 238723
2015-06-01 05:31:59 +00:00
Peter Collingbourne 450fbee6b2 Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing these as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach
the register allocator to allocate the registers in ascending order. This
never got implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achiveing better codegen is to create LDM/STM
instructions with identical sets of virtual registers, let the register
allocator pick arbitrary registers and order register lists when printing an
MCInst. This approach also avoids the need to repeatedly calculate offsets
which ultimately ought to be eliminated pre-RA in order to decrease register
pressure.

This is implemented by lowering the memcpy intrinsic to a series of SD-only
MCOPY pseudo-instructions which performs a memory copy using a given number
of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a
little unusual, but it avoids the need to encode register lists in the SD,
and we can take advantage of SD use lists to decide whether to use the _UPD
variant of the instructions.

Fixes PR9199.

Differential Revision: http://reviews.llvm.org/D9508

llvm-svn: 238473
2015-05-28 20:02:45 +00:00
Eric Christopher 824f42f209 Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.

We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.

For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.

Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.

llvm-svn: 237079
2015-05-12 01:26:05 +00:00
Matthias Braun d04893fa36 Change getTargetNodeName() to produce compiler warnings for missing cases, fix them
llvm-svn: 236775
2015-05-07 21:33:59 +00:00
Matthias Braun eec4efcca5 Cleanup, remove unused return value
llvm-svn: 235952
2015-04-28 00:37:05 +00:00
James Molloy fa041153e5 [ARM] Remove target-specific ITOFP/FPTOI nodes
Anton tried this 5 years ago but it was reverted due to extra VMOVs
being emitted. This can be easily fixed with a liberal application
of patterns - matching loads/stores and extractelts.

llvm-svn: 232958
2015-03-23 16:15:16 +00:00
John Brawn 0dbcd65442 [ARM] Align stack objects passed to memory intrinsics
Memcpy, and other memory intrinsics, typically tries to use LDM/STM if
the source and target addresses are 4-byte aligned. In CodeGenPrepare
look for calls to memory intrinsics and, if the object is on the
stack, 4-byte align it if it's large enough that we expect that memcpy
would want to use LDM/STM to copy it.

Differential Revision: http://reviews.llvm.org/D7908

llvm-svn: 232627
2015-03-18 12:01:59 +00:00
Daniel Sanders bf5b80f5f9 Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC.
Summary:
This is instead of doing this in target independent code and is the last
non-functional change before targets begin to distinguish between
different memory constraints when selecting code for the ISD::INLINEASM
node.

Next, each target will individually move away from the idea that all
memory constraints behave like 'm'.

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8173

llvm-svn: 232373
2015-03-16 13:13:41 +00:00
Tim Northover 8cda34f5e7 ARM: simplify and extend byval handling
The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.

But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.

I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.

rdar://20095672

llvm-svn: 231959
2015-03-11 18:54:22 +00:00
Ahmed Bougacha 4200cc95b4 [ARM] Enable vector extload combine for legal types.
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.

Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because: 
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
  when we "manually expand" extloads to vld+vmov.

For legal types, the combine doesn't fire that often: in the
integration tests only in a big endian testcase, where it removes a
pointless AND.

Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423

llvm-svn: 231396
2015-03-05 19:37:53 +00:00
JF Bastien f14889ee34 Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded.
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all.

This does not represent a behavioural change and as such no tests were added.

Patch by: Richard Diamond.

Reviewers: jfb

Reviewed By: jfb

Subscribers: jfb, aemerson, t.p.northover, llvm-commits

Differential Revision: http://reviews.llvm.org/D7713

llvm-svn: 231250
2015-03-04 15:47:57 +00:00
Eric Christopher 11e4df73c8 getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

llvm-svn: 230699
2015-02-26 22:38:43 +00:00
Eric Christopher 23a3a7c871 Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.

llvm-svn: 230583
2015-02-26 00:00:24 +00:00