Commit Graph

269 Commits

Author SHA1 Message Date
Jessica Paquette 71947ed927 [AArch64][GlobalISel] Constrain reg operands in selectBrJT
This was causing a machine verifier failure on the test suite.

Make sure that we don't end up with a weird register class here.

Failure for reference:

*** Bad machine code: Illegal virtual register for instruction ***
- function:    check_constrain
- basic block: %bb.1  (0x7f8b70839f80)
- instruction: early-clobber %6:gpr64, early-clobber %7:gpr64sp =
  JumpTableDest32 %5:gpr64, %1:gpr64sp, %jump-table.0
- operand 3:   %1:gpr64sp
Expected a GPR64 register, but got a GPR64sp register

Differential Revision: https://reviews.llvm.org/D77349
2020-04-02 20:34:11 -07:00
Matt Arsenault b8fc192d42 Revert "[GISel]: Fix incorrect IRTranslation while translating null pointer types"
This reverts commit b3297ef051.

This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
2020-03-30 19:30:42 -04:00
Jessica Paquette ef4282e0ee [AArch64][GlobalISel] Avoid copies to target register bank for subregister copies
Previously, for any copy from a register bigger than the destination:

1. Copy to a same-sized register in the destination register bank.
2. Subregister copy of that to the destination.
This fails for copies from 128-bit FPRs to GPRs because the GPR register bank
can't accommodate 128-bit values.

Instead of special-casing such copies to perform the truncation beforehand in
the source register bank, generalize this:
a) Perform a subregister copy straight from the source register whenever possible
(see the sketch below). This results in shorter MIR and fixes the above problem.

b) Perform a full copy to the target bank and then do a subregister copy only if
the source bank can't support the target's size, e.g. a GPR to 8-bit FPR copy.
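
As a rough MIR-level illustration of case a) (register names and classes here
are made up), a copy from the low 64 bits of a 128-bit FPR value into a 64-bit
GPR can now be selected as a single subregister copy straight from the FPR128
source, rather than first materializing a same-sized value on the GPR bank:

```
%dst:gpr64 = COPY %src.dsub
```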

Patch by Raul Tambre (tambre)!

Differential Revision: https://reviews.llvm.org/D75421
2020-03-05 11:13:02 -08:00
Amara Emerson 65f99b5383 [AArch64][GlobalISel] Fixup <32b heterogeneous regbanks of G_PHIs just before selection.
Since all types <32b on gpr end up being assigned gpr32 regclasses, we can end
up with PHIs here which try to select between a gpr32 and an fpr16. Ideally RBS
shouldn't be selecting heterogeneous regbanks for operands if possible, but we
still need to be able to deal with it here.

To fix this, if we have a gpr-bank operand < 32b in size and at least one other
operand is on the fpr bank, then we add cross-bank copies to homogenize the
operand banks. For simplicity the bank that we choose to settle on is whatever
bank the def operand has. For example:

%endbb:
  %dst:gpr(s16) = G_PHI %in1:gpr(s16), %bb1, %in2:fpr(s16), %bb2
 =>
%bb2:
  ...
  %in2_copy:gpr(s16) = COPY %in2:fpr(s16)
  ...
%endbb:
  %dst:gpr(s16) = G_PHI %in1:gpr(s16), %bb1, %in2_copy:gpr(s16), %bb2

Differential Revision: https://reviews.llvm.org/D75086
2020-02-26 14:10:32 -08:00
Jessica Paquette 45417b7aa7 [AArch64][GlobalISel] Properly implement widening for TB(N)Z
When we have to widen to a 64-bit register, we have to emit a SUBREG_TO_REG.

Add a general-purpose widening helper which emits the correct SUBREG_TO_REG
instruction based on a desired size, and add a testcase.
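
For reference, a rough sketch of the widening (virtual register names are
illustrative): the 32-bit value is placed into the low half of a 64-bit
register with SUBREG_TO_REG, and the TB(N)Z can then test a bit of the wide
register:

```
%wide:gpr64 = SUBREG_TO_REG 0, %narrow:gpr32, %subreg.sub_32
```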

Also remove some asserts which are technically incorrect in `emitTestBit`.

- p0 doesn't count as a scalar type, so we need to check `!Ty.isVector()`
instead

- Whenever we have an s1, the Size/Bit checks are too conservative, so just
remove them

Replace these asserts with less conservative ones where applicable.

Differential Revision: https://reviews.llvm.org/D74427
2020-02-12 09:24:58 -08:00
Jessica Paquette 609a489e05 [AArch64][GlobalISel] Reland SLT/SGT TBNZ optimization
The issue in the previous commits was that we swapped the LHS and RHS while
looking for the constant. In SLT/SGT, the constant must be on the RHS, or the
optimization is invalid.

Move the swapping logic after the check for the SLT/SGT case and update tests.

Original commits:

d78cefb160
a373841407
2020-02-07 11:15:25 -08:00
Jessica Paquette 3e5d837cda Revert "[AArch64][GlobalISel] Emit TBNZ with G_BRCOND where the condition is SLT"
This reverts commit a373841407.

It looks like this broke set_shadow_test.c, so I'm reverting until I can fix it.

I also reverted the SGT change because it's probably also broken.
2020-02-06 16:30:13 -08:00
Jessica Paquette df51b685ef Revert "[AArch64][GlobalISel] Emit TBZ for SGT cond branches against -1"
This reverts commit d78cefb160.

Either this or the SLT change broke set_shadow_test.c, so I'm reverting until
I can fix it.
2020-02-06 16:29:00 -08:00
Jessica Paquette d78cefb160 [AArch64][GlobalISel] Emit TBZ for SGT cond branches against -1
When we have a G_BRCOND fed by an sgt compare against -1, we can just emit a TBZ.

This is similar to the code in `AArch64TargetLowering::LowerBR_CC`.
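
Roughly, in shorthand (for a 64-bit x, bit 63 is the sign bit): sgt -1 holds
exactly when the sign bit is clear, so

```
(brcond (icmp sgt x, -1)) -> (tbz x, 63)
```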

Also while we're here, properly scope the commutative constant check in
`selectCompareBranch`, since it sometimes would call
`getConstantVRegValWithLookThrough` twice.

Differential Revision: https://reviews.llvm.org/D74149
2020-02-06 12:04:03 -08:00
Jessica Paquette a373841407 [AArch64][GlobalISel] Emit TBNZ with G_BRCOND where the condition is SLT
When we have a G_ICMP which checks SLT, and the comparison is against 0, we
can emit a TBNZ instead of a CBZ.
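
Roughly, in shorthand (for a 64-bit x, bit 63 is the sign bit): slt 0 holds
exactly when the sign bit is set, so

```
(brcond (icmp slt x, 0)) -> (tbnz x, 63)
```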

This lets us fold in things into the branch, which can provide some code size
savings.

This is similar to the case in `AArch64TargetLowering::LowerBR_CC`.

https://reviews.llvm.org/D74090
2020-02-05 15:23:54 -08:00
Jessica Paquette bab993451e [AArch64][GlobalISel][NFC] Factor out TB(N)Z emission code into its own function
Factor it out into `emitTestBit` and add some asserts to the new function.

This will be useful for implementing TB(N)Z emission for SLT/SGT compares.

Differential Revision: https://reviews.llvm.org/D74080
2020-02-05 15:15:44 -08:00
Jessica Paquette 7212f65784 [AArch64][GlobalISel] Fold G_LSHR into test bit calculation
Add support for walking through G_LSHR in `getTestBitReg`. Equivalent to the
code in `getTestBitOperand` in AArch64ISelLowering.

```
(tbz (lshr x, c), b) -> (tbz x, b+c) when b + c is < # bits in x
```

Differential Revision: https://reviews.llvm.org/D74077
2020-02-05 15:14:12 -08:00
Shu-Chun Weng ce9633633c [GlobalISel][AArch64] Fix contract cross-bank copies with SIMD instructions
contractCrossBankCopyIntoStore() finds the instruction that defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
The current implementation hardcodes operand 0 and has no way of knowing
which output should be used.
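
For illustration (a rough MIR sketch with made-up names), the store's source
below is the second result of the defining instruction, so hardcoding operand 0
would pick %lo instead of %hi:

```
%lo:fpr(s64), %hi:fpr(s64) = G_UNMERGE_VALUES %vec:fpr(s128)
G_STORE %hi(s64), %ptr(p0) :: (store 8)
```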

This change adds another function to directly return the source register
and uses that for folding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=44783

Differential Revision: https://reviews.llvm.org/D74005
2020-02-05 10:38:35 -08:00
Jessica Paquette 292f725711 [AArch64][GlobalISel] Fold G_ASHR into TB(N)Z bit calculation
This implements walking over G_ASHR in the same way as `getTestBitOperand` in
AArch64ISelLowering.

```
(tbz (ashr x, c), b) -> (tbz x, b+c) or (tbz x, msb) if b+c is > # bits in x
```

Differential Revision: https://reviews.llvm.org/D73933
2020-02-05 10:04:48 -08:00
Jessica Paquette a82a28ae12 [AArch64][GlobalISel] Fix one use check in getTestBitReg
(1) The check needs to be on the 0th operand of whatever we're folding
(2) Checks for validity should happen before we change the bit

Fixes a bug which caused MultiSource/Applications/JM/lencod to fail at -O3.

Differential Revision: https://reviews.llvm.org/D74002
2020-02-05 09:54:52 -08:00
Jessica Paquette 9effe38b22 [AArch64][GlobalISel] Fold G_XOR into TB(N)Z bit calculation
This ports the existing case for G_XOR from `getTestBitOperand` in
AArch64ISelLowering into GlobalISel.

The idea is to flip between TBZ and TBNZ while walking through G_XORs.

Let's say we have

```
tbz (xor x, c), b
```

Let's say the `b`-th bit in `c` is 1. Then

- If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 0.
- If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 1.

So, then

```
tbz (xor x, c), b == tbnz x, b
```

Let's say the `b`-th bit in `c` is 0. Then

- If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 1.
- If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 0.

So, then

```
tbz (xor x, c), b == tbz x, b
```

Differential Revision: https://reviews.llvm.org/D73929
2020-02-03 15:22:24 -08:00
Jessica Paquette 37910fd0e1 [AArch64][GlobalISel] Fold G_SHL into TB(N)Z bit calculation
This implements the following optimization:

```
(tbz (shl x, c), b) -> (tbz x, b-c)
```

Which appears in `getTestBitOperand` in AArch64ISelLowering.cpp.

If we test bit `b` of `shl x, c`, we can fold away the `shl` by looking `c` bits
to the right of `b` in `x` when this fits in the type. So, we can just test the
`b-c`th bit.
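
A concrete instance: with c = 2 and b = 3, bit 3 of (shl x, 2) is bit 1 of x,
so

```
(tbz (shl x, 2), 3) -> (tbz x, 1)
```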

Differential Revision: https://reviews.llvm.org/D73924
2020-02-03 14:27:08 -08:00
Jessica Paquette 2bd46444d7 [AArch64][GlobalISel] Walk through G_AND in TB(N)Z bit calculation
Given

```
tb(n)z (and x, m), b
```

Where the `b`-th bit of `m` is 1,

```
tb(n)z (and x, m), b == tb(n)z x, b
```

So, we can walk past a `G_AND` in this case.

Also add test/CodeGen/AArch64/GlobalISel/opt-fold-and-tbz-tbnz.mir to test this.

Differential Revision: https://reviews.llvm.org/D73790
2020-02-03 11:53:47 -08:00
Amara Emerson b911b99052 [AArch64][GlobalISel] Don't reconvert to p0 in convertPtrAddToAdd().
convertPtrAddToAdd improved overall code size and quality by a significant amount,
but at -O0 we generated some cross-class copies because we emitted
G_PTRTOINT and G_INTTOPTR around the G_ADD. Unfortunately at -O0 we don't run any
register coalescing, so these cross-class copies end up escaping as moves, and
we ended up regressing 3 benchmarks on CTMark (though still a winner overall).

This patch changes the lowering to instead directly emit the G_ADD into the
destination register, and then forcibly changes the dest LLT from p0 to s64. This
should be ok, as all uses of the register should now be selected and therefore
the LLT doesn't matter for the users. It does however matter for the importer
patterns, which will fail to select a G_ADD if there's a p0 LLT.
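
A rough sketch of the before/after MIR shape (register names are illustrative):

```
; before
%int:gpr(s64) = G_PTRTOINT %ptr(p0)
%sum:gpr(s64) = G_ADD %int, %off
%dst:gpr(p0) = G_INTTOPTR %sum(s64)

; after: the G_ADD defines %dst directly and %dst's LLT is rewritten to s64
%int:gpr(s64) = G_PTRTOINT %ptr(p0)
%dst:gpr(s64) = G_ADD %int, %off
```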

I'm not able to get rid of the G_PTRTOINT on the source yet however. We can't
use the same trick of breaking the type system since that could break the
selection of the defining instruction. Thus with -O0 we still end up with a
cross class copy on source.

Code size improvements on -O0:
Program                                         baseline      new         diff
 test-suite :: CTMark/Bullet/bullet.test        965520       949164      -1.7%
 test-suite...TMark/7zip/7zip-benchmark.test    1069456      1052600     -1.6%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    1213692      1199804     -1.1%
 test-suite...:: CTMark/sqlite3/sqlite3.test    421680       419736      -0.5%
 test-suite...-typeset/consumer-typeset.test    837076       833380      -0.4%
 test-suite :: CTMark/lencod/lencod.test        799712       796976      -0.3%
 test-suite...:: CTMark/ClamAV/clamscan.test    688264       686132      -0.3%
 test-suite :: CTMark/kimwitu++/kc.test         1002344      999648      -0.3%
 test-suite...Mark/mafft/pairlocalalign.test    422296       421768      -0.1%
 test-suite :: CTMark/SPASS/SPASS.test          656792       656532      -0.0%
 Geomean difference                                                      -0.6%

Differential Revision: https://reviews.llvm.org/D73910
2020-02-03 11:50:22 -08:00
Jessica Paquette b9bf9305d1 [AArch64][GlobalISel] Walk through G_TRUNC in getTestBitReg
When you encounter a G_TRUNC, you are moving from a larger type to a smaller
type.

Asking for the i-th bit on a larger value is the same as asking for the i-th
bit on a smaller value.

So, we should always be able to walk through G_TRUNC when computing the bit
for a TB(N)Z.
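
In shorthand, for any bit b that is valid in the narrower type:

```
(tb(n)z (trunc x), b) -> (tb(n)z x, b)
```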

Differential Revision: https://reviews.llvm.org/D73748
2020-01-31 11:09:55 -08:00
Jessica Paquette c8c987d310 [AArch64][GlobalISel] Fold in G_ANYEXT/G_ZEXT into TB(N)Z
This is similar to the code in getTestBitOperand in AArch64ISelLowering. Instead
of implementing all of the TB(N)Z optimizations at once, this patch implements
the simplest case first. The way that this is set up should make it fairly easy
to add the rest as we go along.

The idea here is that after determining that we can use a TB(N)Z, we can
continue looking through instructions and perform further folding.

In this case, when we have a G_ZEXT or G_ANYEXT where the extended bits are not
used, we can fold it into the TB(N)Z.
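
In shorthand, for any bit b that lies within the unextended width of x:

```
(tb(n)z (zext x), b) -> (tb(n)z x, b)
(tb(n)z (anyext x), b) -> (tb(n)z x, b)
```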

Differential Revision: https://reviews.llvm.org/D73673
2020-01-30 14:51:26 -08:00
Amara Emerson 6170272ab9 [AArch64][GlobalISel] Disallow vectors in convertPtrAddToAdd.
Found by inspection, but there's no test for this yet because G_PTR_ADD is
currently illegal for vectors. I'll add the test at a later time when the
legalizer support has landed.
2020-01-30 14:50:44 -08:00
Amara Emerson 610f1d22f1 [AArch64][GlobalISel] During ISel try to convert G_PTR_ADD to G_ADD.
This lowering tries to look for G_PTR_ADD instructions and then converts
them to a standard G_ADD with a COPY on the source, and G_INTTOPTR on the
result. This is ok for address space 0 on AArch64 as p0 can be treated as
s64.
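
A rough sketch of the rewrite (register names are illustrative):

```
%dst:gpr(p0) = G_PTR_ADD %base(p0), %off(s64)
 =>
%base_int:gpr(s64) = COPY %base(p0)
%sum:gpr(s64) = G_ADD %base_int, %off
%dst:gpr(p0) = G_INTTOPTR %sum(s64)
```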

The motivation behind this is to expose the add semantics to the imported
tablegen patterns. We shouldn't need to check for uses being loads/stores,
because the selector works bottom up, uses before defs. By the time we
end up trying to select a G_PTR_ADD, we should have already attempted to
fold this into addressing modes and therefore been unsuccessful.

This gives some performance and code size improvements across the board.

Differential Revision: https://reviews.llvm.org/D73673
2020-01-29 23:04:52 -08:00
Jessica Paquette 050cd443ca [AArch64][GlobalISel] Fix TBNZ/TBZ opcode selection
When the bit is < 32, we have to use the W register variant for TB(N)Z.

This is because of the way the instruction is encoded.

Differential Revision: https://reviews.llvm.org/D73660
2020-01-29 13:11:18 -08:00
Jessica Paquette dba29f7c3b [AArch64][GlobalISel] Fold G_AND into G_BRCOND
When the G_BRCOND is fed by an eq or ne G_ICMP, it may be possible to fold a
G_AND into the branch by producing a tbnz/tbz instead.

This happens when

  1. We have a ne/eq G_ICMP feeding into the G_BRCOND
  2. The G_ICMP is a comparison against 0
  3. One of the operands of the G_AND is a power of 2 constant
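
For example, in shorthand, with a power-of-2 mask of 8 (bit 3):

```
(brcond (icmp ne (and x, 8), 0)) -> (tbnz x, 3)
(brcond (icmp eq (and x, 8), 0)) -> (tbz x, 3)
```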

This is very similar to the code in AArch64TargetLowering::LowerBR_CC.

Add opt-and-tbnz-tbz to test this.

Differential Revision: https://reviews.llvm.org/D73573
2020-01-28 14:00:31 -08:00
Amara Emerson 14c2cf8e18 [AArch64][GlobalISel] Don't bail out of the select(cmp(a, b)) -> csel optimization with multiple users.
It can still be beneficial to do the optimization if the result of the compare
is used by *another* select.

Differential Revision: https://reviews.llvm.org/D73511
2020-01-28 10:09:03 -08:00
Amara Emerson 44b496758f [AArch64][GlobalISel] Remove duplicate attribute lookup code that was supposed to be cached. NFC.
When I cached this a long time ago it seems I forgot to remove the locally
declared variable of the same name in select(), so the caching wasn't having
any compile time benefit. Doh.
2020-01-23 15:50:08 -08:00
Amara Emerson 765b37abdf [AArch64][GlobalISel] Fallback if the +strict-align target feature is given.
Works around PR44246.
2020-01-23 14:45:29 -08:00
Amara Emerson 2e25d75aaa [AArch64][GlobalISel] Fix llvm.returnaddress(0) selection when LR is clobbered.
The code was originally ported from SelectionDAG, which does CSE behind the scenes
automatically. When copying the return address from LR, which is live into the function, we
need to make sure to use the single copy on function entry. Any later copy from LR
could be using clobbered junk.

Implement this by caching the copy in the per-MF state in the selector.

Should hopefully fix the AArch64 sanitiser buildbot failure.
2020-01-21 22:53:32 -08:00
Jay Foad 63f73545dd [GlobalISel] Pass MachineOperands into MachineIRBuilder helper methods
Reviewers: arsenm, aditya_nandakumar, aemerson

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72849
2020-01-16 16:04:21 +00:00
Amara Emerson 2e39ea726e Revert "Revert rG6078f2fedcac5797ac39ee5ef3fd7a35ef1202d5 - "[AArch64][GlobalISel]: Support @llvm.{return,frame}address selection.""
The original change wasn't constraining the operand regclasses which broke EXPENSIVE_CHECKS.
2020-01-15 10:13:11 -08:00
Simon Pilgrim e26a78e708 Revert rG6078f2fedcac5797ac39ee5ef3fd7a35ef1202d5 - "[AArch64][GlobalISel]: Support @llvm.{return,frame}address selection."
These intrinsics expand to a variable number of instructions so just like in
ISelLowering.cpp we use custom code to deal with them.

Committing Tim's original patch.

Differential Revision: https://reviews.llvm.org/D65656
----
Breaks EXPENSIVE_CHECKS builds.
2020-01-15 12:37:37 +00:00
Amara Emerson 6078f2fedc [AArch64][GlobalISel]: Support @llvm.{return,frame}address selection.
These intrinsics expand to a variable number of instructions so just like in
ISelLowering.cpp we use custom code to deal with them.

Committing Tim's original patch.

Differential Revision: https://reviews.llvm.org/D65656
2020-01-14 13:41:21 -08:00
Eli Friedman e68e4cbcc5 [GlobalISel] Change representation of shuffle masks in MachineOperand.
We're planning to remove the shufflemask operand from ShuffleVectorInst
(D72467); fix GlobalISel so it doesn't depend on that Constant.

The change to prelegalizercombiner-shuffle-vector.mir happens because
the input contains a literal "-1" in the mask (so the parser/verifier
weren't really handling it properly). We now treat it as equivalent to
"undef" in all contexts.

Differential Revision: https://reviews.llvm.org/D72663
2020-01-13 16:55:41 -08:00
Matt Arsenault b4a647449f TableGen/GlobalISel: Add way for SDNodeXForm to work on timm
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.

Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.

Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.

Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.

Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.

One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
2020-01-09 17:37:52 -05:00
Amara Emerson cc95bb1f57 [AArch64][GlobalISel] Implement selection of <2 x float> vector splat.
Also requires making G_IMPLICIT_DEF of v2s32 legal.

Differential Revision: https://reviews.llvm.org/D72422
2020-01-09 14:05:35 -08:00
Jessica Paquette 9949b1a175 [GlobalISel][AArch64] Import + select LDR*roW and STR*roW patterns
This adds support for selecting a large chunk of the load/store *roW patterns.

This is pretty much a straight port of AArch64DAGToDAGISel::SelectAddrModeWRO
into GISel. The code is very similar to the XRO code. The main difference is
that in the *roW patterns, we want to try and fold in an extend, and *possibly*
a shift along with it. A good portion of this patch is refactoring the existing
XRO code.
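
For reference, an assembly-level sketch of the *roW addressing mode being
targeted (registers are illustrative):

```
ldr x0, [x1, w2, sxtw #3]    // x0 = load from x1 + (sext(w2) << 3)
```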

- Add selectAddrModeWRO

- Factor out the code from selectAddrModeShiftedExtendXReg which is used by both
  selectAddrModeXRO and selectAddrModeWRO into selectExtendedSHL.
  This is similar to the function of the same name in AArch64DAGToDAGISel.

- Add support for extends to the factored out code in selectExtendedSHL.

- Teach getExtendTypeForInst how to handle AND masks that are intended to be
  used in loads/stores (necessary for this addressing mode.)

- Make getExtendTypeForInst not static because moving it made an annoying diff
  and I wanted to have the WRO/XRO functions close to each other while I was
  writing the code.

Differential Revision: https://reviews.llvm.org/D72426
2020-01-09 12:15:56 -08:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Amara Emerson 7ac9662401 [AArch64][GlobalISel] Add missing default statement to a switch in the selector. 2019-12-06 17:43:27 -08:00
Sterling Augustine aa3c877fb5 Move variable only used in an assert into the assert itself.
This prevents unused variable warnings from breaking the build.
2019-12-06 17:09:19 -08:00
Amara Emerson c77b441140 [AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates.
Only implemented for the type combinations already supported for G_SHL.

Differential Revision: https://reviews.llvm.org/D71153
2019-12-06 16:24:57 -08:00
Daniel Sanders e74c5b9661 [globalisel] Rename G_GEP to G_PTR_ADD
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD

Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69734
2019-11-05 10:31:17 -08:00
Amara Emerson 0f6ed432d5 [AArch64][GlobalISel] Fix assertion fail in C++ selection for vector zext of <4 x s8>
We bailed out of dealing with vectors only after the assertion, so move the bail-out before it.

Fixes PR43794
2019-10-28 15:45:01 -07:00
Amara Emerson 9c7d599dec [AArch64][GlobalISel] Implement selection for G_SHL of <2 x i64>
Simple continuation of existing selection support.

llvm-svn: 372467
2019-09-21 09:21:16 +00:00
Amara Emerson a59a886832 [AArch64][GlobalISel] Selection support for G_ASHR of <2 x s64>
Just add an extra case to the existing selection logic.

llvm-svn: 372466
2019-09-21 09:21:13 +00:00
Jessica Paquette 04e657be28 [AArch64][GlobalISel] Select arithmetic extended register patterns
This teaches GISel to select patterns which fold an extend plus optional shift
into the addressing mode. In particular, adds and subs.
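
For reference, an assembly-level sketch of the extended-register form being
targeted (registers are illustrative):

```
add x0, x1, w2, sxtw #2    // x0 = x1 + (sext(w2) << 2)
```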

Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td
and create GISel equivalents.

Add some equivalent functions to the ones in AArch64ISelDAGToDAG:

- `selectArithExtendedRegister`
- `narrowExtendRegIfNeeded`
- `getExtendTypeForInst`

`getExtendTypeForInst` includes the checks for loads and stores. This will be
used for WRO addressing modes in loads + stores.

Teach selectCopy to properly handle subregister copies on the same bank in
order to support `narrowExtendRegIfNeeded`. The extended register must be a
GPR32, so we need to support same-bank subregister copies.

Fix a bug in getSubRegForClass which would cause registers on things like
GPR32common to end up getting ssub. Just change the check to look for FPR32
rather than GPR32.

For tests:

- Add select-arith-extended-reg.mir
- Update addsub_ext.ll to include GlobalISel checks

Differential Revision: https://reviews.llvm.org/D66835

llvm-svn: 370410
2019-08-29 21:53:58 +00:00
Matt Arsenault caff0a88dd GlobalISel: Add known bits to InstructionSelector
AMDGPU uses this for some addressing mode selection patterns. The
analysis run itself doesn't do anything so it seems easier to just
always require this than adding a way to opt in.

llvm-svn: 370388
2019-08-29 17:24:32 +00:00
Jessica Paquette b8b23a1648 [GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.*
Remove manual selection code for this intrinsic and use a GISelPredicateCode
instead.

This allows us to fully select this intrinsic without any tricky custom C++
matching.

Differential Revision: https://reviews.llvm.org/D65780

llvm-svn: 370380
2019-08-29 16:45:19 +00:00
Jessica Paquette c327daeea5 [AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics
Add a GISelPredicateCode to ldaxr_*. This allows us to import the patterns for
@llvm.aarch64.ldaxr.*, and thus select them.

Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these
intrinsics involves the same check.

Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D66897

llvm-svn: 370377
2019-08-29 16:16:38 +00:00
Jessica Paquette 83fe56b3b9 [AArch64][GlobalISel] Import XRO load/store patterns instead of custom selection
Instead of using custom C++ in `earlySelect` for loads and stores, just import
the patterns.

Remove `earlySelectLoad`, since we can just import the work it's doing.

Some minor changes to how `ComplexRendererFns` are returned for the XRO
addressing modes. If you add immediates in two steps, sometimes they are not
imported properly and you only end up with one immediate. I'm not sure if this
is intentional.

- Update load-addressing-modes.mir to include the instructions we can now
  import.

- Add a similar test, store-addressing-modes.mir to show which store opcodes we
  currently import, and show that we can pull in shifts etc.

- Update arm64-fastisel-gep-promote-before-add.ll to use FastISel instead of
  GISel. This test failed with GISel because GISel folds the gep into the load.
  The test checks that FastISel doesn't fold non-pointer-width adds into loads.
  GISel on the other hand, produces a G_CONSTANT of -128 for the add, and then
  a G_GEP, which must be pointer-width.

Note that we don't get STRBRoX right now. It seems like the importer can't
handle `FPR8Op:{ *:[Untyped] }:$Rt` source operands. So, those are not currently
supported.

Differential Revision: https://reviews.llvm.org/D66679

llvm-svn: 369806
2019-08-23 20:31:34 +00:00