Commit Graph

63 Commits

Author SHA1 Message Date
Amara Emerson e3f5046e44 [AArch64][GlobalISel] Merge selection of vector-vector G_ASHR/G_LSHR and support more cases.
The vector-immediate cases are handled elsewhere in an earlier commit.
2020-09-21 16:04:52 -07:00
Amara Emerson a513fdec90 [AArch64][GlobalISel] Add a post-legalize combine for lowering vector-immediate G_ASHR/G_LSHR.
In order to select the immediate forms using the imported patterns, we need to
lower them into new G_VASHR/G_VLSHR target-generic ops. Add a combine to do this by
matching a build_vector of constant operands.

With this, we get selection for free.
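For illustration (not from the patch), IR like this gives a vector shift whose amount is a build_vector of constants, the shape this combine matches:

```
define <4 x i32> @ashr_splat_imm(<4 x i32> %x) {
  ; constant vector shift amount; typically reaches GlobalISel as a
  ; G_BUILD_VECTOR of G_CONSTANT operands
  %r = ashr <4 x i32> %x, <i32 3, i32 3, i32 3, i32 3>
  ret <4 x i32> %r
}
```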
2020-09-21 16:04:52 -07:00
Amara Emerson 825203daae [AArch64][GlobalISel] Make <4 x s16> G_ASHR and G_LSHR legal.
Selection support for these is coming up.
2020-09-21 15:32:48 -07:00
Amara Emerson 5a50f8b39f [AArch64][GlobalISel] Add legalization and selection support for <4 x s16> G_SHL. 2020-09-18 23:32:01 -07:00
Amara Emerson 269bcc39ca [AArch64][GlobalISel] Legalize arithmetic ops for <4 x s16> 2020-09-18 17:13:55 -07:00
Amara Emerson 5d34d7f1a0 [GlobalISel] Add lowering support for G_ABS and use for AArch64.
Differential Revision: https://reviews.llvm.org/D87952
2020-09-18 16:17:18 -07:00
Amara Emerson 615695de27 [AArch64][GlobalISel] Make G_BUILD_VECTOR of <8 x s8> legal. 2020-09-18 10:32:33 -07:00
Tim Northover 2afe4becec AArch64: make sure jump table entries can reach entire image
This turns all jump table entries into deltas within the target
function because in the small memory model all code & static data must
be in a 4GB block somewhere in memory.

When the entries were a delta between the table location and a basic
block, 32-bit signed entries were not enough to guarantee
reachability.

https://reviews.llvm.org/D87286
2020-09-18 09:50:40 +01:00
Amara Emerson f5898f8c2d [AArch64][GlobalISel] Make G_STORE <8 x s8> legal. 2020-09-17 16:42:18 -07:00
Amara Emerson 196e2f97b7 [AArch64][GlobalISel] clang-format AArch64LegalizerInfo.cpp. NFC. 2020-09-17 16:41:10 -07:00
Amara Emerson 7d5b103483 [AArch64][GlobalISel] Widen G_EXTRACT_VECTOR_ELT element types if < 8b.
In order to not unnecessarily promote the source vector to greater than our
native vector size of 128b, I've added some cascading rules to widen based on
the number of elements.
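Illustrative IR (not from the patch) where the extracted element type is narrower than 8 bits:

```
define i1 @extract_s1(<4 x i1> %v, i64 %idx) {
  ; s1 element type is narrower than 8 bits, so it is widened first
  %e = extractelement <4 x i1> %v, i64 %idx
  ret i1 %e
}
```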
2020-09-17 11:50:33 -07:00
Amara Emerson bea7749d03 [AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal for shifts. 2020-09-17 11:50:32 -07:00
Amara Emerson 79b21fc187 [AArch64][GlobalISel] Fix bug in fewerElementsVector action while legalizing oversize G_FPTRUNC vectors.
For <8 x s32> = fptrunc <8 x s64> the fewerElementsVector action tries to break
down the source vector into the final source vectors of <2 x s64> using unmerge.
This fixes a crash due to using the wrong number of elements for the breakdown
type.
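For reference, a minimal example of the shape described (illustrative, not the in-tree test):

```
define <8 x float> @fptrunc_v8(<8 x double> %x) {
  ; <8 x s32> = G_FPTRUNC <8 x s64>, broken down via <2 x s64> pieces
  %r = fptrunc <8 x double> %x to <8 x float>
  ret <8 x float> %r
}
```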

Also add some legalizer tests explicitly for G_FPTRUNC, which we didn't have.

Differential Revision: https://reviews.llvm.org/D87814
2020-09-17 08:56:26 -07:00
Amara Emerson 6ad33d8360 [AArch64][GlobalISel] Make G_BUILD_VECTOR of <16 x s8> legal. 2020-09-16 11:19:47 -07:00
Jessica Paquette ffe9986de4 [AArch64][GlobalISel] Refactor + improve CMN, ADDS, and ADD emit functions
These functions were extremely similar:

- `emitADD`
- `emitADDS`
- `emitCMN`

Refactor them a little, introducing a more generic `emitInstr` function to
do most of the work.

Also add support for the immediate + shifted register addressing modes in each
of them.

Update select-uaddo.mir to show that selecting ADDS now supports folding
immediates + shifts. (I don't think this can impact CMN, because the CMN checks
require a G_SUB with a non-constant on the RHS.)
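For illustration (not the in-tree MIR test), IR like this produces a G_UADDO whose constant operand can now be folded into the ADDS:

```
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)

define { i64, i1 } @uaddo_imm(i64 %x) {
  ; the constant 16 can be folded into the ADDS immediate form
  %r = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %x, i64 16)
  ret { i64, i1 } %r
}
```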

This is around a 0.02% code size improvement on CTMark at -O3.

Differential Revision: https://reviews.llvm.org/D87529
2020-09-15 17:18:05 -07:00
Zarko Todorovski 035396197a Remove unused variable introduced in 0448d11a06 causing build
failures with -Werror on.
2020-09-10 20:07:11 -04:00
Amara Emerson 0448d11a06 [AArch64][GlobalISel] Don't emit a branch for a fallthrough G_BR at -O0.
With optimizations we leave the decision to eliminate fallthrough branches to
block placement, but at -O0 we should do it in the selector to save code size.

This regressed -O0 with a recent change to a combiner.
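A minimal example (illustrative; whether the G_BR ends up as a fallthrough depends on final block layout):

```
define void @fallthrough(i32* %p) {
entry:
  ; unconditional branch to the next block in layout order: no B needed at -O0
  br label %next
next:
  store i32 0, i32* %p
  ret void
}
```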
2020-09-10 15:01:26 -07:00
Jordan Rupprecht 52f0837778 [NFC] Move definition of variable now only used in debug builds 2020-09-09 20:23:59 -07:00
Jessica Paquette 480e7f43a2 [AArch64][GlobalISel] Share address mode selection code for memops
We were missing support for the G_ADD_LOW + ADRP folding optimization in the
manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD.

As a result, we were missing cases like this:

```
@foo = external hidden global i32*
define void @baz(i32* %0) {
store i32* %0, i32** @foo
ret void
}
```

https://godbolt.org/z/16r7ad

This functionality already existed in the addressing mode functions for the
importer. So, this patch makes the manual selection code use
`selectAddrModeIndexed` rather than duplicating work.

This is a 0.2% geomean code size improvement for CTMark at -O3.

There is one code size increase (0.1% on lencod) which is likely because
`selectAddrModeIndexed` doesn't look through constants.

Differential Revision: https://reviews.llvm.org/D87397
2020-09-09 15:14:46 -07:00
Amara Emerson 520ab710fb Revert "Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")"
This reverts commit 8693ddc743.

Re-committing with the test requiring asserts.
2020-09-01 14:29:04 -07:00
Jordan Rupprecht 8693ddc743 Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")
This reverts commit 8ad8f484b6. It causes crashes when running `ninja check-llvm-codegen-aarch64-globalisel`, e.g.
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132/steps/test-stage1-compiler/logs/stdio.
Note that the crash does not seem to reproduce in debug builds.

5ded444252 depends on this, so revert that too.
2020-09-01 13:31:57 -07:00
Amara Emerson 5ded444252 [AArch64][GlobalISel] Optimize away a Not feeding a brcond by using tbz instead of tbnz.
Usually brconds are fed by compares, but not always, in which case we would
miss this fold.
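For illustration, a brcond fed by a not rather than a compare (illustrative IR):

```
define void @invert(i1 %c) {
entry:
  ; the xor-with-true ("not") feeds the conditional branch directly
  %n = xor i1 %c, true
  br i1 %n, label %a, label %b
a:
  ret void
b:
  ret void
}
```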

Differential Revision: https://reviews.llvm.org/D86413
2020-09-01 11:06:06 -07:00
Matt Arsenault 0b7f6cc71a GlobalISel: Add generic instructions for memory intrinsics
AArch64, X86 and Mips currently consume these directly and custom
lower them to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
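For reference, the kind of call these generic opcodes represent (illustrative IR, typed-pointer era):

```
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

define void @copy64(i8* %dst, i8* %src) {
  ; typically lowered to a memcpy libcall or expanded inline by the target
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 64, i1 false)
  ret void
}
```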
2020-08-26 20:08:45 -04:00
Raul Tambre e887d0e89b [AArch64][GlobalISel] Handle rtcGPR64RegClassID in AArch64RegisterBankInfo::getRegBankFromRegClass()
TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16
and X17, as it's the last matching class. This in turn gets passed to
AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable.

It seems sensible to handle this case, so copies from X16 and X17 work.
Copying from X17 is used in inline assembly in libunwind for pointer
authentication.
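A sketch of the kind of use mentioned (illustrative IR; the explicit-register constraint forces a copy from the physical register):

```
define i8* @read_x17() {
  ; explicit-register output constraint: copies from X17 into a virtual register
  %v = call i8* asm "", "={x17}"()
  ret i8* %v
}
```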

Differential Revision: https://reviews.llvm.org/D85720
2020-08-19 12:52:30 -07:00
Matt Arsenault eae9c54148 AArch64/GlobalISel: Fix verifier error after selecting returnaddress
This was caching the wrong register to re-use later.
2020-08-06 13:18:05 -04:00
Matt Arsenault f8fb7835d6 GlobalISel: Add utility for getting function argument live ins
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.

This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.

I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
2020-08-04 16:55:55 -04:00
Mitch Phillips 9a05fa10bd [HWASan] [GlobalISel] Add +tagged-globals backend feature for GlobalISel
GlobalISel is the default ISel for aarch64 at -O0. Prior to D78465, GlobalISel
didn't have support for dealing with address-of-global lowerings, so it fell
back to SelectionDAGISel.

HWASan Globals require special handling, as they contain the pointer tag in the
top 16-bits, and are thus outside the code model. We need to generate a `movk`
in the instruction sequence with a G3 relocation to ensure the bits are
relocated properly. This is implemented in SelectionDAGISel; this patch does
the same for GlobalISel.

GlobalISel and SelectionDAGISel differ in their lowering sequence, so there are
differences in the final instruction sequence, explained in
`tagged-globals.ll`. Both of these implementations are correct, but GlobalISel
is slightly larger code size / slightly slower (by a couple of arithmetic
instructions). I don't see this as a problem for now as GlobalISel is only on
by default at `-O0`.

Reviewed By: aemerson, arsenm

Differential Revision: https://reviews.llvm.org/D82615
2020-08-03 14:28:44 -07:00
Amara Emerson 09f9f7dd1b [AArch64][GlobalISel] Add legalization & selection support for G_INTRINSIC_LRINT.
Differential Revision: https://reviews.llvm.org/D84552
2020-07-30 16:14:56 -07:00
Amara Emerson d8ba622209 [AArch64][GlobalISel] Selection support for vector DUP[X]lane instructions.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.
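For illustration (not from the patch), a single-lane splat shuffle of the kind that can now select to a lane DUP:

```
define <4 x i32> @dup_lane1(<4 x i32> %v) {
  ; splat of lane 1; previously a const-pool load + tbl, now a lane DUP
  %s = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  ret <4 x i32> %s
}
```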

Differential Revision: https://reviews.llvm.org/D84866
2020-07-29 11:41:37 -07:00
Jessica Paquette 7ff9575594 [AArch64][GlobalISel] Select XRO addressing mode with wide immediates
Port the wide immediate case from AArch64DAGToDAGISel::SelectAddrModeXRO.

If we have a wide immediate which can't be represented in an add, we can end up
with code like this:

```
mov  x0, imm
add x1, base, x0
ldr  x2, [x1, 0]
```

If we use the [base, xN] addressing mode instead, we can produce this:

```
mov  x0, imm
ldr  x2, [base, x0]
```

This saves 0.4% code size on 7zip at -O3, and gives a geomean code size
improvement of 0.1% on CTMark.

Differential Revision: https://reviews.llvm.org/D84784
2020-07-29 11:02:10 -07:00
Amara Emerson 9b19400004 [AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal types for G_SHUFFLE_VECTOR and G_IMPLICIT_DEF.
Trivial change; we're still missing support for rev matching for these types
in the combiner.
2020-07-26 00:48:09 -07:00
Jessica Paquette 604e33e83a [AArch64][GlobalISel] Look through constants when selecting stores of 0
Very minor code size improvements (hits 8 times in Bullet at -O3), but still
something.

Also very minor NFC change to make sure we only search for a 0 constant when
selecting a store. Before, we'd do this for loads as well.

Differential Revision: https://reviews.llvm.org/D84573
2020-07-24 22:46:14 -07:00
Jessica Paquette fcc55c0952 [AArch64][GlobalISel] Use wzr/xzr for 16 and 32 bit stores of zero
We weren't performing this optimization on 16 and 32 bit stores. SDAG happily
does this though.

e.g. https://godbolt.org/z/cWocKr
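Illustrative IR for the cases in question (roughly what the godbolt link shows):

```
define void @store_zero(i32* %p, i16* %q) {
  ; 32- and 16-bit stores of zero can use wzr instead of materializing 0
  store i32 0, i32* %p
  store i16 0, i16* %q
  ret void
}
```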

This saves about 0.2% in code size on CTMark at -O3.

Differential Revision: https://reviews.llvm.org/D84568
2020-07-24 17:15:20 -07:00
Amara Emerson f320f83f3a [AArch64][GlobalISel] Promote G_UITOFP vector operands to same elt size as result.
Fixes legalization failures.
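For reference, an example of the mismatch being fixed (illustrative): the source elements are narrower than the result elements, so they get promoted first.

```
define <4 x float> @uitofp_narrow(<4 x i16> %v) {
  ; s16 source elements promoted to s32 to match the result element size
  %r = uitofp <4 x i16> %v to <4 x float>
  ret <4 x float> %r
}
```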
2020-07-24 17:00:50 -07:00
Matt Arsenault d26526fd09 AArch64: Use Register 2020-07-22 14:14:44 -04:00
Matt Arsenault 0c92bfa4b8 GlobalISel: Don't use virtual for distinguishing arg handlers
There's no reason to involve the hassle of a virtual method targets
have to override for a simple boolean.

Not sure exactly what's going on with Mips, but it seems to define its
own totally separate handler classes.
2020-07-22 14:14:43 -04:00
Matt Arsenault b98f902f18 GlobalISel: Restructure argument lowering loop in handleAssignments
This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.

This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.

I'm also not sure what the correct behavior is for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
2020-07-22 13:31:11 -04:00
Amara Emerson f1ae96d9bf [AArch64][GlobalISel] Fix TLS accesses clobbering registers incorrectly.
This was happening because the BLR didn't have a use of the X0 arg register,
which would end up being re-used in high reg pressure situations.
The change also avoids hard coding the use of X0 for the sequence except to
copy the value for the call. ld64 should still be able to optimize it.
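For illustration, a simple TLS access of the kind lowered with a helper call (the BLR mentioned above); illustrative IR:

```
@tls_counter = thread_local global i64 0

define i64 @read_tls() {
  ; the TLS helper is invoked via BLR; the call must be marked as using X0
  %v = load i64, i64* @tls_counter
  ret i64 %v
}
```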

rdar://65438258
2020-07-21 16:01:17 -07:00
Tim Northover 88464a55b4 AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms
It's useful for a debugger to be able to distinguish an @llvm.debugtrap
from a (noreturn) @llvm.trap, so this extends the existing Windows
behaviour to other platforms.
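For reference, the intrinsic in question (illustrative use):

```
declare void @llvm.debugtrap()

define void @stop_here() {
  ; now emitted as `brk #0xf000` on all AArch64 platforms
  call void @llvm.debugtrap()
  ret void
}
```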
2020-07-20 10:31:26 +01:00
Matt Arsenault 9ff310d5bf AArch64: Fix unused variables 2020-07-10 15:12:25 -04:00
David Sherwood 79d34a5a1b [SVE][CodeGen] Fix bug when falling back to DAG ISel
In an earlier commit 584d0d5c17 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.

I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.

Differential Revision: https://reviews.llvm.org/D82524
2020-07-07 09:23:04 +01:00
Amara Emerson 97a34b5f8d [AArch64][GlobalISel] Fix extended shift addressing mode selection not handling sxth.
The complex pattern for extended shift offsets only allows sxtw as the extend,
not sxth. Our equivalent function to do this was not rejecting SXTH so we
were miscompiling. This was exposed by D81992.
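Roughly the kind of pattern involved (illustrative IR; a sign-extended 16-bit offset feeding an address):

```
define i32 @load_sxth(i32* %base, i16 %off) {
  ; sext from i16 must not be treated as sxtw when folding into the address
  %ext = sext i16 %off to i64
  %addr = getelementptr inbounds i32, i32* %base, i64 %ext
  %val = load i32, i32* %addr
  ret i32 %val
}
```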
2020-06-25 17:24:32 -07:00
Jessica Paquette 7fb84dff69 [AArch64][GlobalISel] Port buildvector -> dup pattern from AArch64ISelLowering
Given this:

```
%x:_(<n x sK>) = G_BUILD_VECTOR %lane, ...
...
%y:_(<n x sK>) = G_SHUFFLE_VECTOR %x(<n x sK>), %foo, shufflemask(0, 0, ...)
```

We can produce:

```
%y:_(<n x sK) = G_DUP %lane(sK)
```

Doesn't seem to be too common, but AArch64ISelLowering attempts to do this
before trying to produce a DUPLANE. Might as well port it.

Also make it so that when the splat has an undef mask, we try setting it to
0. SDAG does this, and it makes sure that when we get the build vector operand,
we actually get a source operand.

Differential Revision: https://reviews.llvm.org/D81979
2020-06-25 14:19:06 -07:00
Amara Emerson fceadbcb33 [AArch64][GlobalISel] Improve codegen for some constant vectors by using constant pool loads.
There's more smarts in AArch64ISelLowering that we don't have yet, but this
change incrementally improves some of the more common patterns. I think future
iterations will want to use some combination of PostLegalizerCombiner and the
selector to catch the other cases.
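For illustration (not from the patch), a non-splat constant vector of the sort often materialized with a constant-pool load:

```
define <4 x i32> @non_splat_const() {
  ; not a splat and not an immediate-form constant
  ret <4 x i32> <i32 1, i32 2, i32 3, i32 4>
}
```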

Differential Revision: https://reviews.llvm.org/D82340
2020-06-23 19:23:47 -07:00
Eli Friedman a2caa3b614 Remove GlobalValue::getAlignment().
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value.  If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.

This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.

Differential Revision: https://reviews.llvm.org/D80368
2020-06-23 19:13:42 -07:00
Amara Emerson 1feeecf224 [AArch64][GlobalISel] Make G_SEXT_INREG legal and add selection support.
We were defaulting to the lower action for this, resulting in SHL+ASHR
sequences. On AArch64 we can do this in one instruction for an arbitrary
extension using SBFM as we do for G_SEXT.
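Illustrative IR that typically becomes a G_SEXT_INREG after combining:

```
define i64 @sext_in_reg(i64 %x) {
  ; trunc + sext of the same value; selects to a single SBFM (sxtb)
  ; rather than a SHL+ASHR sequence
  %t = trunc i64 %x to i8
  %e = sext i8 %t to i64
  ret i64 %e
}
```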

Differential Revision: https://reviews.llvm.org/D81992
2020-06-19 13:20:41 -07:00
David Sherwood 584d0d5c17 [SVE] Fall back on DAG ISel at -O0 when encountering scalable types
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:

1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
   vectors.

For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.

I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to

  CodeGen/AArch64/GlobalISel/arm64-fallback.ll

that demonstrate we correctly fallback to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.
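A minimal example of an input that should now fall back cleanly at -O0 (illustrative, not one of the added tests):

```
define <vscale x 4 x i32> @sve_passthrough(<vscale x 4 x i32> %v) {
  ; scalable vector in the signature: formal argument lowering triggers the fallback
  ret <vscale x 4 x i32> %v
}
```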

Differential Revision: https://reviews.llvm.org/D81557
2020-06-19 10:57:00 +01:00
Matt Arsenault 7f8b2e1b91 GlobalISel: Pass LegalizerHelper to custom legalize callbacks
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.

This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add the common
parameters back in addition to the helper.

I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.

This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not really capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
2020-06-18 17:17:38 -04:00
Daniel Sanders e35ba09961 [gicombiner] Allow generated combiners to store additional members
Summary:
Adds the ability to add members to a generated combiner via
a State base class. In the current AArch64PreLegalizerCombiner
this is used to make Helper available without having to
provide it to every call.

As part of this, split the command line processing into a
separate object so that it still only runs once even though
the generated combiner is constructed more frequently.

Depends on D81862

Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81863
2020-06-16 14:47:04 -07:00
Jessica Paquette 7caa9caa80 [AArch64][GlobalISel] Avoid creating redundant ubfx when selecting G_ZEXT
When selecting 32-bit -> 64-bit G_ZEXTs, we don't always have to emit the extend.

If the instruction feeding into the G_ZEXT implicitly zero extends the high
half of the register, we can just emit a SUBREG_TO_REG instead.
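For illustration (not from the patch), a case where the zero extension is already implicit:

```
define i64 @zext_of_load(i32* %p) {
  ; the 32-bit load already zeroes the top half of the X register,
  ; so the zext can be a SUBREG_TO_REG instead of a ubfx
  %v = load i32, i32* %p
  %z = zext i32 %v to i64
  ret i64 %z
}
```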

Differential Revision: https://reviews.llvm.org/D81897
2020-06-16 09:50:47 -07:00