Commit Graph

10782 Commits

Author SHA1 Message Date
Nikita Popov 1cc4f8d172 [ARM] Expand vector reduction intrinsics on soft float
Followup to D73135. If the target doesn't have hard float (default
for ARM), then we assert when trying to soften the result of vector
reduction intrinsics. This patch marks these for expansion as well.
(A bit odd to use vectors on a target without hard float ... but
that's where you end up if you expose target-independent vector types.)

Differential Revision: https://reviews.llvm.org/D73854
2020-02-03 18:49:12 +01:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Simon Moll 5c8ba508b2 [NFC] unsigned->Register in storeRegTo/loadRegFromStack
Summary:
This patch makes progress on the 'unsigned -> Register' rewrite for
`TargetInstrInfo::loadRegFromStack` and `TII::storeRegToStack`.

Reviewers: arsenm, craig.topper, uweigand, jpienaar, atanasyan, venkatra, robertlytton, dylanmckay, t.p.northover, kparzysz, tstellar, k-ishizaka

Reviewed By: arsenm

Subscribers: wuzish, merge_guards_bot, jyknight, sdardis, nemanjai, jvesely, wdng, nhaehnle, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73870
2020-02-03 14:22:16 +01:00
Guillaume Chatelet fc19465965 [Alignment][NFC] Use Align for code creating MemOp
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73874
2020-02-03 14:10:30 +01:00
John Brawn b37d59353f [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.

The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.

Differential Revision: https://reviews.llvm.org/D73194
2020-02-03 12:59:12 +00:00
Simon Tatham 961530fdc9 [ARM,MVE] Fix vreinterpretq in big-endian mode.
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.

So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?

The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.

To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.

In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.

For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73786
2020-02-03 11:20:06 +00:00
Simon Tatham f8d4afc49a [ARM,MVE] Add intrinsics for v[id]dupq and v[id]wdupq.
Summary:
These instructions generate a vector of consecutive elements starting
from a given base value and incrementing by 1, 2, 4 or 8. The `wdup`
versions also wrap the values back to zero when they reach a given
limit value. The instruction updates the scalar base register so that
another use of the same instruction will continue the sequence from
where the previous one left off.

At the IR level, I've represented these instructions as a family of
target-specific intrinsics with two return values (the constructed
vector and the updated base). The user-facing ACLE API provides a set
of intrinsics that throw away the written-back base and another set
that receive it as a pointer so they can update it, plus the usual
predicated versions.

Because the intrinsics return two values (as do the underlying
instructions), the isel has to be done in C++.

This is the first family of MVE intrinsics that use the `imm_1248`
immediate type in the clang Tablegen framework, so naturally, I found
I'd given it the wrong C integer type. Also added some tests of the
check that the immediate has a legal value, because this is the first
time those particular checks have been exercised.

Finally, I also had to fix a bug in MveEmitter which failed an
assertion when I nested two `seq` nodes (the inner one used to extract
the two values from the pair returned by the IR intrinsic, and the
outer one put on by the predication multiclass).

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73357
2020-02-03 11:20:06 +00:00
Simon Tatham cf7e98e6f7 [ARM,MVE] Add intrinsics for vdupq.
Summary:
The unpredicated case of this is trivial: the clang codegen just makes
a vector splat of the input, and LLVM isel is already prepared to
handle that. For the predicated version, I've generated a `select`
between the same vector splat and the `inactive` input parameter, and
added new Tablegen isel rules to match that pattern into a predicated
`MVE_VDUP` instruction.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73356
2020-02-03 11:20:06 +00:00
David Green d50e188a07 Revert "[ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS"
This reverts commit e34801c8e6 and the followup due to multiple
problems.

I've tried to keep the tests and RDA parts where possible, as those
still seem useful.
2020-02-02 13:24:05 +00:00
Jay Foad 2a1b5af299 [GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister
Summary:
As a side effect some redundant copies of constant values are removed by
CSEMIRBuilder.

Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73789
2020-01-31 17:07:16 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Nikita Popov 70d345e687 [AArch64][ARM] Always expand ordered vector reductions (PR44600)
fadd/fmul reductions without reassoc are lowered to
VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization
support. Until that is in place, expand these intrinsics on
ARM and AArch64. Other targets always expand the vector reduction
intrinsics.

Additionally expand fmax/fmin reductions without nonan flag on
AArch64, as the backend asserts that the flag is present when
lowering VECREDUCE_FMIN/FMAX.

This fixes https://bugs.llvm.org/show_bug.cgi?id=44600.

Differential Revision: https://reviews.llvm.org/D73135
2020-01-30 18:40:24 +01:00
Sam Parker 06e12893ff [ARM][LowOverheadLoops] Skip debug values
While iterating through the loop, don't inspect any dbg values.

Differential Revision: https://reviews.llvm.org/D73688
2020-01-30 11:51:58 +00:00
Sam Parker 6726d67bfd [ARM][LowOverheadLoops] Check scalar predicates
When trying to remove the loop iteration count, check that the
instruction will always execute.

Differential Revision: https://reviews.llvm.org/D73682
2020-01-30 09:13:04 +00:00
Huihui Zhang 2ec954579a Revert "[ARM] Fix data race on RegisterBank initialization."
There looks to be buildbot failure related.

This reverts commit 91618d940e.
2020-01-29 11:15:27 -08:00
Huihui Zhang 91618d940e [ARM] Fix data race on RegisterBank initialization.
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has data race, use llvm::call_once instead.

This is continuing work of D73587.

Reviewers: arsenm, rovka, dsanders, t.p.northover, efriedma, apazos

Reviewed By: arsenm

Subscribers: wdng, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73605
2020-01-29 10:15:37 -08:00
Sjoerd Meijer f719b0ba13 [MVE][MC] evaluateBranch: add missing MVE opcode
This adds some missing MVE opcodes to evaluateBranch, which results in
llvm-objdump being able to print the PC relative branch target as an
annotation.

Differential Revision: https://reviews.llvm.org/D73553
2020-01-29 13:19:45 +00:00
Sam Parker ac30ea2f87 [RDA][ARM] Move functionality into RDA
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt

And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove

Both of these have some wrappers too to make them more convienent to
use.

Differential Revision: https://reviews.llvm.org/D73460
2020-01-29 03:27:47 -05:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
David Green 8a6b948eb5 [MVE] Fixup order of gather writeback intrinsic outputs
The MVE_VLDRWU32_qi_pre gather loads, like the other _pre/_post mve
loads returns the writeback as result 0, the value as result 1. The llvm
ir intrinsic seems to have this the other way around though, and so when
lowering from one to the other we need to switch the first two outputs.

I've also fixed up the types of _pre/_post on normal MVE loads. There we
were already getting the values the right way around, just not for the
types. I don't believe this was causing anything to go wrong, but it was
very confusing to read in the debug output.

Differential Revision: https://reviews.llvm.org/D73370
2020-01-27 14:08:06 +00:00
Sjoerd Meijer b567ff2fa0 [ARM][MVE] Tail-predication: support constant trip count
We had support for runtime trip count values, but not constants, and this adds
supports for that.

And added a minor optimisation while I was add it: don't invoke Cleanup when
there's nothing to clean up.

Differential Revision: https://reviews.llvm.org/D73198
2020-01-27 11:05:26 +00:00
Sam Parker 6c2df5d14f [ARM][LowOverheadLoops] Dont ignore VCTP
When expanding the LoopStart, we try to remove the iteration count
calculation. However, if part of the calculation was also used to
calculate the number of elements we could end up deleting
instructions that were required to feed DLSTP/WLSTP.

Differential Revision: https://reviews.llvm.org/D73275
2020-01-27 10:59:12 +00:00
David Green b535aa405a [ARM] Use reduction intrinsics for larger than legal reductions
The codegen for splitting a llvm.vector.reduction intrinsic into parts
will be better than the codegen for the generic reductions. This will
only directly effect when vectorization factors are specified by the
user.

Also added tests to make sure the codegen for larger reductions is OK.

Differential Revision: https://reviews.llvm.org/D72257
2020-01-24 17:07:24 +00:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Sam Parker ddbc077895 [NFC][ARM] Make some params members instead.
Add MachineLoopInfo and ReachingDefAnalysis as members of
LowOverheadLoop instead of passing them several times to different
methods.
2020-01-24 10:19:17 +00:00
Fangrui Song 22467e2595 Add function attribute "patchable-function-prefix" to support -fpatchable-function-entry=N,M where M>0
Similar to the function attribute `prefix` (prefix data),
"patchable-function-prefix" inserts data (M NOPs) before the function
entry label.

-fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry)
will look like:

```
  .type	foo,@function
.Ltmp0:               # @foo
  nop
foo:
.Lfunc_begin0:
  # optional `bti c` (AArch64 Branch Target Identification) or
  # `endbr64` (Intel Indirect Branch Tracking)
  nop

  .section  __patchable_function_entries,"awo",@progbits,get,unique,0
  .p2align  3
  .quad .Ltmp0
```

-fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable
placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html):

```
(a)         (b)

func:       func:
.Ltmp0:     bti c
  bti c     .Ltmp0:
  nop       nop
```

(a) needs no additional code. If the consensus is to go for (b), we will
need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp .

Differential Revision: https://reviews.llvm.org/D73070
2020-01-23 17:02:27 -08:00
Jay Foad b482e1bfe2 [CodeGen] Make use of MachineInstrBuilder::getReg
Reviewers: arsenm

Subscribers: wdng, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73262
2020-01-23 13:38:13 +00:00
Guillaume Chatelet 279fa8e006 [Alignement][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Simon Tatham 4321c6af28 [ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics.
Summary:
Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)

The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
`NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and
similarly for `NEONvbicImm`.

The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.

This intrinsic family is the first to use the `imm_simd` system I put
into the MveEmitter tablegen backend. So, naturally, it showed up a
bug or two (emitting bogus range checks and the like). Fixed those,
and added a full set of tests for the permissible immediates in the
existing Sema test.

Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching
because lowering started turning its input into a VBICIMM. Now it
recognizes the VBICIMM instead.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72934
2020-01-23 11:53:52 +00:00
Simon Tatham 772e493193 [ARM,MVE] Revise immediate VBIC/VORR to look more like NEON.
Summary:
In NEON, the immediate forms of VBIC and VORR are each represented as
a single MC instruction, which takes its immediate operand already
encoded in a NEON-friendly format: 8 data bits, plus some control bits
indicating how to expand them into a full vector.

In MVE, we represented immediate VBIC and VORR as four separate MC
instructions each, for an 8-bit immediate shifted left by 0, 8, 16 or
24 bits. For each one, the value of the immediate operand is in the
'natural' form, i.e. the numerical value that would actually be BICed
or ORRed into each vector lane (and also the same value shown in
assembly). For example, MVE_VBICIZ16v4i32 takes an operand such as
0xab0000, which NEON would represent as 0xab | (control bits << 8).

The MVE approach is superficially nice (it makes assembly input and
output easy, and it's also nice if you're manually constructing
immediate VBICs). But it turns out that it's better for isel if we
make the NEON and MVE instructions work the same, because the
ARMISD::VBICIMM and VORRIMM node types already encode their immediate
into the NEON format, so it's easier if we can just use it.

Also, this commit reduces the total amount of code rather than
increasing it, which is surely an indication that it really is simpler
to do it this way!

Reviewers: dmgreen, ostannard, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73205
2020-01-23 11:53:52 +00:00
David Green 58991ba773 [ARM] Mark MVE loads/store as not having side effects
The hasSideEffect parameter is usually automatically inferred from
instruction patterns. For some of our MVE instructions, we do not have
patterns though, such as for the pre/post inc loads and stores. This
instead specifies the flag manually on the base MVE_VLDRSTR_base
tablegen class, making sure we get this correct.

This can help with scheduling multiple loads more optimally. Here I've
added a unittest as a more direct form of testing.

Differential Revision: https://reviews.llvm.org/D73117
2020-01-22 17:56:55 +00:00
David Green e9c198278e [ARM] Basic gather scatter cost model
This is a very basic MVE gather/scatter cost model, based roughly on the
code that we will currently produce. It does not handle truncating
scatters or extending gathers correctly yet, as it is difficult to tell
that they are going to be correctly extended/truncated from the limited
information in the cost function.

This can be improved as we extend support for these in the future.

Based on code originally written by David Sherwood.

Differential Revision: https://reviews.llvm.org/D73021
2020-01-22 14:41:38 +00:00
Sam Parker c04b9ba595 [ARM][MVE] Clear MaskedInsts vector
In MVETailPredication, clear the vector before running on a new loop.

Differential Revision: https://reviews.llvm.org/D73048
2020-01-22 04:27:36 -05:00
Anna Welker ff9877ce34 [ARM][MVE] Enable masked scatter
Extends the gather/scatter pass in MVEGatherScatterLowering.cpp to
enable the transformation of masked scatters into calls to MVE's masked
scatter intrinsic.

Differential Revision: https://reviews.llvm.org/D72856
2020-01-21 09:46:26 +00:00
Mark Murray b10a0eb04a [ARM][MVE][Intrinsics] Take abs() of VMINNMAQ, VMAXNMAQ intrinsics' first arguments.
Summary: Fix VMINNMAQ, VMAXNMAQ intrinsics; BOTH arguments have the absolute values taken.

Reviewers: dmgreen, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72830
2020-01-20 14:33:26 +00:00
Simon Tatham f3e73e88fd [ARM,MVE] Fix confusing MC names for MVE VMINA/VMAXA insns.
Summary:
A recent commit accidentally defined names like `MVE_VMAXAs8` as
instances of the multiclass `MVE_VMINA`, and vice versa. This has no
effect on the test suite, because nothing directly refers to those
instruction names (the isel patterns are generated in Tablegen using
`!cast<Instruction>(NAME)` inside a lower-level multiclass). But it
means that `llvm-mc -show-inst` was listing VMAXA as VMINA, and it
would also affect any further draft code gen patches that use those
instruction ids.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73034
2020-01-20 13:25:52 +00:00
Sjoerd Meijer 8cba99e2aa [ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.

Differential Revision: https://reviews.llvm.org/D72714
2020-01-20 10:26:36 +00:00
David Green ff2e67a4f7 [ARM] MVE VLDn postinc
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.

It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.

Differential Revision: https://reviews.llvm.org/D71194
2020-01-20 06:57:07 +00:00
David Green 5e51f75542 [ARM] Favour post inc for MVE loops
We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and certain instructions like the VLD2's only post-inc
versions are available.

Differential Revision: https://reviews.llvm.org/D70790
2020-01-20 06:57:07 +00:00
Fangrui Song 8e8a75ad50 [TargetRegisterInfo] Default trackLivenessAfterRegAlloc() to true
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
2020-01-19 14:20:37 -08:00
Sam Parker 42350cd893 [ARM][MVE] Tail Predicate IsSafeToRemove
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.

As well as being able to remove chains of instructions, we know also
check that the sub feeding the vctp is producing the expected value.

Differential Revision: https://reviews.llvm.org/D71837
2020-01-17 13:19:14 +00:00
Jay Foad 63f73545dd [GlobalISel] Pass MachineOperands into MachineIRBuilder helper methods
Reviewers: arsenm, aditya_nandakumar, aemerson

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72849
2020-01-16 16:04:21 +00:00
Sam Parker 760b175109 [ARM][LowOverheadLoops] Update liveness info
Recommitting e93e0d413f after reverting due to test failures, which
will hopefully now be fixed. Original commit message:

After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).

Differential Revision: https://reviews.llvm.org/D72131
2020-01-16 15:44:25 +00:00
Anna Welker c24cf97960 [ARM][MVE] Enable extending gathers
Enables the masked gather pass to
create extending masked gathers.

Differential Revision: https://reviews.llvm.org/D72451
2020-01-16 15:24:54 +00:00
Mark Murray da9d57d2c2 [ARM][MVE][Intrinsics] Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics.
Summary: Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics and unit tests.

Reviewers: simon_tatham, miyuki, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72761
2020-01-15 17:20:15 +00:00
Tom Stellard 0dbcb36394 CMake: Make most target symbols hidden by default
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.

A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.

This patch reduces the number of public symbols in libLLVM.so by about
25%.  This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so

One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.

Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278

Reviewers: chandlerc, beanz, mgorny, rnk, hans

Reviewed By: rnk, hans

Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54439
2020-01-14 19:46:52 -08:00
Sjoerd Meijer a08c0adee0 [ARM][MVE] VTP Block Pass fix
Fix a missing and broken test: 2 VPT blocks predicated on the same VCMP
instruction that can be folded. The problem was that for each VPT block, we
record the predicate statements with a list, but the same instruction was added
twice. Thus, we were running in an assert trying to remove the same instruction
twice. To avoid this the instructions are now recorded with a set.

Differential Revision: https://reviews.llvm.org/D72699
2020-01-14 16:10:55 +00:00
Simon Tatham 71d5454b37 [ARM,MVE] Use the new Tablegen `defvar` and `if` statements.
Summary:
This cleans up a lot of ugly `foreach` bodges that I've been using to
work around the lack of those two language features. Now they both
exist, I can make then all into something more legible!

In particular, in the common pattern in `ARMInstrMVE.td` where a
multiclass defines an `Instruction` instance plus one or more `Pat` that
select it, I've used a `defvar` to wrap `!cast<Instruction>(NAME)` so
that the patterns themselves become a little more legible.

Replacing a `foreach` with a `defvar` removes a level of block
structure, so several pieces of code have their indentation changed by
this patch. Best viewed with whitespace ignored.

NFC: the output of `llvm-tblgen -print-records` on the two affected
Tablegen sources is exactly identical before and after this change, so
there should be no effect at all on any of the other generated files.

Reviewers: MarkMurrayARM, miyuki

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72690
2020-01-14 12:08:03 +00:00
Sam Parker e27632c302 [ARM][LowOverheadLoops] Allow all MVE instrs.
We have a whitelist of instructions that we allow when tail
predicating, since these are trivial ones that we've deemed need no
special handling. Now change ARMLowOverheadLoops to allow the
non-trivial instructions if they're contained within a valid VPT
block. Since a valid block is one that is predicated upon the VCTP so
we know that these non-trivial instructions will still behave as
expected once the implicit predication is used instead.

This also fixes a previous test failure.

Differential Revision: https://reviews.llvm.org/D72509
2020-01-14 12:03:58 +00:00
Sam Parker bad6032bc1 [ARM][LowOverheadLoops] Change predicate inspection
Use the already provided helper function to get the operand type so
that we can detect whether the vpr is being used as a predicate or
not. Also use existing helpers to get the predicate indices when we
converting the vpt blocks. This enables us to support both types of
vpr predicate operand.

Differential Revision: https://reviews.llvm.org/D72504
2020-01-14 11:47:34 +00:00
Diogo Sampaio d94d079a6a [ARM][Thumb2] Fix ADD/SUB invalid writes to SP
Summary:
This patch fixes pr23772  [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).

Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb

Reviewed By: efriedma

Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70680
2020-01-14 11:47:19 +00:00
Sam Parker e73b20c57d [ARM][MVE] Disallow VPSEL for tail predication
Due to the current way that we collect predicated instructions, we
can't easily handle vpsel in tail predicated loops. There are a
couple of issues:
1) It will use the VPR as a predicate operand, but doesn't have to be
   instead a VPT block, which means we can assert while building up
   the VPT block because we don't find another VPST to being a new
   one.
2) VPSEL still requires a VPR operand even after tail predicating,
   which means we can't remove it unless there is another
   instruction, such as vcmp, that can provide the VPR def.

The first issue should be a relatively simple fix in the logic of the
LowOverheadLoops pass, whereas the second will require us to
represent the 'implicit' tail predication with an explicit value.

Differential Revision: https://reviews.llvm.org/D72629
2020-01-14 11:41:17 +00:00
Anna Welker 72ca86fd34 [ARM][MVE] Masked gathers from base + vector of offsets
Enables the masked gather pass to create a masked
gather loading from a base and vector of offsets.
This also enables v8i16 and v16i8 gather loads.

Differential Revision: https://reviews.llvm.org/D72330
2020-01-14 10:33:52 +00:00
David Green 90555d9253 [Scheduler] Remove superfluous casts. NFC 2020-01-13 16:34:13 +00:00
Sjoerd Meijer add04b9653 ARMLowOverheadLoops: return earlier to avoid printing irrelevant dbg msg. NFC 2020-01-13 10:24:10 +00:00
Fangrui Song 6fdd6a7b3f [Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.

If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
2020-01-11 13:34:52 -08:00
Craig Topper bb2553175a [TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RunttimeLibcalls.def and all associated usages
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.

This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.

Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn

Reviewed By: efriedma

Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72536
2020-01-10 19:30:08 -08:00
Sam Parker 3772ea9dd9 [ARM][MVE] Tail predicate VMAX,VMAXA,VMIN,VMINA
Add the MVE min and max instructions to our tail predication
whitelist.

Differential Revision: https://reviews.llvm.org/D72502
2020-01-10 14:24:25 +00:00
Sjoerd Meijer 4569f63ae1 ARMLowOverheadLoops: a few more dbg msgs to better trace rejected TP loops. NFC. 2020-01-10 14:11:52 +00:00
Diogo Sampaio b1bb5ce96d Reverting, broke some bots. Need further investigation.
Summary: This reverts commit 8c12769f30.

Reviewers:

Subscribers:
2020-01-10 13:40:41 +00:00
Diogo Sampaio 8c12769f30 [ARM][Thumb2] Fix ADD/SUB invalid writes to SP
Summary:
This patch fixes pr23772  [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).

Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma

Reviewed By: efriedma

Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70680
2020-01-10 11:25:44 +00:00
Matt Arsenault b4a647449f TableGen/GlobalISel: Add way for SDNodeXForm to work on timm
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.

Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.

Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.

Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.

Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.

One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
2020-01-09 17:37:52 -05:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
Sam Parker 9c91d79dad [NFC][ARM] LowOverheadLoop comments
Add a comment describing the dependencies of the pass.
2020-01-09 12:54:01 +00:00
Sam Parker 15c7fa4d11 [ARM][MVE] Don't unroll intrinsic loops.
We don't unroll vector loops for MVE targets, but we miss the case
when loops only contain intrinsic calls. So just move the logic a
bit to catch this case.

Differential Revision: https://reviews.llvm.org/D72440
2020-01-09 11:57:34 +00:00
Sam Parker 1cba261239 Revert "[ARM][LowOverheadLoops] Update liveness info"
This reverts commit e93e0d413f.

There's some ordering problems on some on the buildbots which needs
investigating.
2020-01-09 09:22:06 +00:00
Sam Parker e93e0d413f [ARM][LowOverheadLoops] Update liveness info
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).

Differential Revision: https://reviews.llvm.org/D72131
2020-01-09 08:33:47 +00:00
Simon Tatham dac7b23cc3 [ARM,MVE] Intrinsics for variable shift instructions.
This batch of intrinsics fills in all the shift instructions that take
a variable shift distance in a register, instead of an immediate. Some
of these instructions take a single shift distance in a scalar
register and apply it to all lanes; others take a vector of per-lane
distances.

These instructions are all basically one family, varying in whether
they saturate out-of-range values, and whether they round when bits
are shifted off the bottom. I've implemented them at the IR level by a
much smaller family of IR intrinsics, which take flag parameters to
indicate saturating and/or rounding (along with the usual one to
specify signed/unsigned integers).

An oddity is that all of them are //left// shift instructions – but if
you pass a negative shift count, they'll shift right. So the vector
shift distances are always vectors of //signed// integers, regardless
of whether you're considering the other input vector to be of signed
or unsigned. Also, even the simplest `vshlq` instruction in this
family (neither saturating nor rounding) has to be implemented as an
IR intrinsic, because the ordinary LLVM IR `shl` operation would
consider an out-of-range shift count to be undefined behavior.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72329
2020-01-08 14:42:24 +00:00
Simon Tatham 3100480925 [ARM,MVE] Intrinsics for partial-overwrite imm shifts.
This batch of intrinsics covers two sets of immediate shift
instructions, which have in common that they only overwrite part of
their output register and so they need an extra input giving its
previous value.

The VSLI and VSRI instructions shift each lane of the input vector
left or right just as if they were normal immediate VSHL/VSHR, but
then they only overwrite the output bits that correspond to actual
shifted bits of the input. So VSLI will leave the low n bits of each
output lane unchanged, and VSRI the same with the top n bits.

The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input
vector of 2n-bit integers, shift each lane right by a constant, and
then narrowing the shifted result to only n bits. So they only
overwrite half of the n-bit lanes in the output register, and the B/T
suffix indicates whether it's the bottom or top half of each 2n-bit
lane.

I've implemented the whole of the latter family using a single IR
intrinsic `vshrn`, which takes a lot of i32 parameters indicating
which instruction it expands to (by specifying signedness of the input
and output types, whether it saturates and/or rounds, etc).

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72328
2020-01-08 14:42:24 +00:00
Anna Welker 346f6b54bd [ARM][MVE] Enable masked gathers from vector of pointers
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.

Differential Revision: https://reviews.llvm.org/D71743
2020-01-08 13:43:12 +00:00
Sjoerd Meijer e34801c8e6 [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS
This is a recommit of D71330, but with a few things fixed and changed:

1) ReachingDefAnalysis: this was not running with optnone as it was checking
skipFunction(), which other analysis passes don't do. I guess this is a
copy-paste from a codegen pass.
2) VPTBlockPass: here I've added skipFunction(), because like most/all
optimisations, we don't want to run this with optnone.

This fixes the issues with the initial/previous commit: the VPTBlockPass was
running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was
crashing querying ReachingDefAnalysis.

I've added test case mve-vpt-block-optnone.mir to check that we don't run
VPTBlock with optnone.

Differential Revision: https://reviews.llvm.org/D71470
2020-01-07 13:54:47 +00:00
Victor Campos 60e0120c91 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-01-07 13:16:18 +00:00
Fangrui Song 3d87d0b925 [MC] Add parameter `Address` to MCInstrPrinter::printInstruction
Follow-up of D72172.

Reviewed By: jhenderson, rnk

Differential Revision: https://reviews.llvm.org/D72180
2020-01-06 20:44:14 -08:00
Fangrui Song aa708763d3 [MC] Add parameter `Address` to MCInstPrinter::printInst
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.

It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.

Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.

The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.

In any case, downstream projects which don't know `Address` can pass 0 as
the argument.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72172
2020-01-06 20:42:22 -08:00
David Green f88d52728b [ARM] Use the correct opcodes for Thumb2 segmented stack frame lowering
The segmented stack lowering code appears to be using ARM opcodes under
Thumb2. The MRC opcode will be the same for Thumb and ARM, but t2LDR
seems wrong. Either way, using the correct thumb vs arm opcodes is more
correct.

Differential Revision: https://reviews.llvm.org/D72074
2020-01-06 16:38:49 +00:00
David Green 0eb981b8ce [ARM] Use correct TRAP opcode for thumb in FastISel
We were previously unconditionally using the ARM::TRAP opcode, even
under Thumb. My understanding is that these are essentially the same
thing (they both result in a trap under Thumb), but the ARM::TRAP opcode
is marked as requiring IsARM, so it is more correct to use ARM::tTRAP.

Differential Revision: https://reviews.llvm.org/D72075
2020-01-06 16:38:49 +00:00
Simon Tatham 34817e04fe [ARM,MVE] Fix many signedness errors in MVE intrinsics.
Summary:
Running an end-to-end test last week I noticed that a lot of the ACLE
intrinsics that operate differently on vectors of signed and unsigned
integers were ending up generating the signed version of the
instruction unconditionally. This is because the IR intrinsics had no
way to distinguish signed from unsigned: the LLVM type system just
calls them both `v8i16` (or whatever), so you need either separate
intrinsics for signed and unsigned, or a flag parameter that tells
ISel which one to choose.

This patch fixes all the problems of that kind that I've noticed, by
adding an i32 flag parameter to many of the IR intrinsics which is set
to 1 for unsigned (matching the existing practice in cases where we
got it right), and conditioning all the isel patterns on that flag. So
the fundamental change is in `IntrinsicsARM.td`, changing the
low-level IR intrinsics API; there are knock-on changes in
`arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the
modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the
new unsigned flags). The rest of this patch is boringly updating tests.

Reviewers: dmgreen, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72270
2020-01-06 16:33:16 +00:00
Simon Tatham b99ef32d04 [ARM,MVE] Generate the right instruction for vmaxnmq_m_f16.
Summary:
Due to a copy-paste error in the isel patterns, the predicated version
of this intrinsic was expanding to the `VMAXNMT.F32` instruction
instead of `VMAXNMT.F16`. Similarly for vminnm.

Reviewers: dmgreen, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72269
2020-01-06 16:28:20 +00:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Sjoerd Meijer 0efc9e5a8c [ARM][MVE] More MVETailPredication debug messages. NFC.
I've added a few more debug messages to MVETailPredication because I wanted to
trace better which instructions are added/removed. And while I was at it, I
factored out one function which I thought was clearer, and have added some
comments to describe better the flow between MVETailPredication and
ARMLowOverheadLoops.

Differential Revision: https://reviews.llvm.org/D71549
2020-01-06 09:56:02 +00:00
Fangrui Song 5511861e6d [MC][ARM] Delete MCSection::HasData and move SHF_ARM_PURECODE logic to ARMELFObjectWriter::addTargetSectionFlags
This simplifies the generic interface and also makes SHF_ARM_PURECODE
more robust (fixes a TODO). Inspecting MCDataFragment contents covers
more cases than MCObjectStreamer::EmitBytes.
2020-01-05 14:20:34 -08:00
David Green fb8c9a339a [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing
the target independent code to handle more folds in more situations (for
example if the fast math flags are present, but the global
AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx
into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately
if needed.

Differential Revision: https://reviews.llvm.org/D72139
2020-01-05 11:24:04 +00:00
David Green c15a56f61a [ARM] Fill in FP16 FMA patterns
This adds fp16 variants of all the fma patterns in the ARM backend.

Differential Revision: https://reviews.llvm.org/D72138
2020-01-05 11:24:04 +00:00
Florian Hahn b8a3c34eee Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)."
This reverts commit 51ef53f3bd, as it
breaks some bots.
2020-01-04 18:44:38 +00:00
Florian Hahn 51ef53f3bd [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-01-04 18:29:35 +00:00
Matt Arsenault 21309eafde GlobalISel: Add type argument to getRegBankFromRegClass
AMDGPU can't unambiguously go back from the selected instruction
register class to the register bank without knowing if this was used
in a boolean context.
2020-01-03 16:25:10 -05:00
Reid Kleckner 9c2b72821b Move tail call disabling code to target independent code
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never
implemented this check.

This LLVM attribute was originally added in
d9699bc7bd (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118
2020-01-03 11:27:41 -08:00
Sam Parker 8f6a67632a [ARM][NFC] Move tail predication checks
Extract the tail predication validation checks out into their own
LowOverHeadLoop method.
2020-01-03 03:50:54 -05:00
QingShan Zhang 2133d3c558 [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal'
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000
2020-01-03 03:26:41 +00:00
Matt Arsenault 0d9f919b73 DAG: Use TargetConstant for FENCE operands 2020-01-02 17:16:10 -05:00
Diogo Sampaio f33fd9648c [ARM][Thumb][FIX] Add unwinding information to t4
Summary:
Add missing part of patch D71361. Now that the stack-frame
can be operated using a addw/subw instruction, they should
appear in the unwinding list.

Reviewers: dmgreen, efriedma

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72000
2019-12-30 15:59:48 +00:00
David Green b4abe7afbf [ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997
2019-12-30 12:58:14 +00:00
Diogo Sampaio 8232497c31 [ARM][THUMB2] Allow emitting T3 types of add and sub
Summary:
This patch allows to emit thumb2 add and sub
instructions with 12 bit immediates in the
emitT2RegPlusImmediate function.
- Splitting parts of the D70680

Reviewers: eli.friedman, olista01, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71361
2019-12-30 11:03:58 +00:00
Fangrui Song 5edb40c022 [SelectionDAG] Disallow indirect "i" constraint
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.

They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
2019-12-29 16:50:42 -08:00
Martin Storsjö b774aa1011 [ARM] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Instead check the shouldAssumeDSOLocal method and load the address
from a COFF stub.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71720
2019-12-23 12:13:49 +02:00
Victor Campos 2ff5a596cb Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit bbcf1c3496.
2019-12-20 18:08:24 +00:00
Sam Parker acbc9aed72 [ARM][MVE] Fixes for tail predication.
1) Fix an issue with the incorrect value being used for the number of
   elements being passed to [d|w]lstp. We were trying to check that
   the value was available at LoopStart, but this doesn't consider
   that the last instruction in the block could also define the
   register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
   insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
   generating a low-overhead loop when a mov lr could have been
   the last instruction in the block.
4) Fix up some instruction attributes so that not all the
   low-overhead loop instructions are labelled as branches and
   terminators - as this is not true for dls/dlstp.

Differential Revision: https://reviews.llvm.org/D71609
2019-12-20 09:34:18 +00:00
Sam Parker 4042518335 [ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block with is only predicated by the vctp and has no
   internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
   internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
   vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
   and all instructions within the block are predicated upon in.

The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
   instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
   internal vpr def. Need insert a new vpst to predicate the
   remaining instructions.
3) No nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
   so adjust the size of the mask on the vpst.

Differential Revision: https://reviews.llvm.org/D71107
2019-12-20 08:42:11 +00:00
Sam Parker 4f0fe6b97e [ARM][MVE] Tail predicate bottom/top muls.
Add VMULL and VQDMULL variants to our tail predication white list.

Differential Revision: https://reviews.llvm.org/D71465
2019-12-20 08:33:01 +00:00
Victor Campos bbcf1c3496 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2019-12-19 11:23:01 +00:00
Anna Welker 7cd1cfdd6b [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.

Differential Revision: https://reviews.llvm.org/D71610
2019-12-18 09:14:39 +00:00
Sjoerd Meijer 049f9672d8 [ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC.
In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have
quite a few helper functions all looking at the opcodes of MVE instructions.
This moves all these utility functions to ARMBaseInstrInfo.

Diferential Revision: https://reviews.llvm.org/D71426
2019-12-16 09:13:59 +00:00
Momchil Velikov 8e8e3181aa [ARM] Fix in ICE when retrieving the number of micro-ops for vlldm/vlstm
The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for
`VLLDM` and `VLSTM`, which are currently defined with itineraries having a
dynamic count of micro-ops.

Assuming an optimistic case in which these instruction do not actually perform
loads or stores, and with the idea that Armv8-m cores are supposed to use the
new style scheduling models, this patch just sets the itinerary for those two
instructions to `NoItinerary`.

Differential Revision: https://reviews.llvm.org/D71266
2019-12-13 18:19:40 +00:00
Fangrui Song f16377f11c [ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71062 2019-12-13 09:26:26 -08:00
Sam Parker 84593f058b [ARM][MVE] Make VPT invalid for tail predication
We've been marking VPT incompatible instructions as invalid for tail
predication too, though this may not strictly be true. VPT are
incompatible and, unless its the first predicate def in a loop,
they shouldn't be compatible for tail predication either.

Differential Revision: https://reviews.llvm.org/D71410
2019-12-13 15:01:08 +00:00
Mikhail Maltsev 99581fd4c8 [ARM][MVE] Add vector reduction intrinsics with two vector operands
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH

Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.

Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71062
2019-12-13 13:17:29 +00:00
Simon Tatham 25305a9311 [ARM][MVE] Add intrinsics for more immediate shifts.
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.

`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.

I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.

(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71458
2019-12-13 13:07:39 +00:00
John Brawn 01ba201abc [ARM] Add custom strict fp conversion lowering when non-strict is custom
We have custom lowering for operations converting to/from floating-point types
when we don't have hardware support for those types, and this doesn't interact
well with the target-independent legalization of the strict versions of these
operations. Fix this by adding similar custom lowering of the strict versions.

This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics
test, with the remaining failures due to poor instruction selection.

Differential Revision: https://reviews.llvm.org/D71127
2019-12-13 13:00:00 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sjoerd Meijer e91420e17d Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC."
This reverts commit 9468e3334b.

There's a test that doesn't like this change. The RDA analysis
gets invalided by changes in the block, which is not taken into
account. Revert while I work on a fix for this.
2019-12-13 11:56:44 +00:00
Mark Murray 228c74076d [ARM][MVE][Intrinsics] Add *_x() variants of my *_m() intrinsics.
Summary:
Better use of multiclass is used, and this helped find some existing
bugs in the predicated VMULL* intrinsics, which are now fixed.

The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.

To assist in testing the multiclassing , and to thwart further gremlins,
the unit tests are improved in scope.

The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.

Reviewers: dmgreen, miyuki, ostannard, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71421
2019-12-13 11:51:23 +00:00
Sjoerd Meijer 9468e3334b [ARM][MVE] findVCMPToFoldIntoVPS. NFC.
This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can
reimplement findVCMPToFoldIntoVPS with just a few calls to RDA.

Differential Revision: https://reviews.llvm.org/D71330
2019-12-12 15:41:20 +00:00
Sam Parker 1274ac3dc2 [ARM][MVE] Sink vector shift operand
Recommit e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00
Sam Parker f8ff3bf55b Revert "[ARM][MVE] Sink vector shift operand"
This reverts commit e0b966643f.

Instruction selection is failing with expensive checks.
2019-12-12 07:52:57 +00:00
Sam Parker e0b966643f [ARM][MVE] Sink vector shift operand
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 07:35:21 +00:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Reid Kleckner 85ba5f637a Rename TTI::getIntImmCost for instructions and intrinsics
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.

Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.

Split off from D71320

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D71381
2019-12-11 18:00:20 -08:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
Simon Tatham bd0f271c9e [ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen, MarkMurrayARM

Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-11 10:10:09 +00:00
Mikhail Maltsev e6d3261c67 [ARM][MVE] Refactor complex vector intrinsics [NFCI]
Summary:
This patch refactors instruction selection of the complex vector
addition, multiplication and multiply-add intrinsics, so that it is
now based on TableGen patterns rather than C++ code.

It also changes the first parameter (halving vs non-halving) of the
arm_mve_vcaddq IR intrinsic to match the corresponding instruction
encoding, hence it requires some changes in the tests.

The patch addresses David's comment in https://reviews.llvm.org/D71190

Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71245
2019-12-10 16:21:52 +00:00
Eric Christopher 9c6b7f68b8 Revert "[ARM][MVE] Add intrinsics for immediate shifts."
and two follow-on commits: one warning fix and one functionality.

As it's breaking at least the lto bot:

http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio

This reverts commits:

 8d70f3c933
 ff4dceef92
 d97b3e3e65
2019-12-09 16:47:38 -08:00
Mark Murray fc3417cb5a [ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics.
Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen, miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71198
2019-12-09 17:41:47 +00:00
Mark Murray 2eb61fa5d6 [ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT|POLY) intrinsics.
Summary: Add VMULL[BT]Q_(INT|POLY) intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71066
2019-12-09 17:41:47 +00:00
Simon Tatham 8d70f3c933 [ARM] Fix NEON failure introduced by D71065.
I rewrote the isel tablegen for MVE immediate shifts, and accidentally
removed the `let Predicates=[HasMVEInt]` that was wrapping the old
version, which seems to have allowed those rules to cause trouble on
non-MVE targets. That's what I get for only re-running the MVE tests.
2019-12-09 16:56:00 +00:00
Simon Tatham d97b3e3e65 [ARM][MVE] Add intrinsics for immediate shifts.
Summary:
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-09 15:44:09 +00:00
Mikhail Maltsev 0d1490bf6a [ARM][MVE] Add complex vector intrinsics
Summary:
This patch adds intrinsics for the following MVE instructions:
* VCADD, VHCADD
* VCMUL
* VCMLA

Each of the above 3 groups has a corresponding new LLVM IR intrinsic.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71190
2019-12-09 12:05:59 +00:00
David Green b1aba0378e [ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
2019-12-09 11:37:34 +00:00
David Green be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
David Green f008b5b8ce [ARM] Additional tests and minor formatting. NFC
This adds some extra cost model tests for shifts, and does some minor
adjustments to some Neon code to make it clear as to what it applies to.
Both NFC.
2019-12-09 10:24:33 +00:00
David Stenberg 6965f835b4 [DebugInfo] Make describeLoadedValue() reg aware
Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: ormris, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D70431
2019-12-09 10:47:49 +01:00
David Stenberg f3696533f2 Revert "[DebugInfo] Make describeLoadedValue() reg aware"
This reverts commit 3cd93a4efc.
I'll recommit with a well-formatted arcanist commit message.
2019-12-09 10:45:13 +01:00
David Stenberg 3cd93a4efc [DebugInfo] Make describeLoadedValue() reg aware
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
2019-12-09 10:44:17 +01:00
David Green 792fab343b [ARM] Attempt to use whole register vmovs for MVE shuffles.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509
2019-12-08 10:53:54 +00:00
David Green 3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00
Alina Sbirlea c7faa68142 Revert "ARM-Darwin: keep the frame register reserved even if not updated."
This reverts commit a7d90af1be.

This revision came back as the root-cause for crashes in internal
ARM-IOS apps.
Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.
2019-12-06 10:59:26 -08:00
Simon Tatham 3fab4276cb [ARM][MVE] Fix copy-paste error in VQSHL instruction ids.
Summary:
The immediate forms of the MVE VQSHL instruction have MC names like
`MVE_VSLIimms8` and `MVE_VSLIimmu32`. Those names are confusing,
because VSLI is a completely different shift instruction with no
semantic relation to VQSHL. But it just happens to be defined
immediately before VQSHL in `ARMInstrMVE.td`, so this looks like a
copy-paste error. Renamed the ids to match the instruction name.

Reviewers: ostannard, dmgreen, MarkMurrayARM, miyuki

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71114
2019-12-06 15:23:23 +00:00
Mark Murray d3f62ceac0 [ARM][MVE][Intrinsics] Add VMULH/VRMULH intrinsics.
Summary: Add MVE VMULH/VRMULH intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70948
2019-12-04 14:27:12 +00:00
Sam Parker bc76dadb3c [CodeGen] Move ARMCodegenPrepare to TypePromotion
Convert ARMCodeGenPrepare into a generic type promotion pass by:
- Removing the insertion of arm specific intrinsics to handle narrow
  types as we weren't using this.
- Removing ARMSubtarget references.
- Now query a generic TLI object to know which types should be
  promoted and what they should be promoted to.
- Move all codegen tests into Transforms folder and testing using opt
  and not llc, which is how they should have been written in the
  first place...

The pass searches up from icmp operands in an attempt to safely
promote types so we can avoid generating unnecessary unsigned extends
during DAG ISel.

Differential Revision: https://reviews.llvm.org/D69556
2019-12-03 11:12:52 +00:00
David Green 469ee617a0 [ARM] Add ARMVCCThen to tablegen and make use of it. NFC
Similar to the parent, this adds some constants to tablegen to replace
the existing magic values.

Differential Revision: https://reviews.llvm.org/D70825
2019-12-02 19:57:12 +00:00
David Green a223a4d66f [ARM] Add ARMCC constants to tablegen. NFC
I got tired of looking at magic constants in tablegen files. This adds
condition codes like ARMCCeq and makes use of them.

I also removed the extra patterns for reverse condition codes from
D70296, they should now be covered by the parent commit.

Differential Revision: https://reviews.llvm.org/D70824
2019-12-02 19:57:12 +00:00
David Green 57d96ab593 [ARM] Add some VCMP folding and canonicalisation
The VCMP instructions in MVE can accept a register or ZR, but only as
the right hand operator. Most of the time this will already be correct
because the icmp will have been canonicalised that way already. There
are some cases in the lowering of float conditions that this will not
apply to though. This code should fix up those cases.

Differential Revision: https://reviews.llvm.org/D70822
2019-12-02 19:57:12 +00:00
Simon Tatham d173fb5d28 [ARM,MVE] Add intrinsics to deal with predicates.
Summary:
This commit adds the `vpselq` intrinsics which take an MVE predicate
word and select lanes from two vectors; the `vctp` intrinsics which
create a tail predicate word suitable for processing the first m
elements of a vector (e.g. in the last iteration of a loop); and
`vpnot`, which simply complements a predicate word and is just
syntactic sugar for the `~` operator.

The `vctp` ACLE intrinsics are lowered to the IR intrinsics we've
already added (and which D70592 just reorganized). I've filled in the
missing isel rule for VCTP64, and added another set of rules to
generate the predicated forms.

I needed one small tweak in MveEmitter to allow the `unpromoted` type
modifier to apply to predicates as well as integers, so that `vpnot`
doesn't pointlessly convert its input integer to an `<n x i1>` before
complementing it.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70485
2019-12-02 16:20:30 +00:00
Simon Tatham 48cce077ef [ARM,MVE] Rename and clean up VCTP IR intrinsics.
Summary:
D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction,
to use in tail predication. But the 64-bit one doesn't work properly:
its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE
type (due to not having a full set of instructions that manipulate it
usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through
`opt` fine, as the test expects, but if you then feed it to `llc` it
causes a type legality failure at isel time.

The usual workaround we've been using in the rest of the MVE
intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the
`vctp64` IR intrinsic to do that, and completely removed the code (and
test) that uses that intrinsic for 64-bit tail predication. That will
allow me to add isel rules (upcoming in D70485) that actually generate
the VCTP64 instruction.

Also renamed all four of these IR intrinsics so that they have `mve`
in the name, since its absence was confusing.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70592
2019-12-02 16:20:30 +00:00
Victor Campos dcf11c5e86 [ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A
Summary:
Add support for vcadd_* family of intrinsics. This set of intrinsics is
available in Armv8.3-A.

The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70862
2019-12-02 14:38:39 +00:00
Mark Murray 510792a2e0 [ARM][MVE][Intrinsics] Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics.
Summary: Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics and their predicated versions. Add unit tests.

Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70829
2019-12-02 11:18:53 +00:00
David Green e9e1daf2b9 [ARM] Remove VHADD patterns
These instructions do not work quite like I expected them to. They
perform the addition and then shift in a higher precision integer, so do
not match up with the patterns that we added.

For example with s8s, adding 100 and 100 should wrap leaving the shift
to work on a negative number. VHADD will instead do the arithmetic in
higher precision, giving 100 overall. The vhadd gives a "better" result,
but not one that matches up with the input.

I am just removing the patterns here. We might be able to re-add them in
the future by checking for wrap flags or changing bitwidths. But for the
moment just remove them to remove the problem cases.
2019-12-02 10:38:14 +00:00
Carey Williams 76fd58d0fe Revert "[ARM] Allocatable Global Register Variables for ARM"
This reverts commit 2d739f98d8.
2019-11-29 17:01:05 +00:00
Victor Campos e478385e77 [ARM] Fix instruction selection for ARMISD::CMOV with f16 type
Summary:
In the cases where the CMOV (f16) SDNode is used with condition codes
LT, LE, VC or NE, it is successfully selected into a VSEL instruction.

In the remaining cases, however, instruction selection fails since VSEL
does not support other condition codes.

This patch handles such cases by using the single-precision version of
the VMOV instruction.

Reviewers: ostannard, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70667
2019-11-29 10:40:37 +00:00
Mark Murray a048bf87fb [ARM][MVE][Intrinsics] Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests.
Summary: Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70547
2019-11-27 16:52:05 +00:00
Mark Murray e8a8dbe9c4 [ARM][MVE][Intrinsics] Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.
Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70546
2019-11-27 16:52:05 +00:00
Mark Murray f4bba07b87 [ARM][MVE][Intrinsics] Add MVE VABD intrinsics. Add unit tests.
Summary: Add MVE VABD intrinsics. Add unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70545
2019-11-27 16:52:04 +00:00
David Green 9f15fcc271 [ARM] Replace arm_neon_vqadds with sadd_sat
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
This helps generate vqadds from standard IR nodes, which might be
produced from the vectoriser. The old variants are removed in the
process.

Differential Revision: https://reviews.llvm.org/D69350
2019-11-27 13:32:29 +00:00
David Green 4965779f17 [ARM] Clean up the load and store code. NFC
Some of these patterns have grown quite organically. I've tried to
organise them a little here, moving all the PatFlags together and giving
them a more consistent naming scheme, to allow some of the later
patterns to be merged into a single multiclass.

Differential Revision: https://reviews.llvm.org/D70178
2019-11-26 16:21:01 +00:00
David Green b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
Sam Parker 28166816b0 [ARM][ReachingDefs] Remove dead code in loloops.
Add some more helper functions to ReachingDefs to query the uses of
a given MachineInstr and also to query whether two MachineInstrs use
the same def of a register.

For Arm, while tail-predicating, these helpers are used in the
low-overhead loops to remove the dead code that calculates the number
of loop iterations.

Differential Revision: https://reviews.llvm.org/D70240
2019-11-26 10:27:46 +00:00
Sam Parker cced971fd3 [ARM][ReachingDefs] RDA in LoLoops
Add several new methods to ReachingDefAnalysis:
- getReachingMIDef, instead of returning an integer, return the
  MachineInstr that produces the def.
- getInstFromId, return a MachineInstr for which the given integer
  corresponds to.
- hasSameReachingDef, return whether two MachineInstr use the same
  def of a register.
- isRegUsedAfter, return whether a register is used after a given
  MachineInstr.

These methods have been used in ARMLowOverhead to replace searching
for uses/defs.

Differential Revision: https://reviews.llvm.org/D70009
2019-11-26 10:13:46 +00:00
Sam Parker 4a59eedd2d [ARM][ConstantIslands] Correct block size update
When inserting a non-decrementing LE, the basic block was being
resized to take into consideration that a tCMP and tBcc had been
combined into one T1 instruction. This is not true in the LE case
where we generate a CBN?Z and an LE.

Differential Revision: https://reviews.llvm.org/D70536
2019-11-26 09:55:58 +00:00
Momchil Velikov 09555ce071 [ARM] Generate CMSE instructions from CMSE intrinsics
This patch adds instruction selection patterns for the TT, TTT, TTA, and TTAT
instructions and tests for llvm.arm.cmse.tt, llvm.arm.cmse.ttt,
llvm.arm.cmse.tta, and llvm.arm.cmse.ttat intrinsics (added in a previous
patch).

Patch by Javed Absar.

Differential Revision: https://reviews.llvm.org/D70407
2019-11-25 18:26:12 +00:00
Anna Welker 6fc3e6f2eb [ARM][MVE] Select vqneg
Adds a pattern to ARMInstrMVE.td to use a VQNEG
  instruction if an equivalent multi-instruction
  construct is found.

Differential Revision: https://reviews.llvm.org/D70491
2019-11-25 11:29:14 +00:00
Tom Stellard ab411801b8 [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO.  I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so.  Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:

1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so.  This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.

With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.

2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set.  This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.

I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:

- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON

Reviewers: beanz, smeenai, compnerd, phosek

Reviewed By: beanz

Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70179
2019-11-21 10:48:08 -08:00
Anna Welker 96e94e37e3 [ARM][MVE] Select vqabs
Adds a pattern to ARMInstrMVE.td to use a VQABS
  instruction if an equivalent multi-instruction
  construct is found.

  Differential revision: https://reviews.llvm.org/D70181
2019-11-20 13:58:38 +00:00
David Green 882f23caea [ARM] MVE interleaving load and stores.
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.

We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.

Differential Revision: https://reviews.llvm.org/D69392
2019-11-19 18:37:30 +00:00
Simon Tatham 254b4f2500 [ARM,MVE] Add intrinsics for scalar shifts.
This fills in the small family of MVE intrinsics that have nothing to
do with vectors: they implement bit-shift operations on 32- or 64-bit
values held in one or two general-purpose registers. Most of these
shift operations saturate if shifting left, and round to nearest if
shifting right, although LSLL and ASRL behave like ordinary shifts.

When these instructions take a variable shift count in a register,
they pay attention to its sign, so that (for example) LSLL or UQRSHLL
will shift left if given a positive number but right if given a
negative one. That makes even LSLL and ASRL different enough from
standard LLVM IR shift semantics that I couldn't see any better
alternative than to simply model the whole family as a set of
MVE-specific IR intrinsics.

(The //immediate// forms of LSLL and ASRL, on the other hand, do
behave exactly like a standard IR shift of a 64-bit value. In fact,
those forms don't have ACLE intrinsics defined at all, because you can
just write an ordinary C shift operation if you want one of those.)

The 64-bit shifts have to be instruction-selected in C++, because they
deliver two output values. But the 32-bit ones are simple enough that
I could write a DAG isel pattern directly into each Instruction
record.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70319
2019-11-19 14:47:29 +00:00
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Sam Parker d43913ae38 [ARM][MVE] Enable narrow vectors for tail pred
Remove the restriction, from the mve tail predication pass, that the
all masked vectors instructions need to be 128-bits. This allows us
to supported extending loads and truncating stores.

Differential Revision: https://reviews.llvm.org/D69946
2019-11-19 08:51:12 +00:00
Sam Parker 8978c12b39 [ARM][MVE] Tail predication conversion
This patch modifies ARMLowOverheadLoops to convert a predicated
vector low-overhead loop into a tail-predicatd one. This is currently
a very basic conversion, with the following restrictions:
- Operates only on single block loops.
- The loop can only contain a single vctp instruction.
- No other instructions can write to the vpr.
- We only allow a subset of the mve instructions in the loop.

TODO: Pass the number of elements, not the number of iterations to
dlstp/wlstp.

Differential Revision: https://reviews.llvm.org/D69945
2019-11-19 08:22:18 +00:00
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Anna Welker 2d739f98d8 [ARM] Allocatable Global Register Variables for ARM
Provides support for using r6-r11 as globally scoped
      register variables. This requires a -ffixed-rN flag
      in order to reserve rN against general allocation.

      If for a given GRV declaration the corresponding flag
      is not found, or the the register in question is the
      target's FP, we fail with a diagnostic.

      Differential Revision: https://reviews.llvm.org/D68862
2019-11-18 10:07:37 +00:00
James Y Knight bf142fc433 MCObjectStreamer: assign MCSymbols in the dummy fragment to offset 0.
In MCObjectStreamer, when there is no current fragment, initially
symbols are created in a "pending" state and assigned to a dummy
empty fragment.

Previously, they were not being assigned an offset, and thus
evaluateAbsolute would fail if trying to evaluate an expression 'a -
b', where both 'a' and 'b' were in this pending state.

Also slightly refactored the EmitLabel overload which takes an
MCFragment for clarity.

Fixes: https://llvm.org/PR41825

Differential Revision: https://reviews.llvm.org/D70062
2019-11-16 09:52:07 -05:00
Simon Tatham b0c1900820 [ARM,MVE] Add reversed isel patterns for MVE `vcmp qN,rN`
Summary:
As well as vector/vector compare instructions, MVE also has a family
of comparisons taking a vector and a scalar, which compare every lane
of the vector against the same value. We generate those at isel time
using isel patterns that match `(ARMvcmp vector, (ARMvdup scalar))`.

This commit adds corresponding patterns for the operand-reversed form
`(ARMvcmp (ARMvdup scalar), vector)`, with condition codes swapped as
necessary. That way, we can still generate the vector/scalar compare
instruction if the IR happens to have been rearranged to put the
operands the other way round, which can happen in some optimization
phases. Previously, a vcmp the other way round was handled by emitting
a `vdup` instruction to //explicitly// replicate the scalar input into
a vector, and then doing a vector/vector comparison.

I haven't added a new test, because it turned out that several
existing tests were already exhibiting that failure mode. So just
updating the expected output in the existing MVE codegen tests
demonstrates what's been improved.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70296
2019-11-15 14:06:00 +00:00
Sjoerd Meijer 71327707b0 [ARM][MVE] tail-predication
This is a follow up of d90804d, to also flag fmcp instructions as instructions
that we do not support in tail-predicated vector loops.

Differential Revision: https://reviews.llvm.org/D70295
2019-11-15 11:01:13 +00:00
Tim Northover 232cdb3d30 ARM: allow rewriting frame indexes for all prefetch variants.
For some reason we could handle PLD but not PLDW or PLI, but all of them can
potentially refer to the stack region (if weirdly for PLI).
2019-11-14 14:26:28 +00:00
Simon Pilgrim b5f94adbf3 Fix uninitialized variable warnings. NFCI. 2019-11-14 14:21:16 +00:00
Anna Welker e78083929d [NFC] Fix typo in ARMBaseRegisterInfo 2019-11-14 09:56:18 +00:00
Quentin Colombet de94cda81b [LiveInterval] Allow updating subranges with slightly out-dated IR
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done with updating the live-intervals and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information on how the live-ranges that are being updated look like.

This is problematic for updates that rely on the IR to accurately
represents the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about how the code should
look like after the coalescer will have rewritten the registers.
Essentially this captures how a subregister index with be offseted
to match its position in a new register class.

E.g., let say we want to merge:
    V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>

We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
    V2: sub0 sub1 sub2 sub3
    V1: <offset>  sub0 sub1

This offset will look like a composed subregidx in the the class:
     V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
 =>  V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>

Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.

Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompile as we would drop some live-ranges and
thus, miss some interferences.

For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)

This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
   subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.

looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.

None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and
and #2, so this patch also artificial enables subreg liveness for ARM,
so that a nice test case can be attached.
2019-11-13 11:17:56 -08:00
Matthew Malcomson e5f3760e8c Fix comment spelling {addresing -> addressing} (NFC) 2019-11-13 16:14:32 +00:00
Sjoerd Meijer d90804d26b [ARM][MVE] canTailPredicateLoop
This implements TTI hook 'preferPredicateOverEpilogue' for MVE.  This is a
first version and it operates on single block loops only. With this change, the
vectoriser will now determine if tail-folding scalar remainder loops is
possible/desired, which is the first step to generate MVE tail-predicated
vector loops.

This is disabled by default for now. I.e,, this is depends on option
-disable-mve-tail-predication, which is off by default.

I will follow up on this soon with a patch for the vectoriser to respect loop
hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
we don't want to tail-fold and we shouldn't query this TTI hook, which is
done in D70125.

Differential Revision: https://reviews.llvm.org/D69845
2019-11-13 13:24:33 +00:00
Simon Tatham 5b9e4daef0 [ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement.
MVE includes instructions that extract an 8- or 16-bit lane from a
vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td`
already included isel patterns to select those instructions in
response to the `ARMISD::VGETLANEs` selection-DAG node type. But
`ARMISD::VGETLANEs` was never actually generated, because the code
that creates it was conditioned on NEON only.

It's an easy fix to enable the same code for integer MVE, and now IR
that sign-extends the result of an extractelement (whether explicitly
or as part of the function call ABI) will use `vmov.s8` instead of
`vmov.u8` followed by `sxtb`.

Reviewers: SjoerdMeijer, dmgreen, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70132
2019-11-13 09:08:41 +00:00
Peter Collingbourne 1549b4699a ARM: Don't emit R_ARM_NONE relocations to compact unwinding decoders in .ARM.exidx on Android.
These relocations are specified by the ARM EHABI (section 6.3). As I understand
it, their purpose is to accommodate unwinder implementations that wish to
reduce code size by placing the implementations of the compact unwinding
decoders in a separate translation unit, and using extern weak symbols to
refer to them from the main unwinder implementation, so that they are only
linked when something in the binary needs them in order to unwind.

However, neither of the unwinders used on Android (libgcc, LLVM libunwind)
use this technique, and in fact emitting these relocations ends up being
counterproductive to code size because they cause a copy of the unwinder
to be statically linked into most binaries, regardless of whether it is
actually needed. Furthermore, these relocations create circular dependencies
(between libc and the unwinder) in cases where the unwinder is dynamically
linked and libc contains compact unwind info.

Therefore, deviate from the EHABI here and stop emitting these relocations
on Android.

Differential Revision: https://reviews.llvm.org/D70027
2019-11-12 10:52:59 -08:00
Matt Arsenault e6c9a9af39 Use MCRegister in copyPhysReg 2019-11-11 14:42:33 +05:30
Eli Friedman 5df3a87224 [AArch64][X86] Don't assume __powidf2 is available on Windows.
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.

(I think this started showing up recently because we added an
optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013
2019-11-08 12:43:21 -08:00
Djordje Todorovic 8d2ccd1ac3 Reland: [TII] Use optional destination and source pair as a return value; NFC
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.

Patch by Nikola Prica

Differential Revision: https://reviews.llvm.org/D69622
2019-11-08 13:00:39 +01:00
Sjoerd Meijer 6c2a4f5ff9 [TTI][LV] preferPredicateOverEpilogue
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
  and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040
2019-11-06 10:14:20 +00:00
Simon Tatham 6c3fee47a6 [ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.

The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.

At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.

I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.

On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.

Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.

On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').

Reviewers: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69791
2019-11-06 09:01:42 +00:00
Daniel Sanders e74c5b9661 [globalisel] Rename G_GEP to G_PTR_ADD
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD

Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69734
2019-11-05 10:31:17 -08:00
David Green cf581d7977 [ARM] Always enable UseAA in the arm backend
This feature controls whether AA is used into the backend, and was
previously turned on for certain subtargets to help create less
constrained scheduling graphs. This patch turns it on for all
subtargets, so that they can all make use of the extra information to
produce better code.

Differential Revision: https://reviews.llvm.org/D69796
2019-11-05 10:46:56 +00:00
David Green 7d9af03ff7 [Scheduling][ARM] Consistently enable PostRA Machine scheduling
In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler as they
are using itinaries and the list scheduler seems to produce better code
(and not crash running out of register on v6m codes). So whether to use
the MIScheduler or not is checked at runtime from the subtarget
features.

This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine schedule is decided as the
pass manager is set up, in arms case from a newly constructed subtarget.
Under some situations, like LTO, this won't include the correct cpu so
can pick the wrong option. This can have a surprising effect on
performance.

To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add a enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.

Thanks to David Penry for the identifying this problem.

Differential Revision: https://reviews.llvm.org/D69775
2019-11-05 10:44:55 +00:00
Oliver Stannard 73c3137a82 Fix static analysis warnings in ARM calling convention lowering
Fixes https://bugs.llvm.org/show_bug.cgi?id=43891
2019-11-04 17:17:55 +00:00
David Green 91b0cad813 [ARM] Use isFMAFasterThanFMulAndFAdd for MVE
The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd,
where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually
available for scalar code. For MVE we don't have the non-fused version
though. It makes more sense for isFMAFasterThanFMulAndFAdd to return
true, allowing us to simplify some of the existing ISel patterns.

The tests here are that non of the existing tests failed, and so we are
still selecting VFMA and VFMS. The one test that changed shows we can
now select from fast math flags, as opposed to just relying on the
isFMADLegalForFAddFSub option.

Differential Revision: https://reviews.llvm.org/D69115
2019-11-04 15:05:41 +00:00
David Green 6bae5d16a2 [ARM] Add vrev32 NEON fp16 patterns
Fill in the gaps for vrev32.16 f16 patterns, extending the existing i16
patterns.

Differential Revision: https://reviews.llvm.org/D69508
2019-11-04 13:37:01 +00:00
Diogo Sampaio 3169f0129a [FIX] Removed duplicated v4f16 and v8f16 declarations
Reviewers: RKSimon, ostannard

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69795
2019-11-04 11:33:21 +00:00
Simon Pilgrim d397e29273 A15SDOptimizer::getPrefSPRLane - fix null dereference warning. NFCI 2019-11-02 21:49:12 +00:00
Simon Pilgrim 3842b94c4e Revert rG57ee0435bd47f23f3939f402914c231b4f65ca5e - [TII] Use optional destination and source pair as a return value; NFC
This is breaking MSVC builds: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20375
2019-10-31 18:00:29 +00:00
Djordje Todorovic 57ee0435bd [TII] Use optional destination and source pair as a return value; NFC
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.

Patch by Nikola Prica

Differential Revision: https://reviews.llvm.org/D69622
2019-10-31 15:34:49 +01:00
Evandro Menezes 215da6606c [clang][llvm] Obsolete Exynos M1 and M2 2019-10-30 15:02:59 -05:00
Djordje Todorovic 532815dd5c [ARM][AArch64][DebugInfo] Improve call site instruction interpretation
Extend the describeLoadedValue() with support for target specific ARM and
AArch64 instructions interpretation. The patch provides specialization for
ADD and SUB operations that include a register and an immediate/offset
operand. Some of the instructions can operate with global string addresses
or constant pool indexes but such cases are omitted since we currently lack
flexible support for processing such operands at DWARF production stage.

Patch by Nikola Prica

Differential Revision: https://reviews.llvm.org/D67556
2019-10-30 13:58:14 +01:00
Sander de Smalen d6a7da80aa Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)
llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with
EXPENSIVE_CHECKS enabled, causing the patch to be reverted in
rG2c496bb5309c972d59b11f05aee4782ddc087e71.

This patch relands the patch with a proper fix to the
live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the
callee-saves correctly so that the CalleeSaved info is taken from MIR
directly, rather than letting it be recalculated by the PEI pass. I've
done this by running `llc -stop-before=prologepilog` on the LLVM
IR as captured in the test files, adding the extra MOV instructions
that were manually added in the original test file, then running `llc
-run-pass=prologepilog` and finally re-added the comments for the MOV
instructions.
2019-10-29 16:13:07 +00:00
Simon Pilgrim 2c496bb530 Revert rG70f5aecedef9a6e347e425eb5b843bf797b95319 - "Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)"
This fails on EXPENSIVE_CHECKS builds
2019-10-29 11:54:58 +00:00
David Tellenbach e3a45a24d1 [ARM][Thumb2InstrInfo] Fix default `0` opcode when rewriting frame indices
The static functions `positiveOffsetOpcode`, `negativeOffsetOpcode` and
`immediateOffsetOpcode` (lib/Target/ARM/Thumb2InstrInfo.cpp) currently can
return `0` as default opcode which is meaningless in this situation.

This patch replaces this default value by llvm_unreachable.

Reviewers: t.p.northover, tellenbach

Reviewed By: tellenbach

Subscribers: tellenbach, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69432

Patch By: Lorenzo Casalino <lorenzo.casalino93@gmail.com>
2019-10-28 18:58:45 +00:00
Sander de Smalen 70f5aecede Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)
Fixed up test/DebugInfo/MIR/Mips/live-debug-values-reg-copy.mir that
broke r375425.
2019-10-28 18:05:19 +00:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
vhscampos f6e11a36c4 [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE
Summary:
Writing support for three ACLE functions:
  unsigned int __cls(uint32_t x)
  unsigned int __clsl(unsigned long x)
  unsigned int __clsll(uint64_t x)

CLS stands for "Count number of leading sign bits".

In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69250
2019-10-28 11:06:58 +00:00
Jian Cai a6b0219fc4 Revert "[ARM] Uses "Sun Style" syntax for section switching"
This reverts commit 03de2f84fc.
2019-10-25 14:03:07 -07:00
Jian Cai 03de2f84fc [ARM] Uses "Sun Style" syntax for section switching
Summary:
Support "Sun Style" syntax for section switching ("#alloc,#write" etc).
https://bugs.llvm.org/show_bug.cgi?id=43759

Reviewers: peter.smith, eli.friedman, kristof.beyls, t.p.northover

Reviewed By: peter.smith

Subscribers: MaskRay, llozano, manojgupta, nickdesaulniers, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69296
2019-10-25 13:27:35 -07:00
Guillaume Chatelet a4783ef58d [Alignment][NFC] getMemoryOpCost uses MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69307
2019-10-25 21:26:59 +02:00
Simon Tatham e0ef4ebe2f [ARM] Add IR intrinsics for MVE VLD[24] and VST[24].
The VST2 and VST4 instructions take two or four vector registers as
input, and store part of each register to memory in an interleaved
pattern. They come in variants indicating which part of each register
they store (VST20 and VST21; VST40 to VST43 inclusive); the intention
is that issuing each of those variants in turn has the combined effect
of loading or storing the whole set of registers to a memory block of
equal size. The corresponding VLD2 and VLD4 instructions load from
memory in the same interleaved format: each one overwrites only part
of its output register set, and again, the idea is that if you use
VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the
whole of each register.

I've implemented the stores and loads quite differently. The loads
were easiest to implement as a single intrinsic that expands to all
four VLD4x instructions or both VLD2x, delivering four complete output
registers. (Implementing each individual load as a separate
instruction taking four input registers to partially overwrite is
possible in theory, but pointless, and when I tried it, I found it
would need extra work to get the register allocation not to be
horrible.) Since that intrinsic delivers multiple outputs, it has to
be instruction-selected in custom C++.

But the store instructions are easier to model individually, because
they don't overwrite any register at all and you can write a DAG Isel
pattern in Tablegen for each one.

Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load
instructions, delivers four full output vectors, and is handled by C++
code, whereas `int_arm_mve_vst4q` expands to just one store
instruction, takes four input vectors and a constant indicating which
lanes to store, and is handled entirely in Tablegen. (And similarly
for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do
each one.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68700
2019-10-24 16:33:13 +01:00
Simon Tatham ceeff95ca4 [ARM] Add some sample IR MVE intrinsics with C++ isel.
This adds some initial example IR intrinsics for MVE instructions that
deliver multiple output values, and hence, have to be instruction-
selected by custom C++ code instead of Tablegen patterns.

I've added the writeback gather load instructions (taking a vector of
base addresses and a single common offset, returning a vector of
loaded values and an updated vector of base addresses); one example
from the long shift family (taking and returning a 64-bit value in two
GPRs); and the VADC instruction (which propagates a carry bit from
each vector-lane addition to the next, taking an input carry flag in
FPSCR and outputting the final one in FPSCR as well).

To support the VPT-predicated forms of these instructions, I've
written some helper functions to add the cluster of MVE predicate
operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used
when the instruction actually is predicated (so it takes a predicate
mask argument), and `AddEmptyMVEPredicateToOps` is for when the
instruction is unpredicated (so it fills in $noreg for the mask). Each
one comes in a form suitable for `vpred_n`, and one for `vpred_r`
which takes the extra 'inactive' parameter.

For VADC, the representation of the carry flag in the IR intrinsic is
a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e.
with the carry flag in bit 29 of the word. (The user-facing ACLE
intrinsic will want it to be in bit 0, but I'll do that on the clang
side.)

Reviewers: dmgreen, miyuki, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68699
2019-10-24 16:33:13 +01:00
Simon Tatham 1b45297e01 [ARM] Begin adding IR intrinsics for MVE instructions.
This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).

When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.

(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new `MVEVectorVTInfo` records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)

The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67158
2019-10-24 16:33:13 +01:00
Mirko Brkusanin 4b63ca1379 [Mips] Use appropriate private label prefix based on Mips ABI
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.

Tags: #llvm, #clang, #lldb

Differential Revision: https://reviews.llvm.org/D66795
2019-10-23 12:24:35 +02:00
Sander de Smalen 8f2dac471a Reverted r375425 as it broke some buildbots.
llvm-svn: 375444
2019-10-21 19:11:40 +00:00
Sander de Smalen 814548ec8e [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)
Commit message from D66935:

This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.

In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.

A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.

Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.

This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*`

Changes after D66935:

Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return
the uninitialized CalleeSavedStackSize when running `llc` on a specific
pass where the MIR code has already been expected to have gone through PEI.

Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try
to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug
mode, the compiler will assert the recalculated size equals the cached
size as calculated through a call to determineCalleeSaves.

This fixes two tests:
  test/DebugInfo/AArch64/asan-stack-vars.mir
  test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir
that otherwise fail when compiled using msan.

Reviewed By: omjavaid, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68783

llvm-svn: 375425
2019-10-21 17:12:56 +00:00
David Green 0765a4c288 [ARM] Extra qdadd patterns
This adds some new qdadd patterns to go along with the other recently added
qadd's.

Differential Revision: https://reviews.llvm.org/D68999

llvm-svn: 375414
2019-10-21 14:06:49 +00:00
David Green d7b77f2203 [ARM] Add qadd lowering from a sadd_sat
This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the
same time.

The qadd instruction sets the q flag, but we already have many cases where we
do not model this in llvm.

Differential Revision: https://reviews.llvm.org/D68976

llvm-svn: 375411
2019-10-21 12:33:46 +00:00
Guillaume Chatelet bac5f6bd21 [Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69243

llvm-svn: 375407
2019-10-21 11:01:55 +00:00
David Green fba831e791 [ARM] Lower sadd_sat to qadd8 and qadd16
Lower the target independent signed saturating intrinsics to qadd8 and qadd16.
This custom lowers them from a sadd_sat, catching the node early before it is
promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a
qadd8/qadd16, so that we can call demand bits on it to show that it does not
use the upper bits.

Also handles QSUB8 and QSUB16.

Differential Revision: https://reviews.llvm.org/D68974

llvm-svn: 375402
2019-10-21 09:53:38 +00:00
Guillaume Chatelet 3cc4835c00 Use Align for TFL::TransientStackAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69216

llvm-svn: 375398
2019-10-21 08:31:25 +00:00
Reid Kleckner 904cd3e06b Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each
Now X86ISelLowering doesn't depend on many IR analyses.

llvm-svn: 375320
2019-10-19 01:31:09 +00:00
Reid Kleckner 1d7b41361f Prune two MachineInstr.h includes, fix up deps
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.

Noticed with -ftime-trace.

llvm-svn: 375311
2019-10-19 00:22:07 +00:00
Quentin Colombet 9f9151d494 [GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method
The default implementation of isIncomingArgumentHandler could lead
to generating incorrect code.
Make it a pure virtual method, so that targets know they have to
override it to produce correct code.

NFC

Differential Revision: https://reviews.llvm.org/D69187

llvm-svn: 375277
2019-10-18 20:13:42 +00:00
Sam Parker 8e6a638c74 [ARM][MVE] Enable truncating masked stores
Allow us to generate truncating masked store which take v4i32 and
v8i16 vectors and can store to v4i8, v4i16 and v8i8 and memory.
Removed support for unaligned masked stores.

Differential Revision: https://reviews.llvm.org/D68461

llvm-svn: 375108
2019-10-17 12:11:18 +00:00
Sam Parker 3ff961cabd [ARM][MVE] Change VPST to use, not def, VPR
Unlike VPT, VPST just uses the current value of VPR.P0.

Differential Revision: https://reviews.llvm.org/D69037

llvm-svn: 375087
2019-10-17 08:46:31 +00:00
Sam Parker 39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
Guillaume Chatelet 882c43d703 [Alignment][NFC] Use Align for TargetFrameLowering/Subtarget
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68993

llvm-svn: 375084
2019-10-17 07:49:39 +00:00
Mikhail Maltsev 95b5d459a0 [ARM] Add a register class for GPR pairs without SP and use it. NFCI
Summary:
Currently Thumb2InstrInfo.cpp uses a register class which is
auto-generated by tablegen. Such approach is fragile because
auto-generated classes might change when other register classes are
added. For example, before https://reviews.llvm.org/D62667
we were using GPRPair_with_gsub_1_in_rGPRRegClass, but had to
change it to GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass
because the former class stopped being generated (this did not change
the functionality though).

This patch adds a register class consisting of even-odd GPR register
pairs from (R0, R1) to (R10, R11), which excludes (R12, SP) and uses
it in Thumb2InstrInfo.cpp instead of
GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass.

Reviewers: ostannard, simon_tatham, dmgreen, efriedma

Reviewed By: simon_tatham

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69026

llvm-svn: 374990
2019-10-16 10:40:57 +00:00
Sam Parker 1c3ca61294 [ARM][ParallelDSP] Change smlad insertion order
Instead of inserting everything after the 'root' of the reduction,
insert all instructions as close to their operands as possible. This
can help reduce register pressure.

Differential Revision: https://reviews.llvm.org/D67392

llvm-svn: 374981
2019-10-16 09:37:03 +00:00
Sam Parker ce39278f25 [ARM][MVE] validForTailPredication insts
Reverse the logic for valid tail predication instructions and create
a whitelist instead. Added other instruction groups that aren't
obviously safe:
- instructions that 'narrow' their result.
- lane moves.
- byte swapping instructions.
- interleaving loads and stores.
- cross-beat carries.
- top/bottom instructions.
- complex operations.

Hopefully we should be able to add more of these instructions to the
whitelist, once we have a more concrete idea of the transform.

Differential Revision: https://reviews.llvm.org/D67904

llvm-svn: 374887
2019-10-15 13:12:51 +00:00
Jian Cai e9089c223c [ARM][AsmParser] handles offset expression in parentheses
Summary:
Integrated assembler does not accept offset expressions surrounded by
parenthesis. Handle this case for GAS compability.
https://bugs.llvm.org/show_bug.cgi?id=43631

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68764

llvm-svn: 374832
2019-10-14 22:22:26 +00:00
David Green 543236232c [ARM] Selection for MVE VMOVN
The adds both VMOVNt and VMOVNb instruction selection from the appropriate
shuffles. We detect shuffle masks of the form:
0, N, 2, N+2, 4, N+4, ...
or
0, N+1, 2, N+3, 4, N+5, ...
ISel will also try the opposite patterns, with inputs reversed. These are
selected to VMOVNt and VMOVNb respectively.

Differential Revision: https://reviews.llvm.org/D68283

llvm-svn: 374781
2019-10-14 15:19:33 +00:00
Sam Parker 527a35e155 [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

llvm-svn: 374763
2019-10-14 10:00:21 +00:00
Zi Xuan Wu 9802268ad3 recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
2019-10-12 02:53:04 +00:00
David Green 8628bb0491 [ARM] VQSUB instruction
Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics.

Differential Revision: https://reviews.llvm.org/D68567

llvm-svn: 374377
2019-10-10 16:34:30 +00:00
David Green 39596ec2fe [ARM] VQADD instructions
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
intrinsics.

Differential Revision: https://reviews.llvm.org/D68566

llvm-svn: 374336
2019-10-10 13:05:04 +00:00
Oliver Stannard 4f454b2275 [IfCvt][ARM] Optimise diamond if-conversion for code size
Currently, the heuristics the if-conversion pass uses for diamond if-conversion
are based on execution time, with no consideration for code size. This adds a
new set of heuristics to be used when optimising for code size.

This is mostly target-independent, because the if-conversion pass can
see the code size of the instructions which it is removing. For thumb,
there are a few passes (insertion of IT instructions, selection of
narrow branches, and selection of CBZ instructions) which are run after
if conversion and affect these heuristics, so I've added target hooks to
better predict the code-size effect of a proposed if-conversion.

Differential revision: https://reviews.llvm.org/D67350

llvm-svn: 374301
2019-10-10 09:58:28 +00:00
Jinsong Ji 9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Nikola Prica 98603a8153 [DebugInfo][If-Converter] Update call site info during the optimization
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.

Reviewers: aprantl, vsk, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D66955

llvm-svn: 374068
2019-10-08 15:43:12 +00:00
Nikola Prica 02682498b8 [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033
2019-10-08 09:43:05 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these test, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
Zi Xuan Wu 9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Tim Northover a7d90af1be ARM-Darwin: keep the frame register reserved even if not updated.
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.

llvm-svn: 373738
2019-10-04 12:29:32 +00:00
Simon Pilgrim b327dc1966 Fix uninitialized variable warning. NFCI
llvm-svn: 373582
2019-10-03 11:21:46 +00:00
Benjamin Kramer 12e915b3fc [ARM] Make helpers static. NFC.
llvm-svn: 373503
2019-10-02 18:20:24 +00:00
David Green c9b5ab8b1c [ARM] Identity shuffles are legal
Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE
(they essentially just become bitcasts). We were not catching that in the
existing set of what we considered legal though. On NEON, they would be covered
by vext's, but that is not generally available in MVE.

This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here
but does what we want and prevents us from just rewriting what is the same
function.

Differential Revision: https://reviews.llvm.org/D68241

llvm-svn: 373446
2019-10-02 11:40:51 +00:00
Matt Arsenault f24ac13aaa TLI: Remove DAG argument from getRegisterByName
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.

The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.

llvm-svn: 373292
2019-10-01 01:44:39 +00:00
Sam Parker aac03ae06a [ARM][MVE] Change VCTP operand
The VCTP instruction will calculate the predicate masked based upon
the number of elements that need to be processed. I had inserted the
sub before the vctp intrinsic and supplied it as the operand, but
this is incorrect as the phi should directly feed the vctp. The sub
is calculating the value for the next iteration.

Differential Revision: https://reviews.llvm.org/D67921

llvm-svn: 373188
2019-09-30 08:03:23 +00:00
Sam Parker b3438f1cc0 [ARM][CGP] Allow signext arguments
As we perform a zext on any arguments used in the promoted tree, it
doesn't matter if they're marked as signext. The only permitted
user(s) in the tree which would interpret the sign bits are signed
icmps. For these instructions, their promoted operands are truncated
before the icmp uses them.

Differential Revision: https://reviews.llvm.org/D68019

llvm-svn: 373186
2019-09-30 07:52:10 +00:00
David Green 120a5e9a74 [ARM] Cortex-M4 schedule additions
This is an attempt to fill in some of the missing instructions from the
Cortex-M4 schedule, and make it easier to do the same for other ARM cpus.

- Some instructions are marked as hasNoSchedulingInfo as they are pseudos or
  otherwise do not require scheduling info
- A lot of features have been marked not supported
- Some WriteRes's have been added for cvt instructions.
- Some extra instruction latencies have been added, notably by relaxing the
  regex for dsp instruction to catch more cases, and some fp instructions.

This goes a long way to get the CompleteModel working for this CPU. It does not
go far enough as to get all scheduling info for all output operands correct.

Differential Revision: https://reviews.llvm.org/D67957

llvm-svn: 373163
2019-09-29 08:38:48 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Alexandros Lamprineas c006b6f4cb [MC][ARM] vscclrm disassembles as vldmia
Happens only when the mve.fp subtarget feature is enabled:

$ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b"
  .text
  vldmia  pc, {d0, d1, d2, d3}
$ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b"
  .text
  vscclrm {d0, d1, d2, d3, vpr}

Assembling returns the correct encoding with or without mve.fp:

$ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}"
  .text
  vscclrm {d0, d1, d2, d3, vpr}   @ encoding: [0x9f,0xec,0x08,0x0b]
$ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}"
  .text
  vscclrm {d0, d1, d2, d3, vpr}   @ encoding: [0x9f,0xec,0x08,0x0b]

The problem seems to be in the TableGen description of VSCCLRMD.
The least significant bit should be set to zero.

Differential Revision: https://reviews.llvm.org/D68025

llvm-svn: 373052
2019-09-27 08:22:24 +00:00
Simon Pilgrim 2cf54d7b71 ARMBaseInstrInfo getOperandLatency - silence static analyzer dyn_cast<> null dereference warnings. NFCI.
The static analyzer is warning about potential null dereferences, but we should be able to use cast<> directly and if not assert will fire for us.

llvm-svn: 372992
2019-09-26 16:05:55 +00:00
David Green 10d10102a4 [ARM] Ensure we do not attempt to create lsll #0
During legalisation we can end up with some pretty strange nodes, like shifts
of 0. We need to make sure we don't try to make long shifts of these, ending up
with invalid assembly instructions. A long shift with a zero immediate actually
encodes a shift by 32.

Differential Revision: https://reviews.llvm.org/D67664

llvm-svn: 372839
2019-09-25 10:16:48 +00:00