Commit Graph

1931 Commits

Author SHA1 Message Date
Craig Topper 735d27dc40 [SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output the ISD::FLT_ROUNDS_
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.

This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.

Differential Revision: https://reviews.llvm.org/D75132
2020-02-25 16:58:23 -08:00
Hans Wennborg decd021fac Don't generate libcalls for wide shift on Windows ARM (PR42711)
The previous patch (cff90f07cb) didn't
cover ARM.
2020-02-25 11:54:07 +01:00
Sam Parker 03756a4197 [ARM][MVE] Combine more extending masked loads
For MVE, don't look at the users of the extending loads so that more
as desirable for folding.

Differential Revision: https://reviews.llvm.org/D74958
2020-02-24 07:50:15 +00:00
David Green 83012cb217 [ARM] Correct Formatting. NFC
Also removed an unnecessary TODO that I don't believe is relevant for
the instruction in question.
2020-02-21 16:08:56 +00:00
Djordje Todorovic 2f215cf36a Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure found on an ARM 2-stage buildbot.
The investigation is needed.
2020-02-20 14:41:39 +01:00
David Green 33aa5dfe9c [ARM] VMLAVA reduction patterns
Similar to VADDV and VADDLV that have been added recently, this adds
lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They
perform the same roles as the add's, just folding a mul into the same
instruction (and so taking two inputs). As such, they need to be lowered
in the same way as the types are often not legal.

Differential Revision: https://reviews.llvm.org/D74390
2020-02-19 12:39:58 +00:00
David Green fceb3e3b4a [ARM] MVE VADDLV lowering
Following on from the extra VADDV lowering, this extends things to
handle VADDLV which allows summing values into a pair of i32 registers,
together treated as a i64. This needs to be done in DAGCombine too as
the types are otherwise illegal, which is a fairly simple addition on
top of the existing code.

There is also a VADDLVA instruction handled here, that adds the incoming
values from the two general purpose registers. As opposed to the
non-long version where we could just add patterns for add(x, VADDV), the
long version needs to handle this early before the i64 has being split
into too many pieces.

Differential Revision: https://reviews.llvm.org/D74224
2020-02-19 11:07:20 +00:00
Djordje Todorovic faff707db8 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
David Green 51c6e9445c [ARM] Extra MVE VADDV reduction patterns
We already make use of the VADDV vector reduction instruction for cases
where the input and the output start out at the same type. The MVE
instruction however will sum into an i32, so if we are summing a v16i8
into an i32, we can still use the same instructions. In terms of IR,
this looks like a sext of a legal type (v16i8) into a very illegal type
(v16i32) and a vecreduce.add of that into the result. This means we have
to catch the pattern early in a DAG combine, producing a target VADDVs/u
node, where the signedness is now important.

This is the first part, handling VADDV and VADDVA. There are also
VADDVL/VADDVLA instructions, which are interesting because they sum into
a 64bit value. And VMLAV and VMLALV, which are interesting because they
also do a multiply of two values. It may look a little odd in places as
a result.

On it's own this will probably not do very much, as the vectorizer will
not produce this IR yet.

Differential Revision: https://reviews.llvm.org/D74218
2020-02-19 09:45:35 +00:00
Djordje Todorovic 2bf44d11cb Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic a82d3e8a6e Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
John Brawn 594a89f727 [FPEnv][ARM] Don't call mutateStrictFPToFP when lowering
mutateStrictFPToFP can delete the node and replace it with another with the same
value which can later cause problems, and returning the result of
mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the
returned value has the same number of results as the original. Instead handle
things by doing the mutation manually.

Differential Revision: https://reviews.llvm.org/D74726
2020-02-17 18:19:25 +00:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
John Brawn 0ec5797296 [ARM] Fix infinite loop when lowering STRICT_FP_EXTEND
If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND
and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the
current implementation will cause in infinite loop for STRICT_FP_EXTEND due to
emitting a merge_values of the original node which after replacement becomes a
merge_values of itself.

Fix this by not doing anything for f32 to f64 extend when we have FP64, though
for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that
doesn't happen automatically for opcodes with custom lowering.

Differential Revision: https://reviews.llvm.org/D74559
2020-02-13 16:12:50 +00:00
David Green 9d4c597541 [ARM] Fix ReconstructShuffle for bigendian
Simon pointed out that this function is doing a bitcast, which can be
incorrect for big endian. That makes the lowering of VMOVN in MVE
wrong, but the function is shared between Neon and MVE so both can
be incorrect.

This attempts to fix things by using the newly added VECTOR_REG_CAST
instead of the BITCAST. As it may now be used on Neon, I've added the
relevant patterns for it there too. I've also added a quick dag combine
for it to remove them where possible.

Differential Revision: https://reviews.llvm.org/D74485
2020-02-13 09:56:46 +00:00
Jay Foad 32aac25637 [KnownBits] Introduce anyext instead of passing a flag into zext
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.

I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.

NFC.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74482
2020-02-12 19:06:53 +00:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Victor Campos af2a384581 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 60e0120c91.
2020-02-08 13:18:45 +00:00
Guillaume Chatelet f85d3408e6 [NFC] Introduce an API for MemOp
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73964
2020-02-07 11:32:27 +01:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introducing additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
John Brawn b37d59353f [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.

The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.

Differential Revision: https://reviews.llvm.org/D73194
2020-02-03 12:59:12 +00:00
Simon Tatham 961530fdc9 [ARM,MVE] Fix vreinterpretq in big-endian mode.
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.

So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?

The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.

To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.

In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.

For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73786
2020-02-03 11:20:06 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Simon Tatham 4321c6af28 [ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics.
Summary:
Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)

The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
`NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and
similarly for `NEONvbicImm`.

The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.

This intrinsic family is the first to use the `imm_simd` system I put
into the MveEmitter tablegen backend. So, naturally, it showed up a
bug or two (emitting bogus range checks and the like). Fixed those,
and added a full set of tests for the permissible immediates in the
existing Sema test.

Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching
because lowering started turning its input into a VBICIMM. Now it
recognizes the VBICIMM instead.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72934
2020-01-23 11:53:52 +00:00
David Green ff2e67a4f7 [ARM] MVE VLDn postinc
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.

It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.

Differential Revision: https://reviews.llvm.org/D71194
2020-01-20 06:57:07 +00:00
Craig Topper bb2553175a [TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RunttimeLibcalls.def and all associated usages
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.

This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.

Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn

Reviewed By: efriedma

Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72536
2020-01-10 19:30:08 -08:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
Victor Campos 60e0120c91 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-01-07 13:16:18 +00:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
David Green fb8c9a339a [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing
the target independent code to handle more folds in more situations (for
example if the fast math flags are present, but the global
AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx
into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately
if needed.

Differential Revision: https://reviews.llvm.org/D72139
2020-01-05 11:24:04 +00:00
Reid Kleckner 9c2b72821b Move tail call disabling code to target independent code
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never
implemented this check.

This LLVM attribute was originally added in
d9699bc7bd (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118
2020-01-03 11:27:41 -08:00
QingShan Zhang 2133d3c558 [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal'
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000
2020-01-03 03:26:41 +00:00
David Green b4abe7afbf [ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997
2019-12-30 12:58:14 +00:00
Martin Storsjö b774aa1011 [ARM] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Instead check the shouldAssumeDSOLocal method and load the address
from a COFF stub.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71720
2019-12-23 12:13:49 +02:00
Victor Campos 2ff5a596cb Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit bbcf1c3496.
2019-12-20 18:08:24 +00:00
Victor Campos bbcf1c3496 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2019-12-19 11:23:01 +00:00
John Brawn 01ba201abc [ARM] Add custom strict fp conversion lowering when non-strict is custom
We have custom lowering for operations converting to/from floating-point types
when we don't have hardware support for those types, and this doesn't interact
well with the target-independent legalization of the strict versions of these
operations. Fix this by adding similar custom lowering of the strict versions.

This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics
test, with the remaining failures due to poor instruction selection.

Differential Revision: https://reviews.llvm.org/D71127
2019-12-13 13:00:00 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sam Parker 1274ac3dc2 [ARM][MVE] Sink vector shift operand
Recommit e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00
Sam Parker f8ff3bf55b Revert "[ARM][MVE] Sink vector shift operand"
This reverts commit e0b966643f.

Instruction selection is failing with expensive checks.
2019-12-12 07:52:57 +00:00
Sam Parker e0b966643f [ARM][MVE] Sink vector shift operand
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 07:35:21 +00:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
David Green 792fab343b [ARM] Attempt to use whole register vmovs for MVE shuffles.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509
2019-12-08 10:53:54 +00:00
David Green 3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00
David Green 57d96ab593 [ARM] Add some VCMP folding and canonicalisation
The VCMP instructions in MVE can accept a register or ZR, but only as
the right hand operator. Most of the time this will already be correct
because the icmp will have been canonicalised that way already. There
are some cases in the lowering of float conditions that this will not
apply to though. This code should fix up those cases.

Differential Revision: https://reviews.llvm.org/D70822
2019-12-02 19:57:12 +00:00
Carey Williams 76fd58d0fe Revert "[ARM] Allocatable Global Register Variables for ARM"
This reverts commit 2d739f98d8.
2019-11-29 17:01:05 +00:00
David Green 9f15fcc271 [ARM] Replace arm_neon_vqadds with sadd_sat
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
This helps generate vqadds from standard IR nodes, which might be
produced from the vectoriser. The old variants are removed in the
process.

Differential Revision: https://reviews.llvm.org/D69350
2019-11-27 13:32:29 +00:00
David Green b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
David Green 882f23caea [ARM] MVE interleaving load and stores.
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.

We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.

Differential Revision: https://reviews.llvm.org/D69392
2019-11-19 18:37:30 +00:00
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Anna Welker 2d739f98d8 [ARM] Allocatable Global Register Variables for ARM
Provides support for using r6-r11 as globally scoped
      register variables. This requires a -ffixed-rN flag
      in order to reserve rN against general allocation.

      If for a given GRV declaration the corresponding flag
      is not found, or the the register in question is the
      target's FP, we fail with a diagnostic.

      Differential Revision: https://reviews.llvm.org/D68862
2019-11-18 10:07:37 +00:00
Simon Tatham 5b9e4daef0 [ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement.
MVE includes instructions that extract an 8- or 16-bit lane from a
vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td`
already included isel patterns to select those instructions in
response to the `ARMISD::VGETLANEs` selection-DAG node type. But
`ARMISD::VGETLANEs` was never actually generated, because the code
that creates it was conditioned on NEON only.

It's an easy fix to enable the same code for integer MVE, and now IR
that sign-extends the result of an extractelement (whether explicitly
or as part of the function call ABI) will use `vmov.s8` instead of
`vmov.u8` followed by `sxtb`.

Reviewers: SjoerdMeijer, dmgreen, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70132
2019-11-13 09:08:41 +00:00
Eli Friedman 5df3a87224 [AArch64][X86] Don't assume __powidf2 is available on Windows.
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.

(I think this started showing up recently because we added an
optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013
2019-11-08 12:43:21 -08:00
David Green 91b0cad813 [ARM] Use isFMAFasterThanFMulAndFAdd for MVE
The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd,
where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually
available for scalar code. For MVE we don't have the non-fused version
though. It makes more sense for isFMAFasterThanFMulAndFAdd to return
true, allowing us to simplify some of the existing ISel patterns.

The tests here are that non of the existing tests failed, and so we are
still selecting VFMA and VFMS. The one test that changed shows we can
now select from fast math flags, as opposed to just relying on the
isFMADLegalForFAddFSub option.

Differential Revision: https://reviews.llvm.org/D69115
2019-11-04 15:05:41 +00:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
vhscampos f6e11a36c4 [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE
Summary:
Writing support for three ACLE functions:
  unsigned int __cls(uint32_t x)
  unsigned int __clsl(unsigned long x)
  unsigned int __clsll(uint64_t x)

CLS stands for "Count number of leading sign bits".

In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69250
2019-10-28 11:06:58 +00:00
Simon Tatham 1b45297e01 [ARM] Begin adding IR intrinsics for MVE instructions.
This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).

When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.

(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new `MVEVectorVTInfo` records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)

The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67158
2019-10-24 16:33:13 +01:00
David Green d7b77f2203 [ARM] Add qadd lowering from a sadd_sat
This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the
same time.

The qadd instruction sets the q flag, but we already have many cases where we
do not model this in llvm.

Differential Revision: https://reviews.llvm.org/D68976

llvm-svn: 375411
2019-10-21 12:33:46 +00:00
Guillaume Chatelet bac5f6bd21 [Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69243

llvm-svn: 375407
2019-10-21 11:01:55 +00:00
David Green fba831e791 [ARM] Lower sadd_sat to qadd8 and qadd16
Lower the target independent signed saturating intrinsics to qadd8 and qadd16.
This custom lowers them from a sadd_sat, catching the node early before it is
promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a
qadd8/qadd16, so that we can call demand bits on it to show that it does not
use the upper bits.

Also handles QSUB8 and QSUB16.

Differential Revision: https://reviews.llvm.org/D68974

llvm-svn: 375402
2019-10-21 09:53:38 +00:00
Sam Parker 39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
David Green 543236232c [ARM] Selection for MVE VMOVN
The adds both VMOVNt and VMOVNb instruction selection from the appropriate
shuffles. We detect shuffle masks of the form:
0, N, 2, N+2, 4, N+4, ...
or
0, N+1, 2, N+3, 4, N+5, ...
ISel will also try the opposite patterns, with inputs reversed. These are
selected to VMOVNt and VMOVNb respectively.

Differential Revision: https://reviews.llvm.org/D68283

llvm-svn: 374781
2019-10-14 15:19:33 +00:00
David Green 8628bb0491 [ARM] VQSUB instruction
Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics.

Differential Revision: https://reviews.llvm.org/D68567

llvm-svn: 374377
2019-10-10 16:34:30 +00:00
David Green 39596ec2fe [ARM] VQADD instructions
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
intrinsics.

Differential Revision: https://reviews.llvm.org/D68566

llvm-svn: 374336
2019-10-10 13:05:04 +00:00
Nikola Prica 02682498b8 [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033
2019-10-08 09:43:05 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these test, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
Benjamin Kramer 12e915b3fc [ARM] Make helpers static. NFC.
llvm-svn: 373503
2019-10-02 18:20:24 +00:00
David Green c9b5ab8b1c [ARM] Identity shuffles are legal
Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE
(they essentially just become bitcasts). We were not catching that in the
existing set of what we considered legal though. On NEON, they would be covered
by vext's, but that is not generally available in MVE.

This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here
but does what we want and prevents us from just rewriting what is the same
function.

Differential Revision: https://reviews.llvm.org/D68241

llvm-svn: 373446
2019-10-02 11:40:51 +00:00
Matt Arsenault f24ac13aaa TLI: Remove DAG argument from getRegisterByName
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.

The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.

llvm-svn: 373292
2019-10-01 01:44:39 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
David Green 10d10102a4 [ARM] Ensure we do not attempt to create lsll #0
During legalisation we can end up with some pretty strange nodes, like shifts
of 0. We need to make sure we don't try to make long shifts of these, ending up
with invalid assembly instructions. A long shift with a zero immediate actually
encodes a shift by 32.

Differential Revision: https://reviews.llvm.org/D67664

llvm-svn: 372839
2019-09-25 10:16:48 +00:00
David Green 2fb41fc70c [ARM] Split large widening MVE loads
Similar to rL372717, we can force the splitting of extends of vector loads in
MVE, in order to use the better widening loads as opposed to going through
expensive extends. This adds a combine to early-on detect extends of loads and
split the load in two, from where normal legalisation will kick in and we get a
series of widening loads.

Differential Revision: https://reviews.llvm.org/D67909

llvm-svn: 372721
2019-09-24 10:53:09 +00:00
David Green 49d851f403 [ARM] Split large truncating MVE stores
MVE does not have a simple sign extend instruction that can move elements
across lanes. We currently often end up moving each lane into and out of a GPR,
in order to get elements into the correct places. When we have a store of a
trunc (or a extend of a load), we can instead just split the store/load in two,
using the narrowing/widening load/store instructions from each half of the
vector.

This does that for stores. It happens very early in a store combine, so as to
easily detect the truncates. (It would be possible to do this later, but that
would involve looking through a buildvector of extract elements. Not impossible
but this way seemed simpler).

By enabling store combines we also get a vmovdrr combine for free, helping some
other tests.

Differential Revision: https://reviews.llvm.org/D67828

llvm-svn: 372717
2019-09-24 10:10:41 +00:00
Guillaume Chatelet c281b40814 [Alignment] Get DataLayout::StackAlignment as Align
Summary:
Internally it is needed to know if StackAlignment is set but we can
expose it as llvm::Align.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67852

llvm-svn: 372585
2019-09-23 12:01:32 +00:00
Oliver Cruickshank c84722ff27 [ARM] Fix CTTZ not generating correct instructions MVE
CTTZ intrinsic should have been set to Custom, not Expand

llvm-svn: 372401
2019-09-20 15:03:44 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
David Green 0cfb78e52a [ARM] MVE i1 splat
We needn't BFI each lane individually into a predicate register when each lane
in the same. A simple sign extend and a vmsr will do.

Differential Revision: https://reviews.llvm.org/D67653

llvm-svn: 372313
2019-09-19 12:17:41 +00:00
Matt Arsenault d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Graham Hunter 1a9195d817 [SVE][MVT] Fixed-length vector MVT ranges
* Reordered MVT simple types to group scalable vector types
    together.
  * New range functions in MachineValueType.h to only iterate over
    the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from
    iterating over scalable types.

Reviewers: sdesmalen, greened

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D66339

llvm-svn: 372099
2019-09-17 10:19:23 +00:00
David Green 8d21460dc5 [ARM] A predicate cast of a predicate cast is a predicate cast
The adds some very basic folding of PREDICATE_CASTS, removing cases when they
are chained together. These would already be removed eventually, as these are
lowered to copies. This just allows it to happen earlier, which can help other
simplifications.

Differential Revision: https://reviews.llvm.org/D67591

llvm-svn: 372012
2019-09-16 17:29:07 +00:00
Oliver Cruickshank ee6fbebbaf [ARM] Add patterns for BSWAP intrinsic on MVE
BSWAP can use the VREV instruction on MVE to produce better results than
expanding.

llvm-svn: 372002
2019-09-16 15:20:10 +00:00
Oliver Cruickshank e9510a6cad [ARM] Add patterns for bitreverse intrinsic on MVE
BITREVERSE can use the VBRSR which will reverse and right shift.
Shifting right by 0 will just reverse the bits.

llvm-svn: 372001
2019-09-16 15:20:03 +00:00
Oliver Cruickshank 5f799ef162 [ARM] Lower CTTZ on MVE
Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and
count the leading zeros, equivalent to a count trailing zeros (CTTZ).

llvm-svn: 372000
2019-09-16 15:19:56 +00:00
Oliver Cruickshank cd1a0b9271 [ARM] Add patterns for CTLZ on MVE
CTLZ intrinsic can use the VCLS instruction on MVE, which produces
better results than expanding.

llvm-svn: 371999
2019-09-16 15:19:49 +00:00
David Green b325c05732 [ARM] Masked loads and stores
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc, and so is
currently behind an option.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.

Differential Revision: https://reviews.llvm.org/D67186

llvm-svn: 371932
2019-09-15 14:14:47 +00:00
Sam Tebbs 1572b68509 [ARM] Add support for MVE vmaxv and vminv
This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv.

Differential Revision: https://reviews.llvm.org/D66413

llvm-svn: 371827
2019-09-13 09:11:46 +00:00
Guillaume Chatelet b6722af068 [Alignment] Use Align for TargetLowering::MinStackArgumentAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67288

llvm-svn: 371498
2019-09-10 09:01:18 +00:00
David Green 2b7089949e [ARM] Fix loads and stores for predicate vectors
These predicate vectors can usually be loaded and stored with a single
instruction, a VSTR_P0. However this instruction will store the entire P0
predicate, 16 bits, zeroextended to 32bits. Each lane of the the
v4i1/v8i1/v16i1 representing 4/2/1 bits.

As far as I understand, when llvm says "store this v4i1", it really does need
to store 4 bits (or 8, that being the size of a byte, with this bottom 4 as the
interesting bits). For example a bitcast from a v8i1 to a i8 is defined as a
store followed by a load, which is how the code is expanded.

So this instead lowers the v4i1/v8i1 load/store through some shuffles to get
the bits into the correct positions. This, as you might imagine, is not as
efficient as a single instruction. But I believe it is needed for correctness.
v16i1 equally should not load/store 32bits, only storing the 16bits of data.
Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not
changing). This is fine as they are self-consistent, it is only "externally
observable loads/stores" (from our point of view) that need to be corrected.

Differential revision: https://reviews.llvm.org/D67085

llvm-svn: 371419
2019-09-09 16:35:49 +00:00
Simon Pilgrim d7d8bb937a Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI.
llvm-svn: 371302
2019-09-07 11:04:04 +00:00
Sam Tebbs f1cdd95a2f [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection
This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together.

This is useful for various MVE instructions, such as vmla and others that take R registers.

Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.

Differential revision: https://reviews.llvm.org/D66295

llvm-svn: 371218
2019-09-06 16:01:32 +00:00
Guillaume Chatelet 9fcf066d0c [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67278

llvm-svn: 371210
2019-09-06 14:51:15 +00:00
Guillaume Chatelet 4fc3ad9e13 [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67229

llvm-svn: 371200
2019-09-06 12:48:34 +00:00
David Candler a59bffb576 [ARM] Add support for the s,j,x,N,O inline asm constraints
A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang:

Target independent constraints:

s: An integer constant, but allowing only relocatable values

ARM specific constraints:

j: An immediate integer between 0 and 65535 (valid for MOVW)
x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3
N: An immediate integer between 0 and 31 (Thumb1 only)
O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only)

This patch adds support to Clang for the missing constraints along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but just a couple of minor corrections to checks (V8M Baseline includes MOVW so should work with 'j', 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation.

Differential Revision: https://reviews.llvm.org/D65863

Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a
llvm-svn: 371079
2019-09-05 15:17:25 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
 - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
David Green 61973d978b [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise
This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can
also be used in ISel Lowering, adding codesize values to the computed costs, to
be able to compare either approximate instruction counts or codesize costs.

It also adds a HasLowerConstantMaterializationCost, which compares the
ConstantMaterializationCost of two values, returning true if the first is
smaller either in instruction count/codesize, or falling back to the other in
the case that they are equal.

This is used in constant CSEL lowering to invert the predicate if the opposite
is easier to materialise.

Differential revision: https://reviews.llvm.org/D66701

llvm-svn: 370741
2019-09-03 11:06:24 +00:00
David Green 57cc65ff47 [ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions.
Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value.

This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used.

Code by Ranjeet Singh and Simon Tatham, with some modifications from me.

Differential revision: https://reviews.llvm.org/D66483

llvm-svn: 370739
2019-09-03 10:53:07 +00:00
David Green a95ec59fa5 [ARM] Use MQPR not QPR for MVE registers
We should be using MQPR, and if we don't we can get COPYs and PHIs created for
QPR. These get folded into instructions, failing verification checks.

Differential revision: https://reviews.llvm.org/D66214

llvm-svn: 370676
2019-09-02 17:18:23 +00:00
David Green 8469a39af3 [ARM] Remove MVE masked loads/stores
These were never enabled correctly and are causing other problems. Taking them
out for the moment, whilst we work on the issues.

This reverts r370329.

llvm-svn: 370607
2019-09-01 10:11:40 +00:00
David Green 942c2e3795 [ARM] MVE Masked loads and stores
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.

We also need to do something with unaligned loads/stores. Currently this uses a
similar method used in big endian, using an VLDRB.8 (and potentially a VREV in
BE). This does mean that the predicate mask is converted from, for example, a
v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of
the relevant mask lane, so this could potentially load different results if the
predicate is little odd. As the input is a v4i1 however, I believe this is OK
and all the bits required should be set in the predicate, making the VLDRB.8
load the same data.

Differential Revision: https://reviews.llvm.org/D66534

llvm-svn: 370329
2019-08-29 10:54:35 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Amaury Sechet 4f4387dd12 [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

llvm-svn: 370190
2019-08-28 12:00:06 +00:00
Sam Tebbs a69d9d6156 Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the
changes from the above patch.

This reverts commit cd53ff6, reapplying 7c6b229.

llvm-svn: 369638
2019-08-22 10:29:20 +00:00
Hans Wennborg cd53ff6c0d Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32"
It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/

> This patch fixes shifts by a 128/256 bit shift amount. It also fixes
> codegen for shifts of 32 by delegating to LLVM's default optimisation
> instead of emitting a long shift.
>
> Tests that used to generate long shifts of 32 are updated to check for the
> more optimised codegen.
>
> Differential revision: https://reviews.llvm.org/D66519
>
> llvm-svn: 369626

llvm-svn: 369636
2019-08-22 09:16:53 +00:00
Sam Tebbs 7c6b229204 [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
This patch fixes shifts by a 128/256 bit shift amount. It also fixes
codegen for shifts of 32 by delegating to LLVM's default optimisation
instead of emitting a long shift.

Tests that used to generate long shifts of 32 are updated to check for the
more optimised codegen.

Differential revision: https://reviews.llvm.org/D66519

llvm-svn: 369626
2019-08-22 08:12:06 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contain argument flags which will pass to makeLibCall function.
The patch should not has any functionality changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Sam Tebbs f312c1ecf4 [ARM] Add support for MVE vaddv
This patch adds vecreduce_add and the relevant instruction selection for
vaddv.

Differential revision: https://reviews.llvm.org/D66085

llvm-svn: 369245
2019-08-19 09:38:28 +00:00
Jian Cai 16fa8b0970 Reland "[ARM] push LR before __gnu_mcount_nc"
This relands r369147 with fixes to unit tests.

https://reviews.llvm.org/D65019

llvm-svn: 369173
2019-08-16 23:30:16 +00:00
Jian Cai 2d957cfe02 Revert "[ARM] push LR before __gnu_mcount_nc"
This reverts commit f4cf3b9593.

llvm-svn: 369149
2019-08-16 20:40:21 +00:00
Jian Cai f4cf3b9593 [ARM] push LR before __gnu_mcount_nc
Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of
the stack on ARM32.

Differential Revision: https://reviews.llvm.org/D65019

llvm-svn: 369147
2019-08-16 20:21:08 +00:00
David Green 8c2c5f5045 [ARM] Don't pretend we know how to generate MVE VLDn
We don't yet know how to generate these instructions for MVE. And in the case
of VLD3, we don't even have the instruction. For the moment don't tell the
vectoriser that we have VLD4, just to end up serialising the results.

Differential Revision: https://reviews.llvm.org/D66009

llvm-svn: 369101
2019-08-16 13:06:49 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
David Green 27ca82f32a [ARM] Add support for MVE pre and post inc loads and stores
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7bit range, multiplied by the size of the
element.

Differential Revision: https://reviews.llvm.org/D63840

llvm-svn: 368305
2019-08-08 15:27:58 +00:00
David Green 824ffd8b12 [ARM] MVE big endian loads/stores
This adds some missing patterns for big endian loads/stores, allowing unaligned
loads/stores to also be selected with an extra VREV, which produces better code
than aligning through a stack. Also moves VLDR_P0 to not be LE only, and
adjusts some of the tests to show all that working.

Differential Revision: https://reviews.llvm.org/D65583

llvm-svn: 368304
2019-08-08 15:15:19 +00:00
David Green 1becefd3f7 [ARM] Tighten up VLDRH.32 with low alignments
VLDRH needs to have an alignment of at least 2, including the
widening/narrowing versions. This tightens up the ISel patterns for it and
alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded
through the stack. It also fixed some incorrect shift amounts, which seemed to
be passing a multiple not a shift.

Differential Revision: https://reviews.llvm.org/D65580

llvm-svn: 368256
2019-08-08 06:22:03 +00:00
Oliver Cruickshank 4d4eefda6c [ARM] Expand CTPOP intrinsic for MVE
llvm-svn: 368180
2019-08-07 15:47:45 +00:00
Matt Arsenault 3922392969 AMDGPU: Correct behavior of f16 buffer loads
Don't assume format loads for f16. Also fixes support for targets
without i16.

llvm-svn: 367879
2019-08-05 15:59:07 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Daniel Sanders 2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Eli Friedman 89b80f1239 [ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1.
This is extremely specific, but saves three instructions when it's
legal.  I don't think the code can be usefully generalized.

Differential Revision: https://reviews.llvm.org/D65351

llvm-svn: 367492
2019-07-31 23:19:21 +00:00
Eli Friedman 2f45ec1c39 [ARM] Transform compare of masked value to shift on Thumb1.
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.

It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.

Differential Revision: https://reviews.llvm.org/D65175

llvm-svn: 367491
2019-07-31 23:17:34 +00:00
David Green 9cf344e739 [ARM] Better patterns for fp <> predicate vectors
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.

Differential Revision: https://reviews.llvm.org/D65103

llvm-svn: 367191
2019-07-28 13:53:39 +00:00
David Green cd7a6fa314 [ARM] Rewrite how VCMP are lowered, using a single node
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and
VCMPZ with an extra operand as the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.

Differential Revision: https://reviews.llvm.org/D65072

llvm-svn: 366934
2019-07-24 17:36:47 +00:00
David Green 047a0b6575 [ARM] Disable MVE fptosi and friends
The prevents us from trying to convert an i1 predicate vector to a float, or
vice-versa. Better patterns are possible, which will follow in a subsequent
commit. For now we just expand them.

Differential Revision: https://reviews.llvm.org/D65066

llvm-svn: 366931
2019-07-24 17:26:26 +00:00
David Green bab4d8ac5a [ARM] Better OR's for MVE compares
This adds a DeMorgan combine for OR's of compares to turn them into AND's,
helping prevent them from going into and out of gpr registers. It also fills in
the VCLE and VCLT nodes that MVE can select, allowing it to invert more
compares.

Differential Revision: https://reviews.llvm.org/D65059

llvm-svn: 366920
2019-07-24 16:42:09 +00:00
David Green 4fc78c496e [ARM] MVE floating point compares and selects
Much like integers, this adds MVE floating point compares and select. It
requires a lot more buildvector/shuffle code because we may need to expand the
compares without mve.fp, and requires support for and/or because of the way we
lower llvm condition codes.

Some original code by David Sherwood

Differential Revision: https://reviews.llvm.org/D65054

llvm-svn: 366909
2019-07-24 14:28:22 +00:00
David Green c7e55d4f52 [ARM] MVE predicate register support
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles, either converting the predicate into an
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
by converting the register to an full vector register and back.

Some of the code here is a not super efficient but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.

Some code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65052

llvm-svn: 366890
2019-07-24 11:51:36 +00:00
David Green b9d96ceca0 [ARM] MVE integer compares and selects
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).

An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).

VPSEL is also added here, simply selecting on the vselect.

Original code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65051

llvm-svn: 366885
2019-07-24 11:08:14 +00:00
Sam Parker 57e87dd81b [ARM][LowOverheadLoops] Fix branch target codegen
While lowering test.set.loop.iterations, it wasn't checked how the
brcond was using the result and so the wls could branch to the loop
preheader instead of not entering it. The same was true for
loop.decrement.reg.
    
So brcond and br_cc and now lowered manually when using the hwloop
intrinsics. During this we now check whether the result has been
negated and whether we're using SETEQ or SETNE and 0 or 1. We can
then figure out which basic block the WLS and LE should be targeting.

Differential Revision: https://reviews.llvm.org/D64616

llvm-svn: 366809
2019-07-23 14:08:46 +00:00
David Green fdedf240f8 [ARM] Rename NEONModImm to VMOVModImm. NFC
Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE.

llvm-svn: 366790
2019-07-23 09:19:24 +00:00
Oliver Stannard 6771a89fa0 [IPRA][ARM] Make use of the "returned" parameter attribute
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.

Differential revision: https://reviews.llvm.org/D64986

llvm-svn: 366669
2019-07-22 08:44:36 +00:00
Diogo N. Sampaio 11512e742b [ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine
Summary:
PerformVMOVRRDCombine ommits adding a offset
of 4 to the PointerInfo, when converting a
f64 = load[M]
to
{i32, i32} = {load[M], load[M + 4]}

Which would allow the machine scheduller
to break dependencies with the second load.

 - pr42638

Reviewers: eli.friedman, dmgreen, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64870

llvm-svn: 366423
2019-07-18 10:05:56 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
David Green dc56995c57 [ARM] MVE vector for 64bit types
We need to make sure that we are sensibly dealing with vectors of types v2i64
and v2f64, even if most of the time we cannot generate native operations for
them. This mostly adds a lot of testing, plus fixes up a couple of the issues
found. And, or and xor can be legal for v2i64, and shifts combining needs a
slight fixup.

Differential Revision: https://reviews.llvm.org/D64316

llvm-svn: 366106
2019-07-15 18:42:54 +00:00
David Green 6e89887642 [ARM] MVE Vector Shifts
This adds basic lowering for MVE shifts. There are many shifts in MVE, but the
instructions handled here are:
 VSHL (imm)
 VSHRu (imm)
 VSHRs (imm)
 VSHL (vector)
 VSHL (register)

MVE, like NEON before it, doesn't have shift right by a vector (or register).
We instead have to negate the amount and shift in the opposite direction. This
means we have to convert any SHR's into a form of SHL (that is still signed or
unsigned) with a negated condition and selecting from there. MVE still does
have shifting by an immediate for SHL, ASR and LSR.

This adds lowering for these and for register forms, which work well for shift
lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially
work optimally for right shifts.

Differential Revision: https://reviews.llvm.org/D64212

llvm-svn: 366056
2019-07-15 11:35:39 +00:00
David Green da750b1688 [ARM] Adjust how NEON shifts are lowered
This adjusts the way that we lower NEON shifts to use a DAG target node, not
via a neon intrinsic. This is useful for handling MVE shifts operations in the
same the way. It also renames some of the immediate shift nodes for
consistency, and moves some of the processing of immediate shifts into
LowerShift allowing it to capture more cases.

Differential Revision: https://reviews.llvm.org/D64426

llvm-svn: 366051
2019-07-15 10:44:50 +00:00
David Green 458a720ec1 [ARM] Add sign and zero extend patterns for MVE
The vmovlb instructions can be uses to sign or zero extend vector registers
between types. This adds some patterns for them and relevant testing. The
VBICIMM generation is also put behind a hasNEON check (as is already done for
VORRIMM).

Code originally by David Sherwood.

Differential Revision: https://reviews.llvm.org/D64069

llvm-svn: 366008
2019-07-13 15:43:00 +00:00
David Green 4ce648b5e8 [ARM] MVE integer abs
Similar to floating point abs, we also have instructions for integers.

Differential Revision: https://reviews.llvm.org/D64027

llvm-svn: 366005
2019-07-13 14:58:32 +00:00
David Green 701bf714db [ARM] MVE integer min and max
This simply makes the MVE integer min and max instructions legal and adds the
relevant patterns for them.

Differential Revision: https://reviews.llvm.org/D64026

llvm-svn: 366004
2019-07-13 14:48:54 +00:00
David Green ac5bcbeb9f [ARM] MVE VRINT support
This adds support for the floor/ceil/trunc/... series of instructions,
converting to various forms of VRINT. They use the same suffixes as their
floating point counterparts. There is not VTINTR, so nearbyint is expanded.

Also added a copysign test, to show it is expanded.

Differential Revision: https://reviews.llvm.org/D63985

llvm-svn: 366003
2019-07-13 14:38:53 +00:00
David Green ec8af0db6c [ARM] MVE minnm and maxnm instructions
This adds the patterns for minnm and maxnm from the fminnum and fmaxnum nodes,
similar to scalar types.

Original patch by Simon Tatham

Differential Revision: https://reviews.llvm.org/D63870

llvm-svn: 366002
2019-07-13 14:29:02 +00:00
Sam Parker 08b4a8da07 [ARM][LowOverheadLoops] Correct offset checking
This patch addresses a couple of problems:
1) The maximum supported offset of LE is -4094.
2) The offset of WLS also needs to be checked, this uses a
   maximum positive offset of 4094.
    
The use of BasicBlockUtils has been changed because the block offsets
weren't being initialised, but the isBBInRange checks both positive
and negative offsets.
    
ARMISelLowering has been tweaked because the test case presented
another pattern that we weren't supporting.

llvm-svn: 365749
2019-07-11 09:56:15 +00:00
Fangrui Song 7d63be09b6 [ARM] Fix null pointer dereference in CodeGen/ARM/Windows/stack-protector-msvc.ll.test after D64292/r365283
CLI.CS may not be set.

llvm-svn: 365299
2019-07-08 08:43:31 +00:00
Martin Storsjo 8d9d290d4c [ARM] Add support for MSVC stack cookie checking
Heavily based on the same for AArch64, from SVN r346469.

Differential Revision: https://reviews.llvm.org/D64292

llvm-svn: 365283
2019-07-07 18:57:31 +00:00