Commit Graph

1304 Commits

Author SHA1 Message Date
Igor Kudrin 2c14798ead [ARM][llvm-objdump] Annotate PC-relative memory operands of VLDR instructions
This extends D105979 and adds support for VLDR instructions.

Differential Revision: https://reviews.llvm.org/D105980
2021-08-05 14:11:11 +07:00
Igor Kudrin ddbe812bcc [ARM][llvm-objdump] Annotate PC-relative memory operands
This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for
Arm so that the disassembler can print the target address of memory
operands that use PC+immediate addressing.

Differential Revision: https://reviews.llvm.org/D105979
2021-08-05 14:11:11 +07:00
Daniel Egger 98c2e4115d [ARM] Add lowering of uadd_sat to uq{add|sub}8 and uq{add|sub}16
This follow the lead of https://reviews.llvm.org/D68974 to add lowering
of unsigned saturated addition/subtraction.

Differential Revision: https://reviews.llvm.org/D105413
2021-07-11 15:58:11 +01:00
Eli Friedman 74909e4b6e Rename MachineMemOperand::getOrdering -> getSuccessOrdering.
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving.  This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.

We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.

Differential Revision: https://reviews.llvm.org/D103338
2021-06-21 16:49:27 -07:00
Tim Northover d88f96dff3 ARM: support mandatory tail calls for tailcc & swifttailcc
This adds support for callee-pop conventions to the ARM backend so that it can
ensure a call marked "tail" is actually a tail call.
2021-05-28 11:10:51 +01:00
Tomas Matheson 9d86095ff8 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 753185031d.
2021-05-03 21:48:20 +01:00
Tomas Matheson 753185031d [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164

Originally submitted as 3338290c18.
Reverted in c7df6b1223.
2021-05-03 20:25:15 +01:00
Tomas Matheson c7df6b1223 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 3338290c18.

Broke expensive checks on debian.
2021-04-30 16:53:14 +01:00
Tomas Matheson 3338290c18 [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164
2021-04-30 16:40:33 +01:00
David Green 8de7d8b2c2 [ARM] Recognize VIDUP from BUILDVECTORs of additions
This adds a pattern to recognize VIDUP from BUILD_VECTOR of incrementing
adds. This can come up from either geps or adds, and came up recently in
D100550. We are just looking for a BUILD_VECTOR where each lane is an
add of the first lane with N*i, where i is the lane and N is one of 1,
2, 4, or 8, supported by the VIDUP instruction.

Differential Revision: https://reviews.llvm.org/D101263
2021-04-27 19:33:24 +01:00
Oliver Stannard aac056c528 [objdump][ARM] Use correct offset when printing ARM/Thumb branch targets
llvm-objdump only uses one MCInstrAnalysis object, so if ARM and Thumb
code is mixed in one object, or if an object is disassembled without
explicitly setting the triple to match the ISA used, then branch and
call targets will be printed incorrectly.

This could be fixed by creating two MCInstrAnalysis objects in
llvm-objdump, like we currently do for SubtargetInfo. However, I don't
think there's any reason we need two separate sub-classes of
MCInstrAnalysis, so instead these can be merged into one, and the ISA
determined by checking the opcode of the instruction.

Differential revision: https://reviews.llvm.org/D97766
2021-03-04 11:15:57 +00:00
Kristof Beyls df8ed39283 [ARM] harden-sls-blr: avoid r12 and lr in indirect calls.
As a linker is allowed to clobber r12 on function calls, the code
transformation that hardens indirect calls is not correct in case a
linker does so.  Similarly, the transformation is not correct when
register lr is used.

This patch makes sure that r12 or lr are not used for indirect calls
when harden-sls-blr is enabled.

Differential Revision: https://reviews.llvm.org/D92469
2020-12-19 12:39:59 +00:00
Kristof Beyls 195f44278c [ARM] Implement harden-sls-retbr for ARM mode
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.

To avoid a potential miss-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every indirect control flow where
control flow doesn't return to the next instruction, such as returns and
indirect jumps, but not indirect function calls.

Hardening of indirect function calls will be done in a later,
independent patch.

This patch is implementing the same functionality as the AArch64 counter
part implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change.  I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.

This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.

The inserted barriers are never on the correct, architectural execution
path, and therefore performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.

On targets that implement that Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by a ISB is emitted to act as a
speculation barrier.

These speculation barriers are implemented as pseudo instructions to
avoid later passes to analyze them and potentially remove them.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

Differential Revision: https://reviews.llvm.org/D92395
2020-12-19 11:42:39 +00:00
David Green 92205bf122 [ARM] Remove some dead code. NFC 2020-10-24 17:22:49 +01:00
Meera Nakrani 675431b987 [ARM] Added more patterns to generate SSAT/USAT with shift
Added patterns to generate an SSAT or USAT with shift for
SSAT/USAT instructions that are matched from IR patterns.

Differential Revision: https://reviews.llvm.org/D88145
2020-09-28 14:50:19 +00:00
Meera Nakrani 20283ff491 [ARM] Generated SSAT and USAT instructions with shift
Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests.

Differential Review: https://reviews.llvm.org/D85120
2020-08-04 09:38:17 +00:00
Sjoerd Meijer 85342c27a3 [ARM] Optimize immediate selection
Optimize some specific immediates selection by materializing them with sub/mvn
instructions as opposed to loading them from the constant pool.

Patch by Ben Shi, powerman1st@163.com.

Differential Revision: https://reviews.llvm.org/D83745
2020-07-29 13:29:17 +01:00
David Green f8abecf337 [ARM] Extra MVE select(binop) patterns
This is very similar to 243970d03cace2, but handling a slightly
different form of predicated operations. When starting with a pattern of
the form select(p, BinOp(x, y), x), Instcombine will often transform
this to BinOp(x, select(p, y, 0)), where 0 is the identity value of the
binop (0 for adds/subs, 1 for muls, -1 for ands etc). This adds the
patterns that transforms those back into predicated binary operations.

There is also a very minor adjustment to tablegen null_frag in here, to
allow it to also be recognized as a PatLeaf node, so that it can be used
in MVE_TwoOpPattern to easily exclude the cases where we do not need the
alternate transform.

Differential Revision: https://reviews.llvm.org/D84091
2020-07-22 14:08:29 +01:00
David Green 76e0e1a55d [ARM] VCVTT instruction selection
We current extract and convert from a top lane of a f16 vector using a
VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The
pattern is mostly copied from a vector extract pattern, but produces a
VCVTTHS f32 directly.

This had to move some code around so that ARMInstrVFP had access to the
required pattern frags that were previously part of ARMInstrNEON.

Differential Revision: https://reviews.llvm.org/D81556
2020-06-26 08:58:55 +01:00
Stefan Agner b7d41a11cd [ARM] Make cp10 and cp11 usage a warning
The ARM ARM considers p10/p11 valid arguments for MCR/MRC instructions.
MRC instructions with p10 arguments are also used in kernel code which
is shared for different architectures. Turn usage of p10/p11 to warnings
for ARMv7/ARMv8-M.

Reviewers: rengolin, olista01, t.p.northover, efriedma, psmith, simon_tatham

Reviewed By: simon_tatham

Subscribers: hiraditya, danielkiss, jcai19, tpimh, nickdesaulniers, peter.smith, javed.absar, kristof.beyls, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59733
2020-06-24 23:37:54 +02:00
Victor Campos c010d4d195 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.

Differential Revision: https://reviews.llvm.org/D70072
2020-05-28 10:52:43 +01:00
Victor Campos 872ee78f65 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 8a12553223.

A bug has been found when generating code for Thumb2. In some very
specific cases, the prologue/epilogue emitter generates erroneous stack
offsets for the new LDRD instructions that access the stack.

This bug does not seem to be caused by the reverted patch though. Likely
the latter has made an undiscovered issue emerge in the
prologue/epilogue emission pass. Nevertheless, this reversion is
necessary since it is blocking users of the ARM backend.
2020-05-22 11:01:57 +01:00
Momchil Velikov bc2e572f51 Re-commit: [ARM] CMSE code generation
This patch implements the final bits of CMSE code generation:

* emit special linker symbols

* restrict parameter passing to no use memory

* emit BXNS and BLXNS instructions for returns from non-secure entry
  functions, and non-secure function calls, respectively

* emit code to save/restore secure floating-point state around calls
  to non-secure functions

* emit code to save/restore non-secure floating-pointy state upon
  entry to non-secure entry function, and return to non-secure state

* emit code to clobber registers not used for arguments and returns

* when switching to no-secure state

Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green,
possibly others.

Differential Revision: https://reviews.llvm.org/D76518
2020-05-14 16:46:16 +01:00
David Green 6eee2d9b5b [ARM] Convert VDUPLANE to VDUP under MVE
Unlike Neon, MVE does not have a way of duplicating from a vector lane,
so a VDUPLANE currently selects to a VDUP(move_from_lane(..)). This
forces that to be done earlier as a dag combine to allow other folds to
happen.

It converts to a VDUP(EXTRACT). On FP16 this is then folded to a
VGETLANEu to prevent it from creating a vmovx;vmovhr pair, using a
single move_from_reg instead.

Differential Revision: https://reviews.llvm.org/D79606
2020-05-09 18:58:13 +01:00
Momchil Velikov fb18dffaeb Revert "[ARM] CMSE code generation"
This reverts commit 7cbbf89d23.

The regression tests fail with the expensive checks.
2020-05-05 19:05:40 +01:00
Momchil Velikov 7cbbf89d23 [ARM] CMSE code generation
This patch implements the final bits of CMSE code generation:

* emit special linker symbols

* restrict parameter passing to not use memory

* emit BXNS and BLXNS instructions for returns from non-secure entry
  functions, and non-secure function calls, respectively

* emit code to save/restore secure floating-point state around calls
  to non-secure functions

* emit code to save/restore non-secure floating-pointy state upon
  entry to non-secure entry function, and return to non-secure state

* emit code to clobber registers not used for arguments and returns
  when switching to no-secure state

Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green,
possibly others.

Differential Revision: https://reviews.llvm.org/D76518
2020-05-05 18:23:28 +01:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
David Green fbd53ffc3a [ARM] MVE VMULL patterns
This adds MVE vmull patterns, which are conceptually the same as
mul(vmovl, vmovl), and so the tablegen patterns follow the same
structure.

For i8 and i16 this is simple enough, but in the i32 version the
multiply (in 64bits) is illegal, meaning we need to catch the pattern
earlier in a dag fold. Because bitcasts are involved in the zext
versions and the patterns are a little different in little and big
endian. I have only added little endian support in this patch.

Differential Revision: https://reviews.llvm.org/D76740
2020-04-02 10:57:40 +01:00
David Green 2c5f43f9dd [ARM] Fix qdadd operand order
qdadd is defined as sat(Rm + sat(2*Rn)). We had the Rm and Rn switched
the wrong way around.

Differential Revision: https://reviews.llvm.org/D77049
2020-03-31 10:11:36 +01:00
Stefan Agner f87563661d [MC][ARM] add implicit immediate form for ldrsbt/ldrht/ldrsht
Add pseudo instructions for ldrsbt/ldrht/ldrsht with implicit immediate
and add fall back C++ code to transform the instruction to the
equivalent LDRSBTi/LDRHTi/LDRSHTi form.

This is similar to how it has been done in commit
fb3950ec63

This fixes:
https://bugs.llvm.org/show_bug.cgi?id=45070
2020-03-19 22:36:42 +01:00
Victor Campos 8a12553223 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-03-11 10:19:27 +00:00
Simon Tatham 9dcc1667ab [ARM] Allow `ARMVectorRegCast` to match bitconverts too. (NFC)
Summary:
When we start putting instances of `ARMVectorRegCast` in complex isel
patterns, it will be awkward that they're often turned into the more
standard `bitconvert` in little-endian mode. We'd rather not have to
write separate isel patterns for the two endiannesses, matching
different but equivalent cast operations.

This change aims to fix that awkwardness in advance, by turning the
Tablegen record `ARMVectorRegCast` from a simple `SDNode` instance
into a `PatFrags` that can match either kind of cast – with a
predicate that prevents it matching a bitconvert in the big-endian
case, where bitconvert isn't semantically identical.

No existing code generation should be affected by this change, but it
will enable the patterns introduced by D74336 to work in both
endiannesses.

Reviewers: dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74716
2020-02-18 09:34:50 +00:00
Mikhail Maltsev dd4d093762 [ARM] Add initial support for Custom Datapath Extension (CDE)
Summary:
This patch adds assembly-level support for a new Arm M-profile
architecture extension, Custom Datapath Extension (CDE).

A brief description of the extension is available at
https://developer.arm.com/architectures/instruction-sets/custom-instructions

The latest specification for CDE is currently a beta release and is
available at
https://static.docs.arm.com/ddi0607/aa/DDI0607A_a_armv8m_arm_supplement_cde.pdf

CDE allows chip vendors to add custom CPU instructions.  The CDE
instructions re-use the same encoding space as existing coprocessor
instructions (such as MRC, MCR, CDP etc.). Each coprocessor in range
cp0-cp7 can be configured as either general purpose (GCP) or custom
datapath (CDEv1).  This configuration is defined by the CPU vendor and
is provided to LLVM using 8 subtarget features: cdecp0 ... cdecp7.

The semantics of CDE instructions are implementation-defined, but the
instructions are guaranteed to be pure (that is, they are stateless,
they do not access memory or any registers except their explicit
inputs/outputs).

CDE requires the CPU to support at least Armv8.0-M mainline
architecture. CDE includes 3 sets of instructions:
* Instructions that operate on general purpose registers and NZCV
  flags
* Instructions that operate on the S or D register file (require
  either FP or MVE extension)
* Instructions that operate on the Q register file, require MVE

The user-facing names that can be specified on the command line are
the same as the 8 subtarget feature names. For example:

    $ clang -target arm-none-none-eabi -march=armv8m.main+cdecp0+cdecp3

tells the compiler that the coprocessors 0 and 3 are configured as
CDEv1 and the remaining coprocessors are configured as GCP (which is
the default).

Reviewers: simon_tatham, ostannard, dmgreen, eli.friedman

Reviewed By: simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74044
2020-02-17 15:39:16 +00:00
David Green 9d4c597541 [ARM] Fix ReconstructShuffle for bigendian
Simon pointed out that this function is doing a bitcast, which can be
incorrect for big endian. That makes the lowering of VMOVN in MVE
wrong, but the function is shared between Neon and MVE so both can
be incorrect.

This attempts to fix things by using the newly added VECTOR_REG_CAST
instead of the BITCAST. As it may now be used on Neon, I've added the
relevant patterns for it there too. I've also added a quick dag combine
for it to remove them where possible.

Differential Revision: https://reviews.llvm.org/D74485
2020-02-13 09:56:46 +00:00
Victor Campos af2a384581 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 60e0120c91.
2020-02-08 13:18:45 +00:00
Simon Tatham 4321c6af28 [ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics.
Summary:
Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)

The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
`NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and
similarly for `NEONvbicImm`.

The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.

This intrinsic family is the first to use the `imm_simd` system I put
into the MveEmitter tablegen backend. So, naturally, it showed up a
bug or two (emitting bogus range checks and the like). Fixed those,
and added a full set of tests for the permissible immediates in the
existing Sema test.

Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching
because lowering started turning its input into a VBICIMM. Now it
recognizes the VBICIMM instead.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72934
2020-01-23 11:53:52 +00:00
Victor Campos 60e0120c91 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-01-07 13:16:18 +00:00
Matt Arsenault 0d9f919b73 DAG: Use TargetConstant for FENCE operands 2020-01-02 17:16:10 -05:00
Victor Campos 2ff5a596cb Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit bbcf1c3496.
2019-12-20 18:08:24 +00:00
Victor Campos bbcf1c3496 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2019-12-19 11:23:01 +00:00
Simon Tatham 25305a9311 [ARM][MVE] Add intrinsics for more immediate shifts.
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.

`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.

I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.

(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71458
2019-12-13 13:07:39 +00:00
David Green 469ee617a0 [ARM] Add ARMVCCThen to tablegen and make use of it. NFC
Similar to the parent, this adds some constants to tablegen to replace
the existing magic values.

Differential Revision: https://reviews.llvm.org/D70825
2019-12-02 19:57:12 +00:00
David Green a223a4d66f [ARM] Add ARMCC constants to tablegen. NFC
I got tired of looking at magic constants in tablegen files. This adds
condition codes like ARMCCeq and makes use of them.

I also removed the extra patterns for reverse condition codes from
D70296, they should now be covered by the parent commit.

Differential Revision: https://reviews.llvm.org/D70824
2019-12-02 19:57:12 +00:00
David Green 0765a4c288 [ARM] Extra qdadd patterns
This adds some new qdadd patterns to go along with the other recently added
qadd's.

Differential Revision: https://reviews.llvm.org/D68999

llvm-svn: 375414
2019-10-21 14:06:49 +00:00
David Green d7b77f2203 [ARM] Add qadd lowering from a sadd_sat
This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the
same time.

The qadd instruction sets the q flag, but we already have many cases where we
do not model this in llvm.

Differential Revision: https://reviews.llvm.org/D68976

llvm-svn: 375411
2019-10-21 12:33:46 +00:00
David Green fba831e791 [ARM] Lower sadd_sat to qadd8 and qadd16
Lower the target independent signed saturating intrinsics to qadd8 and qadd16.
This custom lowers them from a sadd_sat, catching the node early before it is
promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a
qadd8/qadd16, so that we can call demand bits on it to show that it does not
use the upper bits.

Also handles QSUB8 and QSUB16.

Differential Revision: https://reviews.llvm.org/D68974

llvm-svn: 375402
2019-10-21 09:53:38 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these test, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
David Green 120a5e9a74 [ARM] Cortex-M4 schedule additions
This is an attempt to fill in some of the missing instructions from the
Cortex-M4 schedule, and make it easier to do the same for other ARM cpus.

- Some instructions are marked as hasNoSchedulingInfo as they are pseudos or
  otherwise do not require scheduling info
- A lot of features have been marked not supported
- Some WriteRes's have been added for cvt instructions.
- Some extra instruction latencies have been added, notably by relaxing the
  regex for dsp instruction to catch more cases, and some fp instructions.

This goes a long way to get the CompleteModel working for this CPU. It does not
go far enough as to get all scheduling info for all output operands correct.

Differential Revision: https://reviews.llvm.org/D67957

llvm-svn: 373163
2019-09-29 08:38:48 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00