Commit Graph

11887 Commits

Author SHA1 Message Date
Chris Bieneman 9130e471fe Add DXContainer
DXIL is wrapped in a container format defined by the DirectX 11
specification. Codebases differ in calling this format either DXBC or
DXILContainer.

Since eventually we want to add support for DXBC as a target
architecture and the format is used by DXBC and DXIL, I've termed it
DXContainer here.

Most of the changes in this patch are just adding cases to switch
statements to address warnings.

Reviewed By: pete

Differential Revision: https://reviews.llvm.org/D122062
2022-03-29 14:34:23 -05:00
Shao-Ce SUN 662b9fa02c [NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122557
2022-03-29 09:53:24 +08:00
Kazu Hirata 6212871968 [Target] Apply clang-tidy fixes for readability-redundant-member-init (NFC) 2022-03-27 22:22:37 -07:00
Maksim Panchenko 4ae9745af1 [Disassember][NFCI] Use strong type for instruction decoder
All LLVM backends use MCDisassembler as a base class for their
instruction decoders. Use "const MCDisassembler *" for the decoder
instead of "const void *". Remove unnecessary static casts.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D122245
2022-03-25 18:53:59 -07:00
Vasileios Porpodas 39aa202aff Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit e6ead19b77.
2022-03-23 18:32:17 -07:00
Arthur Eubanks e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f9492.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
Vasileios Porpodas 27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d.
2022-03-22 16:41:55 -07:00
Arthur Eubanks f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d3.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Craig Topper 9b0f227d7b [TableGen][RISCV] Add InstAliases with zero_reg to cover unmasked vnot.v, vncvt.x.x.w, vneg.v, etc.
The mask being NoRegister prevented the existing aliases from matching
since NoRegister isn't in the VMV0 register class.

To workaround this I've added new aliases that look for zero_reg.
I had to motify tablegen to generate matching code for zero_reg.
And as a consequence, I had to change the EmitPriority for an ARM
alias that used zero_reg that started printing.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D121496
2022-03-22 10:14:43 -07:00
Vasileios Porpodas 79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb0 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Eli Friedman ddca66622c [ARM] Fix shouldExpandAtomicLoadInIR for subtargets without ldrexd.
Regression from 2f497ec3; we should not try to generate ldrexd on
targets that don't have it.

Also, while I'm here, fix shouldExpandAtomicStoreInIR, for consistency.
That doesn't really have any practical effect, though.  On Thumb targets
where we need to use __sync_* libcalls, there is no libcall for stores,
so SelectionDAG calls __sync_lock_test_and_set_8 anyway.
2022-03-18 15:54:38 -07:00
Eli Friedman 2f497ec3a0 [ARM] Fix ARM backend to correctly use atomic expansion routines.
Without this patch, clang would generate calls to __sync_* routines on
targets where it does not make sense; we can't assume the routines exist
on unknown targets. Linux has special implementations of the routines
that work on old ARM targets; other targets have no such routines. In
general, atomics operations which aren't natively supported should go
through libatomic (__atomic_*) APIs, which can support arbitrary atomics
through locks.

ARM targets older than v6, where this patch makes a difference, are rare
in practice, but not completely extinct. See, for example, discussion on
D116088.

This also affects Cortex-M0, but I don't think __sync_* routines
actually exist in any Cortex-M0 libraries. So in practice this just
leads to a slightly different linker error for those cases, I think.

Mechanically, this patch does the following:

- Ensures we run atomic expansion unconditionally; it never makes sense to
completely skip it.
- Fixes getMaxAtomicSizeInBitsSupported() so it returns an appropriate
number on all ARM subtargets.
- Fixes shouldExpandAtomicRMWInIR() and shouldExpandAtomicCmpXchgInIR() to
correctly handle subtargets that don't have atomic instructions.

Differential Revision: https://reviews.llvm.org/D120026
2022-03-18 12:43:57 -07:00
Tomas Matheson 831ab35b2f [ARM][AArch64] generate subtarget feature flags
Reland of D120906 after sanitizer failures.

This patch aims to reduce a lot of the boilerplate around adding new subtarget
features. From the SubtargetFeatures tablegen definitions, a series of calls to
the macro GET_SUBTARGETINFO_MACRO are generated in
ARM/AArch64GenSubtargetInfo.inc.  ARMSubtarget/AArch64Subtarget can then use
this macro to define bool members and the corresponding getter methods.

Some naming inconsistencies have been fixed to allow this, and one unused
member removed.

This implementation only applies to boolean members; in future both BitVector
and enum members could also be generated.

Differential Revision: https://reviews.llvm.org/D120906
2022-03-18 16:07:00 +00:00
Tomas Matheson 62c481542e Revert "[ARM][AArch64] generate subtarget feature flags"
This reverts commit dd8b0fecb9.
2022-03-18 11:58:20 +00:00
Tomas Matheson dd8b0fecb9 [ARM][AArch64] generate subtarget feature flags
This patch aims to reduce a lot of the boilerplate around adding new subtarget
features. From the SubtargetFeatures tablegen definitions, a series of calls to
the macro GET_SUBTARGETINFO_MACRO are generated in
ARM/AArch64GenSubtargetInfo.inc.  ARMSubtarget/AArch64Subtarget can then use
this macro to define bool members and the corresponding getter methods.

Some naming inconsistencies have been fixed to allow this, and one unused
member removed.

This implementation only applies to boolean members; in future both BitVector
and enum members could also be generated.

Differential Revision: https://reviews.llvm.org/D120906
2022-03-18 11:48:20 +00:00
Sterling Augustine bd38234d76 Reland "Use a stable-sort when combining bases"
Differential Revision: https://reviews.llvm.org/D121922
2022-03-17 11:32:16 -07:00
Archibald Elliott f496330f97 [ARM] Fix Decode of tsb csync
There is a crash in the ARM backend when attempting to decode a "tsb
csync" instruction using `llvm-objdump --triple=armv8.4a -d`. The crash
was in `ARMMCInstrAnalysis::evaluateBranch` where the number of operands
in the decoded instruction (0) did not match the number of operands in
the instruction description (1).

This is becuase `tsb csync` looks like it has an operand during
assembly, but there is only one valid operand (csync), so there is no
encoding space in the instruction for the operand, so the decoder never
has a field to decode that represents `csync`.

The fix is to add a custom decode method, which ensures that this
instruction does have the right number of operands after decoding. This
method merely adds the only available operand value, `ARM_TSB::CSYNC`.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D121479
2022-03-17 17:29:31 +00:00
Sterling Augustine 84810e1f74 Revert "Use a stable-sort when combining bases"
This reverts commit 81417261a1.
2022-03-17 09:54:13 -07:00
Sterling Augustine 81417261a1 Use a stable-sort when combining bases
While experimenting with different algorithms for std::sort
I discovered that combine-vmovdrr.ll fails if this sort is not
stable.

I suspect that the test is too stringent in its check--the resultant
code looks functionally identical to me under both stable and unstable
sorting, but a generic fix is quite a bit more difficult to implement.

Thanks to scw@google.com for finding the proper fix.

Differential Revision: https://reviews.llvm.org/D121870
2022-03-17 09:02:13 -07:00
Arthur Eubanks 2371c5a0e0 [OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Basically the same as D120527.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D121847
2022-03-16 14:11:53 -07:00
Shengchen Kan 37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
serge-sans-paille 989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
serge-sans-paille ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Fangrui Song 689c3a2552 [MC] Fix letter case of some MCSection member functions 2022-03-11 20:07:00 -08:00
Zhiyao Ma adc26b4eae [ARM] Fix 8-bit immediate overflow in the instruction of segmented stack prologue.
It fixes the overflow of 8-bit immediate field in the emitted
instruction that allocates large stacklet.

For thumb2 targets, load large immediate by a pair of movw and movt
instruction. For thumb1 and ARM targets, load large immediate by reading
from literal pool.

Differential Revision: https://reviews.llvm.org/D118545
2022-03-10 15:15:24 -08:00
Nico Weber a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeea.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille 7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Mircea Trofin cb2160760e [nfc][codegen] Move RegisterBank[Info].h under CodeGen
This wraps up from D119053. The 2 headers are moved as described,
fixed file headers and include guards, updated all files where the old
paths were detected (simple grep through the repo), and `clang-format`-ed it all.

Differential Revision: https://reviews.llvm.org/D119876
2022-03-01 21:53:25 -08:00
Jameson Nash c4b1a63a1b mark getTargetTransformInfo and getTargetIRAnalysis as const
Seems like this can be const, since Passes shouldn't modify it.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D120518
2022-02-25 14:30:44 -05:00
Craig Topper c7d6448d03 [DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable.
Internally to DAGCombiner the SDValues were passed by non-const
reference despite not being modified. They were then passed by
const reference to TLI.

This patch passes them by value which is consistent with the vast
majority of code.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120420
2022-02-23 12:40:45 -08:00
David Green a10789d6cd [ARM] Recognize SSAT and USAT from SMIN/SMAX
We have some recognition of SSAT and USAT from SELECT_CC at the moment.
This extends the matching to SMIN/SMAX which can help catch more cases,
either from min/max being the canonical form in instcombine or from some
expanded nodes like fp_to_si_sat.

Differential Revision: https://reviews.llvm.org/D119819
2022-02-23 08:55:54 +00:00
Egor Zhdan 3a1cb36237 Add DriverKit support
This patch is the first in a series of patches to upstream the support for Apple's DriverKit. Once complete, it will allow targeting DriverKit platform with Clang similarly to AppleClang.

This code was originally authored by JF Bastien.

Differential Revision: https://reviews.llvm.org/D118046
2022-02-22 13:42:53 +00:00
Craig Topper a6fb1bb306 [ARM] Remove unused lowerABS function. NFC
This function was added in D49837, but no setOperationAction call
was added with it. The code is equivalent to what is done by the
default ExpandIntRes_ABS implementation when ADDCARRY is supported.
Test case added to verify this. There was some existing coverage
from Thumb2 MVE tests, but they started from vectors.
2022-02-20 22:43:23 -08:00
Craig Topper 440c4b705a [SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).

By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.

Some X86 tests for Z - abs(X) seem to have improved as well.

Other targets look to be a wash.

I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.

This is an alternative to D119099 which was focused on RISCV only.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119171
2022-02-20 21:11:23 -08:00
Simon Pilgrim ae4bec20c4 [ARM] ARMAsmPrinter::emitAttributes - remove unnecessary nullptr test.
The MMI pointer has already been dereferenced several times.
2022-02-18 10:36:40 +00:00
Jessica Paquette 6d58f4ab07 [MachineOutliner] NFC: Hide LRU-related stuff behind helper functions
It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`

This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.

Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.

Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`

This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
2022-02-16 11:39:07 -08:00
Shao-Ce SUN 2aed07e96c [NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D119846
2022-02-16 13:10:09 +08:00
David Green ea6ebbcfb3 [ARM] MVE hadd and rhadd
This uses the nodes from D106237 to add MVE HADD and RHADD lowering.

Differential Revision: https://reviews.llvm.org/D106238
2022-02-14 11:55:40 +00:00
Yuanfang Chen f927021410 Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de0.

Restrict the tests to Windows only since the flag symbol hash depends on
system-dependent path normalization.
2022-02-10 15:16:17 -08:00
Yuanfang Chen b380a31de0 Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683.

Break bots:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview
2022-02-10 14:17:37 -08:00
Yuanfang Chen bd3a1de683 [clang-cl] Support the /JMC flag
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/

The `/JMC` flag enables these instrumentations:
- Insert at the beginning of every function immediately after the prologue with
  a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
  The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
  global variable (the global variable is initialized to 1) with the name
  convention `__<hash>_<filename>`. All such global variables are placed in
  the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
  with a directory path. MSVC uses some unknown hashing function. Here I
  used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
  option via ".drectve" section. This is to prevent failure in
  case `__CheckForDebuggerJustMyCode` is not provided during linking.

Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D118428
2022-02-10 10:26:30 -08:00
Oliver Stannard a76620143c [ARM] Patterns for vector conversion between half and float
These patterns were omitted because clang only allows converting between
these types using intrinsics, but other front-ends or optimisation
passes may want to use them.

Differential revision: https://reviews.llvm.org/D119354
2022-02-10 09:51:55 +00:00
serge-sans-paille ef736a1c39 Cleanup LLVMMC headers
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h

Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after:  1049293745

Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
2022-02-09 11:09:17 +01:00
Mark Murray 3d7662142d [ARM] Undeprecate complex IT blocks
AArch32/Armv8A  introduced the performance deprecation of certain patterns
of IT instructions.  After some debate internal to ARM, this is now being
reverted; i.e. no IT instruction patterns are performance deprecated
anymore, as the perfomance degredation is not significant enough.

This reverts the following:

"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of
IT that apply to instructions other than a single subsequent 16-bit
instruction from a restricted set are deprecated, as are explicit
references to the PC within that single 16-bit instruction. This permits
the non-deprecated forms of IT and subsequent instructions to be treated
as a single 32-bit conditional instruction."

The deprecation no longer applies, but the behaviour may be controlled
by the -arm-restrict-it and -arm-no-restrict-it command-line options,
with the latter being the default. No warnings about complex IT blocks
will be generated.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118044
2022-02-07 15:47:53 +00:00
Kazu Hirata 3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
David Green b7d3a2b62f [ARM] Mark i64 and f64 shuffles as Custom for MVE
This way they get lowered through the ARMISD::BUILD_VECTOR, which can
produce more efficient D register moves.

Also helps D115653 not get stuck in a loop.
2022-02-06 16:17:06 +00:00
Simon Pilgrim 5aa2acc86b [DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper
None of the external users actual touch these (they're purely used internally down the recursive call) - its trivial to add another wrapper if anything ever does want to track known elements.
2022-02-02 12:04:49 +00:00
tyb0807 762f0b5463 [ARM] Make getInstSizeInBytes() use instruction size from InstrInfo.td
Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded
instruction size for some pseudo-instructions, while this
information should ideally be found in ARMInstrInfo.td,
ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence,
the .td files should be updated and no hard-coded instruction sizes
should be used by getInstSizeInBytes() anymore.

Differential Revision: https://reviews.llvm.org/D118009
2022-02-01 10:39:14 +00:00
David Sherwood daa80339df [CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.

Differential Revision: https://reviews.llvm.org/D117210
2022-02-01 09:50:00 +00:00
Ties Stuij 6b1e844b69 [ARM] Add Cortex-X1C Support for Clang and LLVM
This patch upstreams support for the Arm-v8 Cortex-X1C processor for AArch64 and
ARM.

For more information, see:
- https://community.arm.com/arm-community-blogs/b/announcements/posts/arm-cortex-x1c
- https://developer.arm.com/documentation/101968/0002/Functional-description/Technical-overview/Components

The following people contributed to this patch:
- Simon Tatham
- Ties Stuij

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D117202
2022-01-31 14:23:35 +00:00
Nikita Popov 93122b2567 [ARM] Don't look through pointer types in canTailPredicateLoop()
Inspecting the pointer element type here is incompatible with
opaque pointers, and doesn't seem necessary to me. I think the
intention might have been to check the type of load/store pointer
arguments, but I believe those should get checked through their
return type or value operand anyway. I don't get any test failures
if I simply drop this.

Differential Revision: https://reviews.llvm.org/D118353
2022-01-28 09:34:13 +01:00
David Green 82973edfb7 [ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics
Since it's introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.

Fixes #53120 and #50905 and #51761.

Differential Revision: https://reviews.llvm.org/D117592
2022-01-27 19:19:46 +00:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a conquence move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
Jim Lin da1cac7d19 [NFC] Remove duplicate include 2022-01-26 15:10:16 +08:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
serge-sans-paille 75e164f61d [llvm] Cleanup header dependencies in ADT and Support
The cleanup was manual, but assisted by "include-what-you-use". It consists in

1. Removing unused forward declaration. No impact expected.
2. Removing unused headers in .cpp files. No impact expected.
3. Removing unused headers in .h files. This removes implicit dependencies and
   is generally considered a good thing, but this may break downstream builds.
   I've updated llvm, clang, lld, lldb and mlir deps, and included a list of the
   modification in the second part of the commit.
4. Replacing header inclusion by forward declaration. This has the same impact
   as 3.

Notable changes:

- llvm/Support/TargetParser.h no longer includes llvm/Support/AArch64TargetParser.h nor llvm/Support/ARMTargetParser.h
- llvm/Support/TypeSize.h no longer includes llvm/Support/WithColor.h
- llvm/Support/YAMLTraits.h no longer includes llvm/Support/Regex.h
- llvm/ADT/SmallVector.h no longer includes llvm/Support/MemAlloc.h nor llvm/Support/ErrorHandling.h

You may need to add some of these headers in your compilation units, if needs be.

As an hint to the impact of the cleanup, running

clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Support/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l

before: 8000919 lines
after:  7917500 lines

Reduced dependencies also helps incremental rebuilds and is more ccache
friendly, something not shown by the above metric :-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
2022-01-21 13:54:49 +01:00
Jim Lin d6b0734837 [NFC] Use Register instead of unsigned 2022-01-19 20:17:04 +08:00
Fangrui Song 2e589c9c42 [MC][ARM] Replace MCContext::reportFatalError call with reportError
This call is slightly try. We need to postpone getFixupKindNumBytes.
2022-01-15 00:32:03 -08:00
Fangrui Song e2b66928e5 [MC][ARM] Replace MCContext::reportFatalError call with reportError 2022-01-15 00:13:49 -08:00
Ties Stuij 7c70f96a91 [ARM] fix bug causing shrinkwrapping not always being off using PAC
If you want to check for all uses of PAC, the SpillsLR argument to
shouldSignReturnAddress should be true instead of false, as that value will be
returned from the function if the other checks fall through.

Reviewed By: miyuki

Differential Revision: https://reviews.llvm.org/D116213
2022-01-13 10:37:00 +00:00
Rosie Sumpter 552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
David Green 351edf1c47 [ARM] Remove FeaturePerfMon from armv7-m
FeaturePerfMon relates to the PMU extensions available in armv7-a, and
should not be available in v7-m (it requires loading from a system
register with a mrc). Sink it down a level in the dependency map so that
it isn't present in ARMv7m or HasV8MMainlineOps.

It is also removed from the Neoverse-N2, as it will already be
transitively included.

Differential Revision: https://reviews.llvm.org/D117022
2022-01-12 09:44:53 +00:00
Tim Northover 0b5b35fdbd ARM: make FastISel & GISel pass -1 to ADJCALLSTACKUP to signal no callee pop.
The interface for these instructions changed with support for mandatory tail
calls, and now -1 indicates the CalleePopAmount argument is not valid.
Unfortunately I didn't realise FastISel or GISel did calls at the time so
didn't update them.
2022-01-11 11:31:13 +00:00
Kazu Hirata 4e2ec7e38d [llvm] Remove unused forward declarations (NFC) 2022-01-07 20:00:34 -08:00
Kazu Hirata 2aed08131d [llvm] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-07 00:39:14 -08:00
Kazu Hirata f3a344d212 [Target] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-06 22:01:44 -08:00
Kazu Hirata e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Lucas Prates cd7f621a0a [ARM][AArch64] Introduce Armv9.3-A
This patch introduces support for targetting the Armv9.3-A architecture,
which should map to the existing Armv8.8-A extensions.

Differential Revision: https://reviews.llvm.org/D116158
2022-01-03 12:40:43 +00:00
Kazu Hirata 41bfac6aed [Target] Remove unused forward declarations (NFC) 2022-01-02 10:20:15 -08:00
Kazu Hirata fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
David Green 319e77592f [ARM] Verify addressing immediates
This adds at extra check into ARMBaseInstrInfo::verifyInstruction to
verify the offsets used in addressing mode immediates using
isLegalAddressImm. Some tests needed fixing up as a result, adjusting
the opcode created from CMSE stack adjustments.

Differential Revision: https://reviews.llvm.org/D114939
2022-01-01 20:08:45 +00:00
Kazu Hirata 69ccc96162 [llvm] Use the default constructor for SDValue (NFC) 2022-01-01 10:36:59 -08:00
Simon Tatham d50072f74e [ARM] Introduce an empty "armv8.8-a" architecture.
This is the first commit in a series that implements support for
"armv8.8-a" architecture. This should contain all the necessary
boilerplate to make the 8.8-A architecture exist from LLVM and Clang's
point of view: it adds the new arch as a subtarget feature, a definition
in TargetParser, a name on the command line, an appropriate set of
predefined macros, and adds appropriate tests. The new architecture name
is supported in both AArch32 and AArch64.

However, in this commit, no actual _functionality_ is added as part of
the new architecture. If you specify -march=armv8.8a, the compiler
will accept it and set the right predefines, but generate no code any
differently.

Differential Revision: https://reviews.llvm.org/D115694
2021-12-31 16:43:53 +00:00
Kazu Hirata 5a667c0e74 [llvm] Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2021-12-28 08:52:25 -08:00
Kazu Hirata 8445883327 [llvm] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-12-27 15:58:03 -08:00
David Green 2ec3ca7477 [ARM] Extend IsCMPZCSINC to handle CMOV
A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp',
and can be treated the same in IsCMPZCSINC added in D114013. This allows
us to remove the unnecessary CMOV in the same way that we could remove a
CSINC.

Differential Revision: https://reviews.llvm.org/D115188
2021-12-27 14:15:03 +00:00
Kazu Hirata 0a5788ab57 [Target] Use range-based for loops (NFC) 2021-12-26 23:49:38 -08:00
Kazu Hirata c5cf7d910e [ARM] Use range-based for loops (NFC) 2021-12-20 23:06:47 -08:00
Kazu Hirata de90490060 Revert "[ARM] Use range-based for loops (NFC)"
This reverts commit 93d79cac2e.

This patch seems to break
llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.
2021-12-20 10:51:36 -08:00
Kazu Hirata 93d79cac2e [ARM] Use range-based for loops (NFC) 2021-12-20 00:04:53 -08:00
David Green 4ece4cd77e [ARM] Fold away CMP/CSINC from CMOV
This makes use of the code in D114013 to fold away unnecessary
CMPZ/CSINC starting from a CMOV, in a similar way to how we fold away
CSINV/CSINC/etc

Differential Revision: https://reviews.llvm.org/D115185
2021-12-19 21:53:50 +00:00
Kazu Hirata 26bd534a79 [llvm] Use none_of instead of \!any_of (NFC) 2021-12-17 13:48:57 -08:00
David Green 6bd8f114c8 [ARM] Handle splats of constants for MVE qr instruction
Some MVE instructions have qr variants that take a Q and R register,
splatting the R register for each lane. This is usually handled fine for
standard splats as we sink the splat into the loop and combine the
resulting dup into the qr instruction. It does not work for constant
splats though, as we generate a vmovimm or constant pool load instead.

This intercepts that, generating a vdup of the constant instead where we
can turn the result into a qr instruction variant.

Differential Revision: https://reviews.llvm.org/D115242
2021-12-17 09:16:28 +00:00
David Green 26f6fbe2be [ARM] Add AddrModeT2_i8neg addressing mode support for frame lowering.
As reported from a failing firefox build, we can sometimes get frame
indices with negative offsets from a t2LDRi8. This adds support for
them, to prevent the crash.
2021-12-14 12:49:27 +00:00
Kazu Hirata 483499670e [Target] Use llvm::reverse (NFC) 2021-12-12 08:34:24 -08:00
Kazu Hirata c2bb9637d9 Use llvm::any_of and llvm::all_of (NFC) 2021-12-11 11:54:37 -08:00
Kazu Hirata 36b8a4f9f3 [llvm] Use llvm::is_contained (NFC) 2021-12-11 11:42:09 -08:00
Mikael Holmen d0f55a0d80 [ARM] Fix gcc warning about mix of enumeral and non-enumeral types
gcc warned with
../lib/Target/ARM/ARMFrameLowering.cpp:797:31: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
  797 |               Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true);
      |               ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
2021-12-09 10:31:56 +01:00
Mircea Trofin b012742405 [NFC] Rename MachineFunction::deleteMachineInstr (coding style) 2021-12-08 20:36:13 -08:00
David Green d43c801d13 [ARM] Peek through And 1 in IsCMPZCSINC
We can be in situations where And 1 zext nodes will not have been yet,
preventing us from detecting removable cmpz/csinc patterns. This peeks
through those nodes allowing us to simplify more code.

Differential Revision: https://reviews.llvm.org/D115176
2021-12-08 15:40:23 +00:00
Ties Stuij 63eb7ff47d [ARM] Implement PAC return address signing mechanism for PACBTI-M
This patch implements PAC return address signing for armv8-m. This patch roughly
accomplishes the following things:

- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly

Some notes:

We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.

For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.

For PACBTI-M, the instruction which computes return address PAC should use SP
value before adjustment for the argument registers save are (used for variadic
functions and when a parameter is is split between stack and register), but at
the same it should be after the instruction that saves FPCXT when compiling a
CMSE entry function.

This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.

Epilogue emission code adjusted in a similar manner.

PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases
is handled separately by a future ticket.

note on tail calls:

If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during function/prologue epilogue and clobbers the
function pointer.

When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.

One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.

Instead, this patch disables indirect tail calls when the called function take
four or more arguments and the return address sign and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112429
2021-12-07 10:15:19 +00:00
Ties Stuij 0fbb17458a [ARM] Implement setjmp BTI placement for PACBTI-M
This patch intends to guard indirect branches performed by longjmp
by inserting BTI instructions after calls to setjmp.

Calls with 'returns-twice' are lowered to a new pseudo-instruction
named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Alexandros Lamprineas
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112427
2021-12-06 11:07:10 +00:00
David Green d8495e0352 [ARM] Add a vrinta.f16.f16 alias
The v8.1-m ARMARM uses the vrinta.f16.f16 names, as opposed to
vrinta.f16. This adds an alias for it in the same way that we have for
f32 and f64.

Differential Revision: https://reviews.llvm.org/D68127
2021-12-06 11:06:25 +00:00
David Green ab0c5cea0b [ARM] Use v2i1 for MVE and CDE intrinsics
This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal
type, to use a <2 x i1> as opposed to emulating the predicate with a
<4 x i1>. The v4i1 workarounds have been removed leaving the natural
v2i1 types, notably in vctp64 which now generates a v2i1 type.

AutoUpgrade code has been added to upgrade old IR, which needs to
convert the old v4i1 to a v2i1 be converting it back and forth to an
integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be
optimized away in the final assembly.

Differential Revision: https://reviews.llvm.org/D114455
2021-12-03 15:27:58 +00:00
David Green 255ad73424 [ARM] Make MVE v2i1 predicates legal
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.

This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.

The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.

Differential Revision: https://reviews.llvm.org/D114449
2021-12-03 14:05:41 +00:00
David Green b8f1ccb0ac [ARM] Introduce i8neg and i8pos addressing modes
Some instructions with i8 immediate ranges can only hold negative values
(like t2LDRHi8), only hold positive values (like t2STRT) or hold +/-
depending on the U bit (like the pre/post inc instructions. e.g
t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8,
AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.

This allows us to get the offset ranges of t2LDRHi8 correct in the
load/store optimizer, fixing issues where we could end up creating
instructions with positive offsets (which may then be encoded as ldrht).

Differential Revision: https://reviews.llvm.org/D114638
2021-12-02 17:10:26 +00:00
David Green e629302558 [ARM] Correct range in isLegalAddressImm
The ranges in isLegalAddressImm were off by one, not allowing the
maximum values for unscaled offsets.

Differential Revision: https://reviews.llvm.org/D114636
2021-12-02 11:33:40 +00:00
David Green 646c872f9d [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Recommitted from 0e98659ea1 with a fix for APInt comparison.

Differential Revision: https://reviews.llvm.org/D114380
2021-12-02 07:56:27 +00:00
David Green 13e66c070b Revert "[ARM] Teach getIntImmCostInst about the cost of saturating fp converts"
This reverts commit 6d41de380f as the
windows bots are not happy, in a way I do not understand. Revert whilst
we figure out what is wrong.
2021-12-01 15:25:19 +00:00
Ties Stuij f5f28d5b0c [ARM] Implement BTI placement pass for PACBTI-M
This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.

BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
  linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads

Each BTI instructions is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).

Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.

The cost mode of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.

The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.

PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).

The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.

Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need be marked with BTI or PACBTI.

The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112426
2021-12-01 12:54:05 +00:00
Ties Stuij b430782be3 [ARM] emit PACBTI-M build attributes
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112425
2021-12-01 11:05:29 +00:00
Ties Stuij c12c7a84b0 [ARM] add common parts for PACBTI-M support in the backend
This patch encapsulates decision logic about when and how to generate
PAC/BTI related code. It's a part shared by PAC-RET, BTI placement,
build attribute emission, etc, so it make sense committing it
separately in order to unblock the aforementioned parts, which can
proceed concurrently.

This patch adds a few member functions to `ARMFunctionInfo`, which are currently
unused, therefore there is no testing for them at the moment. This code is
tested in follow-up PAC/BTI code gen patches.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112423
2021-12-01 10:48:30 +00:00
David Green 6d41de380f [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Differential Revision: https://reviews.llvm.org/D114380
2021-12-01 10:25:52 +00:00
David Green 388bfc5408 [ARM] Fix some identing in ARMAsmPrinter::emitInstruction, NFC 2021-12-01 10:08:37 +00:00
David Green 9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
David Green 52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
Ties Stuij 5cff77c23f [clang][ARM] PACBTI-M assembly support
Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional
extension in v8.1-M.

There are 10 new system registers and 5 new instructions, all predicated on the
feature.

The attribute for llvm-mc is called "pacbti". For armclang, an architecture
extension also called "pacbti" was created.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112420
2021-11-30 09:28:18 +00:00
Nick Desaulniers 89453ed6f2 [ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards
We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the
stack guard for PIC code that references the GOT, since arm-pseudo may
expand this to the narrow tLDRpci rather than the wider t2LDRpci.

Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci.

Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361

Reviewed By: ardb

Differential Revision: https://reviews.llvm.org/D114762
2021-11-30 08:46:05 +01:00
Kazu Hirata c73fc74ce0 [llvm] Use range-based for loops (NFC) 2021-11-28 10:04:54 -08:00
David Green 5c64d8ef8c [ARM] CSINC/CSINV patterns from CMOV
We sometimes end up generating CMOV with constant operands that can be
simplified to CSINC or CSINV under Arm-8.1m. This adds some simple
patterns for them.

Differential Revision: https://reviews.llvm.org/D114349
2021-11-27 20:21:41 +00:00
David Green 7d5d063c77 [ARM] Fold away unnecessary CSET/CMPZ
Codegen from expanded vector operations can end up with unnecessary
CMPZ/CSINC, of the form:
  CSXYZ A, B, C1 (CMPZ (CSINC 0, 0, C2, D), 0)

These can be converted to remove the CMPZ and CSINC, depending on the
condition.
  if C1==NE -> CSXYZ A, B, C2, D
  if C1==EQ -> CSXYZ A, B, NOT(C2), D

Differential Revision: https://reviews.llvm.org/D114013
2021-11-27 19:07:16 +00:00
Kazu Hirata 387927bbaf [Target] Use range-based for loops (NFC) 2021-11-26 21:21:17 -08:00
Kazu Hirata 562356d6e3 [Target] Use range-based for loops (NFC) 2021-11-26 08:23:01 -08:00
David Green c76d6dd192 [ARM] Generate VCTP from SETCC
This converts a vector SETCC([0,1,2,..], splat(n), ult) to vctp n, which
can be fewer instructions and prevent the need for constant pool loads.

Differential Revision: https://reviews.llvm.org/D114177
2021-11-26 10:57:14 +00:00
David Green fbb61adb70 [ARM] Convert fptoi.sat to fixed point multiply
This is a very small addition to the existing MVE fixed point vcvt code
to also create them from FP_TO_SINT_SAT and FP_TO_UINT_SAT nodes, which
should be equally valid for native saturating converts under MVE.

Differential Revision: https://reviews.llvm.org/D114360
2021-11-25 15:43:45 +00:00
Simon Pilgrim 63b1e58f07 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

Reapplied with fix for AArch64 rev patterns to matching the ARM fix.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-25 11:14:15 +00:00
Benjamin Kramer d32787230d Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl"
This reverts commit 3cf4a2c620.

It makes llc hang on the following test case.
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 {
entry:
  br label %while.body117.i

while.body117.i:                                  ; preds = %cleanup149.i, %entry
  %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ]
  %0 = load i16, i16* undef, align 2
  %1 = icmp eq i16 undef, -10240
  br i1 %1, label %fail.i, label %cleanup149.i

cleanup149.i:                                     ; preds = %while.body117.i
  %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2
  store i16 %or130.i, i16* %out.6269.i, align 2
  br label %while.body117.i

fail.i:                                           ; preds = %while.body117.i
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #1

attributes #0 = { "target-features"="+neon,+v8a" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
```
2021-11-24 14:42:54 +01:00
Simon Pilgrim 3cf4a2c620 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-24 11:28:35 +00:00
David Green 581f837355 [ARM] Fold (fadd x, (vselect c, y, -1.0)) into (vselect c, (fadd x, y), x)
This is similar to D113574, but as a DAG combine, not tablegen patterns.
Doing the fold as a DAG combine allows the fadd to be folded with a
fmul, finally producing a predicated vfma. It performs the same fold of
fadd(x, vselect(p, y, -0.0)) to vselect p, (fadd x, y), x) using -0.0 as
the identity value of a fadd.

Differential Revision: https://reviews.llvm.org/D113584
2021-11-24 10:41:00 +00:00
David Green d9af9c2c5a [ARM] Fold floating point select(binop) patterns
Similar to D84091 which added extra predicated folds for integer operations
using the identity element of the operation, this adds them for floating
point operations for the form `BinOp(x, select(p, y, Identity))`. They are
folded back to predicated versions of the operator, with fadd having the
identity -0.0, fsub using the identity 0.0 and fmul using 1.0.

Differential Revision: https://reviews.llvm.org/D113574
2021-11-24 10:22:20 +00:00
Kazu Hirata d45cb1d7ea [llvm] Use range-based for loops (NFC) 2021-11-23 08:54:48 -08:00
Kazu Hirata fc981cedea [llvm] Use range-based for loops (NFC) 2021-11-21 10:36:18 -08:00
Quinn Pham 3f3bee42d2 [NFC][llvm] Inclusive language: remove instance of master from Thumb2SizeReduction.cpp
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `Thumb2SizeReduction.cpp`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D114196
2021-11-19 16:07:58 -06:00
Mubashar Ahmad 8e47b83ec9 [AArch64][ARM] Enablement of Cortex-A710 Support
Phabricator review: https://reviews.llvm.org/D113256
2021-11-18 10:58:05 +00:00
Zarko Todorovski 5b8bbbecfa [NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded removed code comments that contain `sanity check` and `sanity
test`.
2021-11-17 21:59:00 -05:00
Jay Foad 3264e95938 [CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493
2021-11-17 10:16:47 +00:00
David Green 4c3bfdc7f1 [ARM] Fix GatherScatter AddLikeOr condition 2021-11-15 09:44:41 +00:00
Kazu Hirata d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Kazu Hirata efa896e5f7 [Target] Use SDNode::uses (NFC) 2021-11-12 21:23:04 -08:00
Kazu Hirata 2ca45adf24 [CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC) 2021-11-11 22:28:55 -08:00
Benjamin Kramer 194897eccf [ARM] Fix unused variable warning in Release builds 2021-11-09 18:55:52 +01:00
Ard Biesheuvel a19da876ab [ARM] implement support for TLS register based stack protector
Implement support for loading the stack canary from a memory location held in
the TLS register, with an optional offset applied. This is used by the Linux
kernel to implement per-task stack canaries, which is impossible on SMP systems
when using a global variable for the stack canary.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112768
2021-11-09 18:19:47 +01:00
Ard Biesheuvel 2caf85ad7a [ARM] implement LOAD_STACK_GUARD for remaining targets
Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and
other targets rely on the generic support which may result in spilling of the
stack canary value or address, or may cause it to be kept in a callee save
register across function calls, which means they essentially get spilled as
well, only by the callee when it wants to free up this register.

So let's implement LOAD_STACK GUARD for other targets as well. This ensures
that the load of the stack canary is rematerialized fully in the epilogue.

This code was split off from

  D112768: [ARM] implement support for TLS register based stack protector

for which it is a prerequisite.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112811
2021-11-08 22:59:15 +01:00
Kazu Hirata 41ef3187e0 [ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC) 2021-11-07 09:53:16 -08:00
David Green 091244023a [ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form. This has the unfortunate side effect of severely hampering post-ra
scheduling at times as the instructions are already stuck in vpt blocks,
not allowed to be independently ordered.

This patch addresses that by just moving the creation of VPT blocks
later in the pipeline, after post-ra scheduling has been performed. This
allows more optimal scheduling post-ra before the vpt blocks are
created, leading to more optimal tail predicated loops.

Differential Revision: https://reviews.llvm.org/D113094
2021-11-04 18:42:12 +00:00
David Green 3bc586b9aa [ARM] Treat MVE gather add-like-or's like adds
LLVM has the habit of turning adds with no common bits set into ors,
which means we need to detect them and treat them like adds again in the
MVE gather/scatter lowering pass.

Differential Revision: https://reviews.llvm.org/D112922
2021-11-03 11:41:06 +00:00
David Green d36dd1f842 [ARM] Push gather/scatter shl index updates out of loops
This teaches the MVE gather scatter lowering pass that SHL is
essentially the same as Mul, where we are able to optimize the
induction of a gather/scatter address by pushing them out of loops.
https://alive2.llvm.org/ce/z/wG4VyT

Differential Revision: https://reviews.llvm.org/D112920
2021-11-03 11:00:05 +00:00
Yi Kong 803d4f8a35 [ARM][AsmParser] Don't emit "deprecated instruction in IT block" warning if requested
Also fixed formatting in AsmMatcherEmitter because it was confusing.

Differential Revision: https://reviews.llvm.org/D112993
2021-11-03 17:18:04 +08:00
Daniel Kiss d8075e8781 Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 21:45:09 +02:00
Daniel Kiss 66e03db814 Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume.""
This reverts commit b6420e575f.
2021-10-28 17:24:53 +02:00
Daniel Kiss b6420e575f Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 16:49:19 +02:00
Ard Biesheuvel d7e089f2d6 [ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed
In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register.

Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead.

This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp,.

Reviewed By: nickdesaulniers, peter.smith

Differential Revision: https://reviews.llvm.org/D112600
2021-10-27 16:42:11 -07:00
Daniel Kiss 894ddba1c9 Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This reverts commit da1d1a0869.
2021-10-27 14:29:35 +02:00
Daniel Kiss da1d1a0869 [ARM] __cxa_end_cleanup should be called instead of _UnwindResume.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-27 10:40:00 +02:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Craig Topper 04c184bba7 [TargetLowering] Simplify the interface of expandABS. NFC
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331
2021-10-22 10:22:23 -07:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Alexandros Lamprineas 04dc68710a [DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals
When compiling for the RWPI relocation model the debug information is wrong:

* the debug location is described as { DW_OP_addr Var }
  instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus }
* the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32

Differential Revision: https://reviews.llvm.org/D111404
2021-10-18 21:29:46 +01:00
John Brawn 082fa56819 [ARM] Fix MOVCC peephole to not use an incorrect register class
The MOVCC peephole eliminates a MOVCC by making one of its inputs a
conditional instruction, but when doing this it should be using both
inputs of the MOVCC to decide on the register class to use as
otherwise we can get an error when using -verify-machineinstrs.

Differential Revision: https://reviews.llvm.org/D111714
2021-10-15 10:54:26 +01:00
Andrew Savonichev dc8a41de34 [ARM] Simplify address calculation for NEON load/store
The patch attempts to optimize a sequence of SIMD loads from the same
base pointer:

    %0 = gep float*, float* base, i32 4
    %1 = bitcast float* %0 to <4 x float>*
    %2 = load <4 x float>, <4 x float>* %1
    ...
    %n1 = gep float*, float* base, i32 N
    %n2 = bitcast float* %n1 to <4 x float>*
    %n3 = load <4 x float>, <4 x float>* %n2

For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16].
However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so
the address is computed before every ld/st instruction:

    add r2, r0, #32
    add r0, r0, #16
    vld1.32 {d18, d19}, [r2]
    vld1.32 {d22, d23}, [r0]

This can be improved by computing address for the first load, and then
using a post-indexed form of VLD1/VST1 to load the rest:

    add r0, r0, #16
    vld1.32 {d18, d19}, [r0]!
    vld1.32 {d22, d23}, [r0]

In order to do that, the patch adds more patterns to DAGCombine:

  - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1
    and inc2 are constants.

  - (or ptr inc) is now recognized as a pointer increment if ptr is
    sufficiently aligned.

In addition to that, we now search for all possible base updates and
then pick the best one.

Differential Revision: https://reviews.llvm.org/D108988
2021-10-14 15:23:10 +03:00
David Green 860b4479dc [ARM] Be more explicit about disabling CombineBaseUpdate for MVE.
This shouldn't be called for non-neon targets at the moment in either
case, but it is good to be expliit about the CombineBaseUpdate being a
NEON function, not expecting to be run under MVE.
2021-10-11 21:51:45 +01:00
Victor Campos 3550e242fa [Clang][ARM][AArch64] Add support for Armv9-A, Armv9.1-A and Armv9.2-A
armv9-a, armv9.1-a and armv9.2-a can be targeted using the -march option
both in ARM and AArch64.

 - Armv9-A maps to Armv8.5-A.
 - Armv9.1-A maps to Armv8.6-A.
 - Armv9.2-A maps to Armv8.7-A.
 - The SVE2 extension is enabled by default on these architectures.
 - The cryptographic extensions are disabled by default on these
 architectures.

The Armv9-A architecture is described in the Arm® Architecture Reference
Manual Supplement Armv9, for Armv9-A architecture profile
(https://developer.arm.com/documentation/ddi0608/latest).

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D109517
2021-10-11 17:44:09 +01:00
Reid Kleckner b3a6d096d7 Fix shlib builds for all lib/Target/*/TargetInfo libs
They all must depend on MC now that the target registry is in MC.
Also fix llvm-cxxdump
2021-10-08 15:21:13 -07:00
Reid Kleckner 89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Michael Forster 53801a59eb Fix two unused-variable warnings. 2021-10-07 15:17:58 +02:00
David Green 73346f5848 [ARM] Introduce a MQPRCopy
Currently when creating tail predicated loops, we need to validate that
all the live-outs of a loop will be equivalent with and without tail
predication, and if they are not we cannot legally create a
tail-predicated loop, leaving expensive vctp and vpst instructions in
the loop. These notably can include register-allocation instructions
like stack loads and stores, and copys lowered from COPYs to MVE_VORRs.

Instead of trying to prove this is valid late in the pipeline, this
patch introduces a MQPRCopy pseudo instruction that COPY is lowered to.
This can then either be converted to a MVE_VORR where possible, or to a
couple of VMOVD instructions if not. This way they do not behave
differently within and outside of tail-predications regions, and we can
know by construction that they are always valid. The idea is that we can
do the same with stack load and stores, converting them to VLDR/VSTR or
VLDM/VSTM where required to prove tail predication is always valid.

This does unfortunately mean inserting multiple VMOVD instructions,
instead of a single MVE_VORR, but my experiments show it to be an
improvement in general.

Differential Revision: https://reviews.llvm.org/D111048
2021-10-07 12:52:12 +01:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
Pengxuan Zheng b0045f5595 [ARM] Fix a bug in finding a pair of extracts to create VMOVRRD
D100244 missed a check on the ResNo of the extract's operand 0 when finding a
pair of extracts to combine into a VMOVRRD (extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2)). As a result, it can incorrectly pair an extract(x, n)
with another extract(x:3, n+1) for example. This patch fixes the bug by adding
the proper check on ResNo.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D111188
2021-10-06 10:03:32 -07:00
Amara Emerson 8bde5e58c0 Delay outgoing register assignments to last.
The delayed stack protector feature which is currently used for SDAG (and thus
allows for more commonly generating tail calls) depends on being able to extract
the tail call into a separate return block. To do this it also has to extract
the vreg->physreg copies that set up the call's arguments, since if it doesn't
then the call inst ends up using undefined physregs in it's new spliced block.

SelectionDAG implementations can do this because they delay emitting register
copies until  *after* the stack arguments are set up. GISel however just
processes and emits the arguments in IR order, so stack arguments always end up
last, and thus this breaks the code that looks for any register arg copies that
precede the call instruction.

This patch adds a thunk argument to the assignValueToReg() and custom assignment
hooks. For outgoing arguments, register assignments use this return param to
return a thunk that does the actual generating of the copies. We collect these
until all the outgoing stack assignments have been done and then execute them,
so that the copies (and perhaps some artifacts like G_SEXTs) are placed after
any stores.

Differential Revision: https://reviews.llvm.org/D110610
2021-10-04 12:33:20 -07:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
David Green 20b1a16a69 [ARM] Mark <= -1 immediate constant as cheap
A <= -1 constant on a compare can be converted to a < 0 operation, which
is usually cheap. If we mark the constant as cheap, preventing hoisting,
we allow that fold to happen even across different blocks.

Differential Revision: https://reviews.llvm.org/D109360
2021-10-03 19:30:08 +01:00
Kazu Hirata c1e32b3fc0 [Target] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-02 12:06:29 -07:00
David Green f9aa8623fe [ARM] Add more MVE intrinsics to sink splats to
This adds a few more unpredicated intrinsics to sink splats to, in order
to create more qr instruction variants. Notably this includes
saddsat/uaddsat but also some of the unpredicated mve intrinsics.

Differential Revision: https://reviews.llvm.org/D110333
2021-09-30 14:41:23 +01:00
David Green fdd8c10959 [ARM] Delay reverting WLS in arm-block-placement
As we have to split blocks, we may be left in an invalid loop state
after a WLS is reverted to a DLS. Instead remember the WLS that could
not be fixed and revert them after finishing processing all other loops.

Differential Revision: https://reviews.llvm.org/D110567
2021-09-28 15:38:29 +01:00
David Green 2c53215e99 [ARM] Skip debug info in recomputeVPTBlockMask
The ARMLowOverheadLoops pass recalculates VPT block masks when it
converts VCMP's inside VPT blocks into VPT's. The function to do so
doesn't seem to handle debug info though, leading to invalid block
creation or asserts at compile time. Make sure the function skips any
debug info between the MVE instructions it inspects.

Differential Revision: https://reviews.llvm.org/D110564
2021-09-28 14:58:13 +01:00
David Green bb2d23dcd4 [ARM] Improve detection of fallthough when aligning blocks
We align non-fallthrough branches under Cortex-M at O3 to lead to fewer
instruction fetches. This improves that for the block after a LE or
LETP. These blocks will still have terminating branches until the
LowOverheadLoops pass is run (as they are not handled by analyzeBranch,
the branch is not removed until later), so canFallThrough will return
false. These extra branches will eventually be removed, leaving a
fallthrough, so treat them as such and don't add unnecessary alignments.

Differential Revision: https://reviews.llvm.org/D107810
2021-09-27 11:21:21 +01:00
David Green 883758ed48 [ARM] Fix Arm block placement creating branches after jump tables.
Given:
 - A jump table
 - Which jumps to the next block
 - The next block ends in a WLS
 - Where the WLS conditionally jumps to block earlier in the program.

The Arm block placement pass would attempt to move the block containing
the WLS earlier, as the WLS instruction can only branch forward. In
doing so it would add a branch from the jumptable block to the WLS
block, thinking it previously fell-through.

This in itself would be fine, if a little inefficient, but the constant
island pass expects all instructions after a jump-table branch to have
been removed by analyzeBranch. So it gets confused and can assign the
same labels to multiple jump table blocks.

I've changed the condition to the same as used in analyzeBranch.
2021-09-25 11:32:25 +01:00
Jay Foad 6cef28ed2d [TII] Remove the MFI argument to convertToThreeAddress. NFC.
This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D110229
2021-09-23 08:58:46 +01:00
Simon Pilgrim b1f38a27f0 [Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides
Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible.

This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up.

Differential Revision: https://reviews.llvm.org/D110242
2021-09-22 15:28:08 +01:00
David Green 02cd8a6b91 [ARM] Allow smaller VMOVL in tail predicated loops
This allows VMOVL in tail predicated loops so long as the the vector
size the VMOVL is extending into is less than or equal to the size of
the VCTP in the tail predicated loop. These cases represent a
sign-extend-inreg (or zero-extend-inreg), which needn't block tail
predication as in https://godbolt.org/z/hdTsEbx8Y.

For this a vecsize has been added to the TSFlag bits of MVE
instructions, which stores the size of the elements that the MVE
instruction operates on. In the case of multiple size (such as a
MVE_VMOVLs8bh that extends from i8 to i16, the largest size was be
chosen). The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 =
i64, which often (but not always) comes from the instruction encoding
directly. A unit test was added, and although only a subset of the
vecsizes are currently used, the rest should be useful for other cases.

Differential Revision: https://reviews.llvm.org/D109706
2021-09-22 12:07:52 +01:00
Kazu Hirata 85b4b21c8b [llvm] Use make_early_inc_range (NFC) 2021-09-20 19:30:02 -07:00
David Green 3f90df22f1 [ARM] MVE reverse shuffles.
The vectorizer can sometimes make reverse shuffles from indices that
count down. In MVE, we don't have a 128bit rev instruction, but we can
select this to a VREV64 with some lane movs to swap the two halfs.

Ideally this would use VMOVD's, but only gets as far as VMOVS's at the
moment.

Differential Revision: https://reviews.llvm.org/D69510
2021-09-20 13:48:01 +01:00
Kazu Hirata 84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
David Green 1da52ef294 [ARM] Add VGETLANEu patterns for v4f16 and v8f16
These were apparently missing, having no pattern that could convert a
VGETLANEu of a v4f16 to an i32. Added bf16 whilst here, following the
same code.
2021-09-19 14:25:21 +01:00
David Green cb5e3f7959 [ARM] Prevent large integer VQDMULH pattern crashes
Put a limit on the size of constant integers we test when looking for
VQDMULH, to prevent it from crashing from values more than 64bits.
2021-09-18 18:47:02 +01:00
Alexandros Lamprineas 1bd5ea968e [ARM] Mitigate the cve-2021-35465 security vulnurability.
Recently a vulnerability issue is found in the implementation of VLLDM
instruction in the Arm Cortex-M33, Cortex-M35P and Cortex-M55. If the
VLLDM instruction is abandoned due to an exception when it is partially
completed, it is possible for subsequent non-secure handler to access
and modify the partial restored register values. This vulnerability is
identified as CVE-2021-35465.

The mitigation sequence varies between v8-m and v8.1-m as follows:

v8-m.main
---------
mrs        r5, control
tst        r5, #8       /* CONTROL_S.SFPA */
it         ne
.inst.w    0xeeb00a40   /* vmovne s0, s0 */
1:
vlldm      sp           /* Lazy restore of d0-d16 and FPSCR. */

v8.1-m.main
-----------
vscclrm    {vpr}        /* Clear VPR. */
vlldm      sp           /* Lazy restore of d0-d16 and FPSCR. */

More details on
developer.arm.com/support/arm-security-updates/vlldm-instruction-security-vulnerability

Differential Revision: https://reviews.llvm.org/D109157
2021-09-16 12:56:43 +01:00
Alexandros Lamprineas 61f25daa8d [ARM][CMSE] Clear the secure fp-registers when using softfp abi.
When expanding the non-secure call instruction we are emiting code
to clear the secure floating-point registers only if the targeted
architecture has floating-point support. The potential problem is
when the source code containing non-secure calls are built with
-mfloat-abi=soft but some other part of the system has been built
with -mfloat-abi=softfp (soft and softfp are compatible as they use
the same procedure calling standard). In this case floating-point
registers could leak to non-secure state as the non-secure won't
have cleared them assuming no floating point has been used.

Differential Revision: https://reviews.llvm.org/D109153
2021-09-16 12:56:43 +01:00
Martin Storsjö b33a43e57c [ARM] Move fetching of ARMSubtarget into the scopes that need it. NFC.
This was requested in D38253, but missed back then.

Differential Revision: https://reviews.llvm.org/D109046
2021-09-15 15:03:20 +03:00
David Green a2332d5332 [ARM] Prevent continuous folding of SUBC
Under some situations under Thumb1, we could be stuck in an infinite
loop recombining the same instruction. This puts a limit on that, not
combining SUBC with SUBE repeatedly.
2021-09-15 11:23:32 +01:00
David Green 5a6dfbb8cd [ARM] Teach DemandedVectorElts about VMOVN lanes
The class of instructions that write to narrow top/bottom lanes only
demand the even or odd elements of the input lanes. Which means that a
pair of VMOVNT; VMOVNB demands no lanes from the original input. This
teaches that to instcombine from the target hooks available through
ARMTTIImpl.

Differential Revision: https://reviews.llvm.org/D109325
2021-09-14 11:05:31 +01:00
Nikita Popov 45c467346a [LAA] Pass access type to getPtrStride()
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
2021-09-11 19:16:49 +02:00
Kazu Hirata c9fca53af1 [CodeGen, Target] Use pred_empty and succ_empty (NFC) 2021-09-10 11:11:31 -07:00
David Green deefeffb5d [ARM] Remove unused tblgen arguments. NFC
As per D109359, this removes or makes use of some of the existing unused
NEON and base ARM tblgn arguments.
2021-09-10 18:03:54 +01:00
David Green 6b7cdb40da [ARM] Remove unused tblgen arguments. NFCI
As per D109359, this removes or makes use of some of the existing unused
MVE tblgn arguments.
2021-09-10 15:06:31 +01:00
Florian Hahn 5d1a6d0d1a
[ARM] Remove unnecessary use of replaceSymbolicStrideSCEV (NFC).
When passing an empty strides map, there's nothing to replace for
replaceSymbolicStrideSCEV and it just returns the SCEV for Ptr. There
should be no need to call the function.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D109462
2021-09-10 10:47:26 +02:00
Craig Topper 9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Chris Lattner d51da74889 [CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC. 2021-09-09 10:22:51 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Peter Smith e63455d5e0 [MC] Use local MCSubtargetInfo in writeNops
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.

On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.

For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.

This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.

I've attempted to take into account the in tree experimental backends.

Differential Revision: https://reviews.llvm.org/D45962
2021-09-07 15:46:19 +01:00
Peter Smith 5e71839f77 [MC] Add MCSubtargetInfo to MCAlignFragment
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.

There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.

For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.

This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.

Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.

Differential Revision: https://reviews.llvm.org/D45961
2021-09-07 15:46:19 +01:00
Ben Shi 63ca9371c7 [ARM] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding in DAGCombine if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109124
2021-09-07 15:42:43 +08:00
David Green adfd12e6d1 [ARM] Add patterns for store(fptosisat(..))
As an extension to D107866, this adds store(fptosisat(..)) patterns,
similar to the existing fptosi patterns, to prevent unnecessarily moving
into gpr regs where we can use fp stores directly.

Differential Revision: https://reviews.llvm.org/D108378
2021-09-03 19:22:11 +01:00
David Green f37e132263 [ARM] Add VFP lowering for fptosi.sat
This extends D107865 to the VFP insructions, lowering llvm.fptosi.sat
and llvm.fptoui.sat to VCVT instructions that inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107866
2021-09-03 18:11:08 +01:00
David Green 9cb8f4d1ad [ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3        // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638
2021-09-02 13:42:58 +01:00
David Green 49476a4d66 [ARM] Add MVE lowering for fptosi.sat
This adds lowering of the llvm.fptosi.sat and llvm.fptoui.sat intinsics,
selecting a VCVT instruction which under MVE will inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107865
2021-09-01 22:38:47 +01:00
Philip Reames 29fa37ec9f [SCEV] If max BTC is zero, then so is the exact BTC [2 of 2]
This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remain cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch.

The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because the happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll

Differential Revision: https://reviews.llvm.org/D109015
2021-09-01 11:51:48 -07:00
David Green 22c384129e [ARM] Add missing validForTailPredication for VMINNM/VMAXNM
Apparently this was missing, preventing the generation of tail
predication loops containing VMINNM, VMAXNM, VMINNMA and VMAXNMA.
2021-08-31 18:19:03 +01:00
Simon Wallis f417b660ee [Arm] Add assert in T2 Imm7s code emitter
Add assert to provoke failure in object file output, not just in disassembly output.

Reviewed By: yroux

Differential Revision: https://reviews.llvm.org/D107259
2021-08-31 08:16:48 +01:00