11887 Commits

Author SHA1 Message Date
David Green de6dfbbb30 [ARM] Fix for MVE i128 vector icmp costs.
We were hitting an assert as the legalized type needn't be a vector.

Fixes #58364
2022-10-14 18:49:25 +01:00
Archibald Elliott 7d15212b8c [ARM] Support fp16/bf16 using w constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "w"
constraint, which means "VFP floating-point registers d0-d31" - fp16 and
bf16 values are stored in S registers (which alias the D registers).

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 'w' constraint.

Differential Revision: https://reviews.llvm.org/D135662
2022-10-13 10:32:06 +01:00
David Green deb8f8ab17 [ARM] Add errors for MVE exclusive registers.
These instructions already had errors for operands that could not share
the same register:
  VCMUL, VMULL, VQDMULL.
This extends that to a few others:
  VREV64, VQDMULLqr, VCADD and VHCADD.
Only the i32 types require the error.

Differential Revision: https://reviews.llvm.org/D135560
2022-10-10 22:06:35 +01:00
David Green 5e1a9d319d [ARM] Add lowering for bf16 neon vtrn, vzip and vuzp.
These go via Dag2Dag, which works better when based on element sizes
rather than on the exact element types.
2022-10-02 15:34:37 +01:00
David Green f2fde99461 [ARM] More bf16 shuffle handling, including perfect shuffles.
2022-10-02 14:31:51 +01:00
David Green 8193f0d1d2 [ARM] Add tablegen patterns for bf16 vrev
2022-10-02 13:42:14 +01:00
David Green 58369c8631 [ARM] Add tablegen patterns for bf16 vext
This adds missing tablegen patterns for VEXT, identical to the fp16
patterns as they only use baseline Neon operations.
Part of fixing #57770.
2022-10-02 12:45:58 +01:00
Filipp Zhinkin 945a1468c9 [ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr
Fold a compare with zero into a preceding AND, ORR, EOR or BIC instruction (in any of its versions) by converting that instruction into its S-suffixed, flag-setting version.

Related issue: https://github.com/llvm/llvm-project/issues/57122

Reviewed By: efriedma, samtebbs

Differential Revision: https://reviews.llvm.org/D131786
2022-10-01 12:41:37 +03:00
David Green 4c4e544cd8 [ARM] Add an option for disabling omitting DLS.
Useful for testing, this option disables the removal of `DLS lr, lr`.
2022-09-29 17:42:45 +01:00
Archibald Elliott ff4027d152 [ARM] Support fp16/bf16 using t constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "t"
constraint, which means "VFP floating-point registers s0-s31" - fp16 and
bf16 values are stored in S registers too.

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 't' constraint.

Fixes #57753

Differential Revision: https://reviews.llvm.org/D134553
2022-09-28 14:48:21 +01:00
Philip Reames f6d110e26f [LAA] Make getPtrStride return Optional instead of overloading zero as error value [nfc]
This is a purely NFC restructuring in advance of a change which actually exposes zero strides. I am doing it mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
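The interface change in f6d110e26f can be illustrated with std::optional. The sketch below is hypothetical. `strideInElements` is not an LLVM function, and its parameters are assumptions chosen for illustration. It shows why a zero stride, which is a legitimate answer for a loop-invariant address, cannot also serve as the "no constant stride" error value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not LAA's getPtrStride): the stride of a pointer
// access in elements, or std::nullopt when it is not a whole number of
// elements. Returning 0 is a valid result (loop-invariant address), which
// a "0 means error" convention could not distinguish.
std::optional<int64_t> strideInElements(int64_t byteStepPerIteration,
                                        int64_t elementSizeInBytes) {
  assert(elementSizeInBytes > 0 && "element size must be positive");
  if (byteStepPerIteration % elementSizeInBytes != 0)
    return std::nullopt; // no constant stride in whole elements
  return byteStepPerIteration / elementSizeInBytes;
}
```

Callers must then test `has_value()` explicitly, so "unknown" and "zero" can no longer be confused.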
Momchil Velikov 6602110152 [ARM] Enable and/cmp0 folding
The `CodeGenPrepare` pass can sink a bitwise `and` used by a compare to
zero into the basic blocks where its users are. This operation is
guarded by a lowering hook, which is disabled for ARM. In ARM
architecture versions from v7-M onwards these two operations can be folded
into a `tst rN, #imm` instruction. Sinking of `and` can also enable
the cmov-to-bfi DAG combiner.

This patch fixes some benchmark regressions caused
by https://reviews.llvm.org/D129370 as well as scoring slightly better overall.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D134360
2022-09-26 11:31:23 +01:00
James Y Knight a538d1f13a [TableGen][CodeEmitterGen] Allow local names for sub-operands in an operand list.
These names can then be matched by name against 'bits' fields in a
record, to populate an instruction's encoding.

This does _not_ yet change DecoderEmitter to allow by-name matching of
sub-operands. Unlike the encoder, the decoder already defaulted to not
supporting positional matching, and backends had workarounds in place
for the missing decoding support.

Additionally, use this new capability to allow the ARM and AArch64
backends not to require any positional operand matching.

Differential Revision: https://reviews.llvm.org/D131003
2022-09-24 09:40:44 -04:00
Kazu Hirata 20d764aff0 [llvm] Don't include SetVector.h (NFC)
llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without
including SetVector.h, so this patch adds an appropriate #include
there.
2022-09-17 12:36:43 -07:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. It
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (BitWidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (BitWidth / 2) urem. If (BitWidth / 2) is a legal integer
type, this urem will be expanded by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's Delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. gcc also has additional tricks for
urem, as well as for srem, udiv, and sdiv, that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
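The arithmetic behind 38ffa2bb96 can be checked in plain C++. In the sketch below, a 64-bit value is split into two base-2^32 "digits". The helper names are hypothetical, and the Newton-iteration inverse is one standard way to obtain the multiplicative inverse, not necessarily how the patch computes it. The code illustrates the math, not LLVM's DAG expansion.

```cpp
#include <cassert>
#include <cstdint>

// Remainder by summing base-2^32 digits.
// Precondition: 2^32 % d == 1, i.e. d divides 2^32 - 1 (e.g. 3, 5, 17, 257).
uint32_t urem64ViaHalves(uint64_t x, uint32_t d) {
  assert(d > 1 && (uint64_t{1} << 32) % d == 1 && "unsupported divisor");
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  uint32_t sum = lo + hi;            // wraps modulo 2^32
  uint32_t carry = sum < lo ? 1 : 0; // carry out of the addition
  sum += carry;                      // carry is worth 2^32 == 1 (mod d); cannot overflow
  return sum % d;                    // half-width (32-bit) urem
}

// Inverse of an odd d modulo 2^64 by Newton iteration: each step doubles
// the number of correct low bits (3 -> 6 -> 12 -> 24 -> 48 -> 96).
uint64_t inverseMod2to64(uint64_t d) {
  assert(d % 2 == 1 && "only odd numbers are invertible mod 2^64");
  uint64_t inv = d; // d * d == 1 (mod 8) for any odd d
  for (int i = 0; i < 5; ++i)
    inv *= 2 - d * inv;
  return inv;
}

// x - rem is an exact multiple of d, so multiplying by the inverse of d
// modulo 2^64 gives the exact quotient.
uint64_t udiv64ViaHalves(uint64_t x, uint32_t d) {
  uint64_t rem = urem64ViaHalves(x, d);
  return (x - rem) * inverseMod2to64(d);
}
```

Every divisor accepted by the precondition is odd, because 2^32 % d == 1 rules out even d. The inverse needed for the division step therefore always exists.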
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
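A minimal sketch of the replacement pattern from 5e96cea1db. The array contents and the `opcodeNameAt` helper are hypothetical. `std::size` comes from `<iterator>` in C++17 and works directly on C-style arrays, including in constant expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator> // std::size (C++17)

// Hypothetical C-style array whose length the compiler already knows.
static const char *const kOpcodeNames[] = {"and", "orr", "eor", "bic"};

// Previously spelled llvm::array_lengthof(kOpcodeNames).
constexpr std::size_t kNumOpcodeNames = std::size(kOpcodeNames);
static_assert(kNumOpcodeNames == 4, "std::size counts array elements");

// Hypothetical bounds-checked lookup using the same idiom.
const char *opcodeNameAt(std::size_t i) {
  assert(i < std::size(kOpcodeNames) && "index out of range");
  return kOpcodeNames[i];
}
```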
David Spickett e428baf001 [LLVM][ARM] Remove options for armv2, 2A, 3 and 3M
Fixes #57486

These pre-v4 architectures are not specifically supported
by codegen, as demonstrated in the linked issue.

GCC has not supported 3M since GCC 9 and presumably
2 and 2A earlier than that. So we are aligned in that sense.

(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2abd6e34fcf3bd9f9ffafcaa47cdc3ed443f9add)

This removes the options and associated testing.

The Pre_v4 build attribute remains mainly because its absence
would be more confusing. It will not be used other than to
complete the list of build attributes as shown in the ABI.

https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#3352the-target-related-attributes

Reviewed By: nickdesaulniers, peter.smith, rengolin

Differential Revision: https://reviews.llvm.org/D133109
2022-09-08 09:49:48 +00:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem pass to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion).
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
John Brawn e26cadcc32 [ARM] Constant pools need 4-byte alignment if we only have tADR
When the only ADR instruction we have is the 16-bit thumb one then all
constant pool entries need to be 4-byte aligned, as tADR has an offset
that's a multiple of 4.

It looks like previously there happened to be no situations in which
we encountered a constant pool entry with alignment less than 4, so
failing to do this didn't cause any problems, but the expansion of
cttz to a table added by D128911 does use a constant pool with
alignment 1, so we now need to handle it correctly.

Differential Revision: https://reviews.llvm.org/D133199
2022-09-06 11:36:12 +01:00
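The alignment requirement in e26cadcc32 follows from how 16-bit Thumb ADR forms its address. The T1 encoding computes Align(PC, 4) + (imm8 << 2), where PC reads as the instruction address plus 4. The reachable offsets are therefore multiples of 4 in [0, 1020]. The helper below is a hypothetical sketch of that reachability check, not code from ARMConstantIslands.

```cpp
#include <cassert>
#include <cstdint>

// Can a 16-bit Thumb ADR (tADR) at adrAddress materialize target?
bool tADRCanReach(uint32_t adrAddress, uint32_t target) {
  assert(adrAddress % 2 == 0 && "Thumb instructions are halfword aligned");
  uint32_t base = (adrAddress + 4) & ~uint32_t{3}; // Align(PC, 4)
  if (target < base)
    return false; // tADR can only add a non-negative offset
  uint32_t offset = target - base;
  return offset % 4 == 0 && offset <= 1020; // imm8 << 2
}
```

Because `base` is always 4-byte aligned and the offset is a multiple of 4, a constant pool entry with alignment 1 (such as the cttz table) is only reachable by accident.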
Vitaly Buka 6c52736e02 Revert "[llvm] Use range-based for loops (NFC)"
A range-based loop should not be used here, as
fixupImmediateBr calls push_back on the container being iterated.

http://lab.llvm.org/buildbot/#/builders/168
http://lab.llvm.org/buildbot/#/builders/74
http://lab.llvm.org/buildbot/#/builders/5
http://lab.llvm.org/buildbot/#/builders/239
http://lab.llvm.org/buildbot/#/builders/237
http://lab.llvm.org/buildbot/#/builders/236

This reverts commit fedc59734a.
2022-09-04 15:28:53 -07:00
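The hazard behind 6c52736e02 is general to std::vector. A push_back may reallocate, invalidating the iterators that a range-based for loop holds. The sketch below is hypothetical: `Branch`, `fixupAll` and the distance limit are illustrative, not ARM's constant-island code. It shows the index-based form, which stays valid and also visits the newly appended entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Branch {
  int distance;
};

constexpr int kMaxShortDistance = 100; // illustrative limit only

// Processing one entry may append new entries to the same worklist.
// Re-reading size() and re-indexing each iteration is safe after
// push_back reallocates; a range-based for loop would not be.
void fixupAll(std::vector<Branch> &branches) {
  for (std::size_t i = 0; i != branches.size(); ++i) {
    if (branches[i].distance > kMaxShortDistance) {
      int remainder = branches[i].distance - kMaxShortDistance;
      branches[i].distance = kMaxShortDistance; // write before push_back
      branches.push_back(Branch{remainder});    // may reallocate
    }
  }
}
```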
Kazu Hirata fedc59734a [llvm] Use range-based for loops (NFC)
2022-09-03 11:17:40 -07:00
Sam Clegg 92920c4fe3 [MC][WebAssembly] Allow accurate errors in doBeforeLabelEmit
Although we only currently have one error produced in this function I am
working on changes right now that add some more.  This change makes the
error location more accurate.

Differential Revision: https://reviews.llvm.org/D133016
2022-09-01 01:26:33 -07:00
Kazu Hirata 2833760c57 [Target] Qualify auto in range-based for loops (NFC)
2022-08-28 17:35:09 -07:00
Kazu Hirata c63f823875 [llvm] Use range-based for loops (NFC)
2022-08-28 17:35:04 -07:00
Alex Richardson df00dac828 [ARM] Use getSymbolPreferLocal() in GetARMGVSymbol
This allows relaxing some relocations to symbol+offset instead of emitting
a relocation against a symbol.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131433
2022-08-26 09:34:06 +00:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
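The promotion trick in f9de13232f can be mimicked in C++. Setting the bit just above the narrow type's msb guarantees a non-zero 32-bit operand, so no zero-input branch or CMOV is needed. It also makes the result equal the type width when the input is zero. `__builtin_ctz` is a GCC/Clang builtin that, like BSF, is undefined for 0. The helper names are hypothetical, and this is not X86 lowering code.

```cpp
#include <cassert>
#include <cstdint>

// i8 cttz via a 32-bit bit scan: bit 8 caps the result at 8.
unsigned cttz8ViaPromotion(uint8_t x) {
  uint32_t widened = uint32_t{x} | (uint32_t{1} << 8);
  assert(widened != 0 && "operand is never zero after masking");
  return static_cast<unsigned>(__builtin_ctz(widened));
}

// i16 cttz via a 32-bit bit scan: bit 16 caps the result at 16.
unsigned cttz16ViaPromotion(uint16_t x) {
  uint32_t widened = uint32_t{x} | (uint32_t{1} << 16);
  return static_cast<unsigned>(__builtin_ctz(widened));
}
```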
Philip Reames df20ff9ae2 [TTI] Kill last couple uses of OperandValueKind in targets [nfc]
Use the accessor methods on the containing class instead so that we can change the representation.
2022-08-23 08:54:41 -07:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames 104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every possibly affected call site must show up in the diff. This let me audit each one closely.
2022-08-22 15:16:39 -07:00
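The idea in 104fa367ee is to bundle an operand's kind and its properties into a single value, so they cannot be passed, or silently dropped, independently. The sketch below is a simplified, hypothetical mirror of that design. The enum and struct names only loosely echo TTI and are not its API, and the cost numbers are made up.

```cpp
#include <cassert>

enum class ValueKind { AnyValue, UniformValue, UniformConstant, NonUniformConstant };
enum class ValueProps { None, PowerOf2 };

// Kind and properties travel together in one parameter.
struct OperandInfo {
  ValueKind kind = ValueKind::AnyValue;
  ValueProps props = ValueProps::None;

  bool isConstant() const {
    return kind == ValueKind::UniformConstant ||
           kind == ValueKind::NonUniformConstant;
  }
  bool isPowerOf2() const { return props == ValueProps::PowerOf2; }
};

// Toy cost query: an unsigned divide by a constant power of two is a shift.
int udivCost(OperandInfo divisor) {
  // Toy invariant for this sketch: power-of-two facts only on constants.
  assert(!(divisor.isPowerOf2() && !divisor.isConstant()));
  if (divisor.isConstant() && divisor.isPowerOf2())
    return 1;  // a shift
  return 20;   // illustrative cost of a real divide
}
```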
Alan Zhao 8c8cfaaf0a Revert "[ARM] Use getSymbolPreferLocal() in GetARMGVSymbol"
This reverts commit 6db15a82cc.

Reverted because this breaks official Chrome builds targeting Android on
arm: https://crbug.com/1354305

Repro: https://drive.google.com/file/d/1pgQI2adwx3DJJqIYvMY4i249ouHU0rmu/view?usp=sharing
2022-08-22 16:16:37 -04:00
David Penry ced705c440 [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reapplication of D128941/(reversion:D132037) with small fix.

Differential Revision: https://reviews.llvm.org/D132170
2022-08-22 12:10:13 -07:00
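The accept/reject interface described in ced705c440 is a classic overridable hook. In that design, the default accepts every schedule and one target applies its own policy. In the hypothetical sketch below, all types and names are illustrative and not LLVM's MachinePipeliner interfaces. An ARM-like override rejects a proposed schedule when the live register count at any cycle exceeds a limit.

```cpp
#include <cassert>
#include <vector>

struct ProposedSchedule {
  std::vector<unsigned> livePerCycle; // live registers at each cycle
};

// Default policy: accept every schedule (behavior unchanged).
struct PipelinerTarget {
  virtual ~PipelinerTarget() = default;
  virtual bool shouldUseSchedule(const ProposedSchedule &) const { return true; }
};

// Target-specific policy: reject schedules exceeding register pressure.
struct PressureAwareTarget : PipelinerTarget {
  unsigned regLimit;
  explicit PressureAwareTarget(unsigned limit) : regLimit(limit) {
    assert(limit > 0 && "a zero register limit rejects everything");
  }

  bool shouldUseSchedule(const ProposedSchedule &s) const override {
    for (unsigned live : s.livePerCycle)
      if (live > regLimit)
        return false; // reject the proposed schedule
    return true;
  }
};
```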
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata ec5eab7e87 Use range-based for loops (NFC)
2022-08-20 21:18:32 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency; it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had two remaining InstructionCost-related members,
`getTypeLegalizationCost()` and `getScalingFactorCost()`, while all other
costs are processed in TTI.

For example, it is awkward to use other TTI members in target overrides
of these two functions.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
David Penry 1c9f0408bc Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules"
This reverts commit 8c4aea438c.

Needed because buildbot failures (warnings) gave a clue that there was
a functional bug in the ARM rejection logic.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132037
2022-08-17 09:32:43 -07:00
David Penry 8c4aea438c [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128941
2022-08-17 08:13:26 -07:00
Victor Campos 784da8a722 [ARM] Simplify the creation of escaped build attribute values
There is an existing mechanism to escape strings, therefore the
functions created to escape Tag_also_compatible_with values are not
really needed. We can simply use the pre-existing utilities.

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D131680
2022-08-16 11:49:33 +01:00
Kazu Hirata 6d9cd9199a Use llvm::all_of (NFC)
2022-08-14 16:25:36 -07:00
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
Pengxuan Zheng 9bb6622423 [ARM] Do not use LOAD_STACK_GUARD with ROPI/RWPI
ROPI/RWPI are not supported with LOAD_STACK_GUARD currently.

Reviewed By: nickdesaulniers, rengolin

Differential Revision: https://reviews.llvm.org/D131427
2022-08-09 14:59:08 -07:00
Alex Richardson 6db15a82cc [ARM] Use getSymbolPreferLocal() in GetARMGVSymbol
This allows relaxing some relocations to STT_SECTION symbol+offset
instead of emitting a relocation against a symbol.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131433
2022-08-09 09:53:47 +00:00
Alex Richardson 9a2b14afa0 [ARM] Emit local aliases (.Lfoo$local) for functions
ARMAsmPrinter::emitFunctionEntryLabel() was not calling the base class
function so the $local alias was not being emitted. This should not have
any functional effect right now since ARM does not generate different code
for the $local symbols, but it could be improved in the future.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131392
2022-08-09 09:53:47 +00:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
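What de9d80c1c5 switches to is the standard C++17 attribute. `[[fallthrough]]` marks an intentional fall-through between case labels, so that Clang's and GCC's -Wimplicit-fallthrough do not warn. The `copyTail` function below is a hypothetical example, not code from LLVM.

```cpp
#include <cassert>

// Copy the trailing 0-3 bytes left over by a word-at-a-time loop.
void copyTail(unsigned char *dst, const unsigned char *src, int remaining) {
  switch (remaining) {
  case 3:
    dst[2] = src[2];
    [[fallthrough]]; // previously LLVM_FALLTHROUGH
  case 2:
    dst[1] = src[1];
    [[fallthrough]];
  case 1:
    dst[0] = src[0];
    break;
  case 0:
    break;
  default:
    assert(false && "tail must be 0..3 bytes");
  }
}
```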
Simon Tatham 72017e9b16 [llvm-objdump,ARM] Fix big-endian AArch32 disassembly.
The ABI for big-endian AArch32, as specified by AAELF32, is more
complicated than average. Relocatable object files are expected to store
instruction encodings in byte order matching the ELF file's endianness
(so, big-endian for a BE ELF file). But executable images can
//either// do that //or// store instructions little-endian regardless
of data and ELF endianness (to support BE32 and BE8 platforms
respectively). They signal the latter by setting the EF_ARM_BE8 flag
in the ELF header.

(In the case of the Thumb instruction set, this all means that each
16-bit halfword of a Thumb instruction is stored in one or other
endianness. The two halfwords of a 32-bit Thumb instruction must
appear in the same order no matter what, because the first halfword is
the one that must avoid overlapping the encoding of any 16-bit Thumb
instruction.)

llvm-objdump was unconditionally expecting Arm instructions to be
stored little-endian. So it would correctly disassemble a BE8 image,
but if you gave it a BE32 image or a BE object file, it would retrieve
every instruction in byte-swapped form and disassemble it to
nonsense. (Even an object file output by LLVM itself, because
ARMMCCodeEmitter outputs instructions big-endian in big-endian mode,
which is correct for writing an object file.)

This patch allows llvm-objdump to correctly disassemble all three of
those classes of Arm ELF file. It does it by introducing a new
SubtargetFeature for big-endian instructions, setting it from the ELF
image type and flags during llvm-objdump setup, and teaching both
ARMDisassembler and llvm-objdump itself to pay attention to it when
retrieving instruction data from a section being disassembled.

Differential Revision: https://reviews.llvm.org/D130902
2022-08-08 10:49:51 +01:00
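The byte-order rules described in 72017e9b16 can be summarized in a short sketch. Instructions are big-endian in BE object files and BE32 images, and little-endian in LE files and in BE8 images, which are signalled by EF_ARM_BE8 in an executable. A 32-bit Thumb instruction is two halfwords, each in that byte order, with the first halfword always the more significant. All helper names are hypothetical. This is not llvm-objdump's or ARMDisassembler's code.

```cpp
#include <cassert>
#include <cstdint>

// Decide instruction byte order from ELF data endianness, file type and flags.
bool instructionsAreBigEndian(bool elfIsBigEndian, bool isExecutable,
                              bool hasBE8Flag) {
  return elfIsBigEndian && !(isExecutable && hasBE8Flag);
}

uint16_t readHalf(const uint8_t *p, bool bigEndianInsns) {
  return bigEndianInsns ? uint16_t((p[0] << 8) | p[1])
                        : uint16_t(p[0] | (p[1] << 8));
}

// A 32-bit Arm instruction is a single word in the chosen byte order.
uint32_t readArmInsn(const uint8_t *p, bool bigEndianInsns) {
  assert(p != nullptr);
  uint32_t w = 0;
  for (int i = 0; i < 4; ++i) {
    int shift = bigEndianInsns ? 8 * (3 - i) : 8 * i;
    w |= uint32_t(p[i]) << shift;
  }
  return w;
}

// Each Thumb halfword uses the chosen byte order, but the halfword order is
// fixed: the first halfword always forms the upper 16 bits.
uint32_t readThumb32Insn(const uint8_t *p, bool bigEndianInsns) {
  uint32_t first = readHalf(p, bigEndianInsns);
  uint32_t second = readHalf(p + 2, bigEndianInsns);
  return (first << 16) | second;
}
```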
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC)
2022-08-07 00:16:14 -07:00