Commit Graph

68777 Commits

Author SHA1 Message Date
David Green 993b203b6a [AArch64] Sink splat(s/zext(..)) to uses
If the Shuffle is a splat and the operand is a zext/sext, sinking the
operand and the s/zext can help create an indexed s/umull. This is
especially useful to prevent an i64 mul from being scalarized.
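As an illustration (the loop and names below are not from the patch, just a plausible example), source of roughly this shape is what benefits: the vectorizer splats the sign-extended scalar, and sinking that splat(sext(..)) next to the multiply lets instruction selection use an indexed smull rather than scalarizing the i64 multiply.

    #include <cstdint>

    // Illustrative only: after vectorization the multiply's operands are a
    // sext of the loaded i32 lanes and splat(sext(Scale)), the pattern this
    // patch sinks to its use.
    void mulBySplat(int64_t *Dst, const int32_t *Src, int32_t Scale, int N) {
      for (int i = 0; i < N; ++i)
        Dst[i] = (int64_t)Src[i] * (int64_t)Scale;
    }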

Differential Revision: https://reviews.llvm.org/D133355
2022-09-13 15:47:41 +01:00
David Spickett 0b8a44388e [llvm][AArch64] Explain why certain registers are reserved on Arm64EC
This extends 4658366d95 to add a note
explaining why the register is reserved.

note: x13 is clobbered by asynchronous signals when using Arm64EC.

I've added testing for the w/x registers and the v/q/s/d/h floating-point
registers.

LLVM will accept, but silently do nothing with, b registers, so they are
not tested here (clang rejects them, so at least for C you're safe anyway).
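Purely as an illustration (the diagnostic wiring and the Arm64EC triple named in the comment are assumptions, not taken from the patch), this is the kind of inline asm that should now pick up the extra note:

    // Hypothetical example: clobbering x13 in inline asm when targeting Arm64EC
    // (e.g. an arm64ec-pc-windows-msvc triple) should trigger the
    // reserved-register warning together with the new note quoted above.
    void touchX13() {
      asm volatile("" ::: "x13");
    }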

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D133701
2022-09-13 10:13:06 +00:00
Dmitry Preobrazhensky a80116efec [AMDGPU][MC][GFX11] Add a helper function for identification of VOPD instructions
Differential Revision: https://reviews.llvm.org/D133608
2022-09-13 12:41:39 +03:00
Dmitry Preobrazhensky 815ba49068 [AMDGPU][MC] Add detection of mandatory literals in parser
Differential Revision: https://reviews.llvm.org/D133606
2022-09-13 12:37:30 +03:00
Sylvestre Ledru cd20a18286 Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"
Causing:
https://github.com/llvm/llvm-project/issues/57709

This reverts commit ab56719acd.
2022-09-13 10:53:59 +02:00
Zi Xuan Wu (Zeson) 955e6ac499 [CSKY] Fix the Predicates of instruction selection
Some select-node patterns using register cmp instructions should be guarded
by iHas2E3.
2022-09-13 15:02:22 +08:00
Haojian Wu 7ed68182d7 Fix a -Wswitch warning. 2022-09-13 08:57:43 +02:00
jacquesguan b98b4fae75 [RISCV] Add cost model for compare and select instructions.
This patch adds a cost model for vector compare and select instructions. For vector FP compare instructions, it only adds the comparisons supported natively.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132296
2022-09-13 14:44:46 +08:00
Yeting Kuo 5fcb5d7759 [RISCV] Add assertion of hasVecPolicyOp to catch masked intrinsic without policy operand.
The original code may produce an incorrect result if there is a masked instruction
without a policy operand, since that would make us set its policy to TUMU. The
patch adds an assertion to catch such instructions.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133302
2022-09-13 10:09:49 +08:00
gonglingqin 3de3439bd7 [LoongArch] Add codegen support for ISD::FMA
Differential Revision: https://reviews.llvm.org/D133281
2022-09-13 10:04:41 +08:00
David Majnemer ab56719acd [clang, llvm] Add __declspec(safebuffers), support it in CodeView
__declspec(safebuffers) is equivalent to
__attribute__((no_stack_protector)).  This information is recorded in
CodeView.

While we are here, add support for strict_gs_check.
2022-09-12 21:15:34 +00:00
Kazu Hirata 9606608474 [llvm] Use x.empty() instead of llvm::empty(x) (NFC)
I'm planning to deprecate and eventually remove llvm::empty.

I thought about replacing llvm::empty(x) with std::empty(x), but it
turns out that all uses can be converted to x.empty().  That is, no
use requires the ability of std::empty to accept C arrays and
std::initializer_list.

Differential Revision: https://reviews.llvm.org/D133677
2022-09-12 13:34:35 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (BitWidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (BitWidth / 2) urem. If (BitWidth / 2) is a legal integer
type, this urem will be expanded by DAGCombiner using a multiply by a magic
constant. We do have to take into account that adding the high and low halves
together can produce a carry, making the sum a (BitWidth / 2) + 1 bit number,
so we also need to add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's Delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc does
for urem, as well as for srem, udiv, and sdiv, that I plan to add in
future patches.
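A minimal C++ sketch of both tricks (illustrative only, not the legalizer code; the names and the assert are mine), splitting a 64-bit operation into 32-bit halves for a divisor D with 2^32 % D == 1, e.g. 3, 5, 17, 257 or 65537:

    #include <cassert>
    #include <cstdint>

    // Remainder: since 2^32 == 1 (mod D), X = Hi*2^32 + Lo == Hi + Lo (mod D).
    uint64_t uremBySplit(uint64_t X, uint32_t D) {
      assert((1ULL << 32) % D == 1 && "trick only applies to such divisors");
      uint64_t Lo = X & 0xffffffffu;
      uint64_t Hi = X >> 32;
      uint64_t Sum = Lo + Hi;                   // may carry into bit 32
      Sum = (Sum & 0xffffffffu) + (Sum >> 32);  // add the carry back in
      return (uint32_t)Sum % D;                 // now only a 32-bit urem
    }

    // Division: subtract the remainder, then multiply by the multiplicative
    // inverse of D modulo 2^64 (D is odd because it divides 2^32 - 1).
    uint64_t udivBySplit(uint64_t X, uint32_t D) {
      uint64_t Inv = D;               // Newton's iteration for the inverse;
      for (int i = 0; i < 5; ++i)     // each step doubles the correct low bits
        Inv *= 2 - (uint64_t)D * Inv;
      return (X - uremBySplit(X, D)) * Inv;
    }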

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Craig Topper d49280e0a4 [RISCV] Rename WriteFALU* and ReadFALU* to WriteFAdd*/ReadFAdd*.
ALU seems a little vague. FAdd felt more precise, even though it
also includes FSUB instructions.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D133632
2022-09-12 09:37:28 -07:00
Craig Topper 4186a49d79 [RISCV] Custom type legalize i32 loads by sign extending.
The default is to use an extload, which can become a zextload or
sextload if it is followed by an 'and' or a sext_inreg.

Sometimes type legalization will introduce an 'and' from promoting
something like 'srl X, C' and a sext_inreg from a setcc. The
'and' could be freely folded with the promoted 'srl' by using srliw,
but the sext_inreg can't be folded into a compare. The DAG combiner
will see both of these choices and may decide to fold the 'and'
instead of the 'sext_inreg'. This forces the sext_inreg to become
a sext.w.

By picking sextload in the type legalizer we take this choice away.
Looking at spec2006 compiled with Zba and Zbb, this appeared to be a
net reduction in lines of code in the objdump disassembly output.

This is similar to what we do with i32 add/sub/mul/shl in
type legalization where we always emit a sext_inreg.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130397
2022-09-12 09:13:07 -07:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove the new-pass-manager version of ExpandLargeDivRem because there is
no way yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Simon Pilgrim 20ad05f9b4 [CostModel][X86] Add CostKinds handling for abs ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script from D103695.
2022-09-12 16:34:37 +01:00
Matt Arsenault 7834194837 TableGen: Introduce generated getSubRegisterClass function
Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.
2022-09-12 09:03:37 -04:00
Sander de Smalen cf72dddaef [AArch64][SME] Add utility class for handling SME attributes.
This patch adds a utility class that will be used in subsequent patches
for parsing the function/callsite attributes and determining whether
changes to PSTATE.SM are needed, or whether a lazy-save mechanism is
required.

It also implements some of the restrictions on the SME attributes
in the IR Verifier pass.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: david-arm, aemerson

Differential Revision: https://reviews.llvm.org/D131570
2022-09-12 12:41:30 +00:00
Simon Pilgrim bd0109f392 [CostModel][X86] Move AVX512/AVX2 uniform shift costs into the generic uniform cost tables
They shouldn't be checked after the XOP shift costs: AVX2 shift support takes preference over XOP for everything but vXi8 shifts. The improvement is pretty limited as it only affects bdver4 targets, but it does help clean up a fraction of the messy shift cost logic.
2022-09-12 12:08:42 +01:00
David Spickett 739b69e655 [LLVM][AArch64] Explain that X19 is used as the frame base pointer register
Fixes #50098

LLVM uses X19 as the frame base pointer if it needs to, meaning you
can get warnings if you clobber it with inline asm.

However, it doesn't explain why. The frame base register is not part
of the ABI, so it's pretty confusing to get that warning out of the blue.

This adds a method to explain why a register is reserved, with X19 as the
first one covered. The logic is the same as getReservedRegs.

I could have added a return parameter to isASMClobberable and friends,
but found that there are a lot of things that call isReservedReg in
various ways.

So while one more method on the pile isn't great design, it is simpler
right now to do it this way and only pay the cost if you are actually using
a reserved register.
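A sketch of the kind of code that draws the warning (not a test from the patch):

    // Clobbering X19 in inline asm can trigger the reserved-register warning,
    // which this change now follows up with an explanation that X19 may be
    // in use as the frame base pointer.
    void clobberX19() {
      asm volatile("" ::: "x19");
    }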

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D133213
2022-09-12 09:18:09 +00:00
Johannes Doerfert c922cac868 Revert "[Attributor] AAPointerInfo should allow "harmless" uses"
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"

This reverts commits 844f6c5d03 and
4ed0a88cd8, as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
2022-09-11 21:37:54 -07:00
Johannes Doerfert 4ed0a88cd8 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-09-11 20:16:11 -07:00
Simon Pilgrim a931dbfbd3 [CostModel][X86] Merge AVX512BW vXi8/vXi16 shifts into default AVX512BW cost table
We only need to handle the uniform cases early
2022-09-10 18:18:42 +01:00
Simon Pilgrim 10edf88458 [CostModel][X86] Update CTPOP costs
With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some cost mismatches for Atom/Silvermont.
2022-09-10 17:57:20 +01:00
Simon Pilgrim 4994f87ca1 [X86] Fix bdver2 128-bit shuffles throughputs
Noticed while trying to get vector ctpop/ctlz/cttz costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2

Matches AMD 15h SoG, Agner and instlatx64
2022-09-10 17:34:40 +01:00
Simon Pilgrim 7785bd34e7 [X86] Fix bdver2 128-bit ALU/logic/shift throughputs
Noticed while trying to get vector shifts costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2

Matches AMD 15h SoG, Agner and instlatx64
2022-09-10 16:23:29 +01:00
Mingming Liu 8aa800614b [AArch64][CostModel] Detect that {extract,insert}-element at lane 0 has the same cost as the other lanes for vector instructions in the IR.
Currently, {extract,insert}-element has zero cost at lane 0 [1]. However, there is a cost (an fmov [2] or ext/ins instruction) to move a value from a SIMD register to a GPR register when the element is used explicitly as an integer.

See https://godbolt.org/z/faPE1nTn8, where an fmov is generated for the d* register -> x* register conversion.

Implementation-wise, add a private helper method `AArch64TTIImpl::getVectorInstrCostHelper`. This way, the instruction-based method can share the core logic (e.g., returning zero cost if the type is legalized to a scalar).

[1] 2cf320d41e/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (L1853)
[2] 2cf320d41e/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L8150-L8157)
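For illustration (standard ACLE NEON intrinsics; the generated instructions noted in the comments are typical, not guaranteed), both of these lane extractions move a value out of a SIMD register into a GPR, so neither is free:

    #include <arm_neon.h>
    #include <cstdint>

    // Lane 0 typically lowers to fmov x0, d0; other lanes to mov/umov x0, v0.d[1].
    // Either way there is a real SIMD-to-GPR transfer cost.
    int64_t lane0AsInt(int64x2_t V) { return vgetq_lane_s64(V, 0); }
    int64_t lane1AsInt(int64x2_t V) { return vgetq_lane_s64(V, 1); }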

Differential Revision: https://reviews.llvm.org/D128302
2022-09-09 09:47:30 -07:00
zhongyunde 7a81782585 [AArch64][CodeGen] Fold the mov and lsl into ubfiz
Fix the issue exposed by D132322; depends on D132939.
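For illustration (not a test from the patch), a zero-extend-then-shift of this shape previously produced a mov plus lsl and can now be selected as a single ubfiz:

    #include <cstdint>

    // Zero-extend a 32-bit value and shift it left; the mov (implicit zero
    // extension) plus lsl pair folds into something like ubfiz x0, x0, #2, #32.
    uint64_t shiftZext(uint32_t X) {
      return (uint64_t)X << 2;
    }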
Reviewed By: efriedma, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D132325
2022-09-09 23:50:29 +08:00
Jay Foad 8901f7cebc [AMDGPU] Fix crash legalizing G_EXTRACT_VECTOR_ELT with negative index
Fixes https://github.com/llvm/llvm-project/issues/57408

Differential Revision: https://reviews.llvm.org/D132938
2022-09-09 15:53:34 +01:00
Simon Pilgrim 05f56f10ed [X86] Fix VPPERM load folding latency
Noticed while investigating BITREVERSE cost numbers with the D103695 script - the VPPERM folded loads were using the WriteVarShuffleX defaults and were missing an override like the one the VPPERM reg-reg variants have.
2022-09-09 13:57:39 +01:00
Dmitry Preobrazhensky 6b79610fd5 [AMDGPU][MC][GFX11][NFC] Correct VOPD parsing
Differential Revision: https://reviews.llvm.org/D133492
2022-09-09 13:03:29 +03:00
Simon Pilgrim 55b78e28d8 [CostModel][X86] Add missing i8 throughput cost 2022-09-09 10:58:51 +01:00
gonglingqin da8c9521ee [LoongArch] Add codegen support for frint
According to the revised description in `LoongArch Reference Manual v1.02`,
frint.[s/d] does not check whether floating-point inexact exceptions are
allowed as indicated by FCSR, i.e. it always executes roundToIntegralExact(x).
What's more, the manual also specifies that frint.s/d only needs to be
defined for LA64. So ISD::FRINT is legal for LA64.

Differential Revision: https://reviews.llvm.org/D133337
2022-09-09 14:25:34 +08:00
Sheng 88bdc4687d [NFC][M68k] Correct debug message. 2022-09-09 10:58:37 +08:00
Alex Bradbury 51ae462447 [RISCV] Add the GlobalMerge pass (disabled by default)
Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll, which tests a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` without enabling it by default.

Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag.

Reviewed By: craig.topper, reames, hiraditya

Differential Revision: https://reviews.llvm.org/D130481
2022-09-08 18:40:38 -07:00
Fangrui Song f9b5924975 [AArch64] Fix -Wunused-variable. NFC 2022-09-08 18:27:16 -07:00
zhongyunde b6655333c2 [Peephole] rewrite INSERT_SUBREG to SUBREG_TO_REG if upper bits zero
Restrict this to the 32-bit form of integer instructions, as too many test cases
would otherwise be clobbered by the updated register numbers.
    From %reg = INSERT_SUBREG %reg, %subreg, subidx
    To   %reg:subidx = SUBREG_TO_REG 0, %subreg, subidx
This aims to fix the redundant mov instruction from D132325, since SUBREG_TO_REG
should not generate code.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D132939
2022-09-09 09:00:54 +08:00
Craig Topper 5f3a8b585b [RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133511
2022-09-08 12:34:59 -07:00
David Green 3875c38adf [AArch64] Fix formatting of the Shuffle Cost tables. NFC 2022-09-08 19:54:12 +01:00
Jay Foad afa0ed33df [AMDGPU] Fix shrinking of F16 FMA on newer subtargets
D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in
SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have
a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of
the opcode which is used on GFX9+.

Differential Revision: https://reviews.llvm.org/D133489
2022-09-08 16:41:04 +01:00
Jonas Paulsson de0e3117d4 [SystemZ] Improve handling of vector alignments.
Make the DataLayout string always hold a vector alignment of 8 bytes,
regardless of the vector ABI. This makes the datalayout depend only on the
target triple, which is the general expectation (in assertions).

On older architectures where vectors use the natural alignment (16 bytes),
the front end will maintain the same behavior and produce an overalignment
compared to the datalayout.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D131158
2022-09-08 17:33:05 +02:00
Thomas Lively ac3b8df8f2 [WebAssembly] Prototype `f32x4.relaxed_dot_bf16x8_add_f32`
As proposed in https://github.com/WebAssembly/relaxed-simd/issues/77. Only an
LLVM intrinsic and a clang builtin are implemented. Since there is no bfloat16
type, use u16 to represent the bfloats in the builtin function arguments.

Differential Revision: https://reviews.llvm.org/D133428
2022-09-08 08:07:49 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.
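A minimal example of the C++17 replacement (illustrative, not a call site from the patch):

    #include <iterator>

    // std::size handles C-style arrays directly, so llvm::array_lengthof is no
    // longer needed once the codebase requires C++17.
    constexpr int Values[] = {1, 2, 3, 4};
    static_assert(std::size(Values) == 4, "std::size replaces llvm::array_lengthof");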

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Krzysztof Parzyszek 3c817574c2 [Hexagon] Handle shifts of short vectors of i8 2022-09-08 07:52:16 -07:00
Ivan Kosarev 57c943d581 [AMDGPU] Only raise wave priority if there is a long enough sequence of VALU instructions.
Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D124671
2022-09-08 15:21:30 +01:00
liqinweng 723245bfac [AARCH64][COST] Improve cost of reverse shuffles for AArch64
Update the comments for reverse shuffles and add tests

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132730
2022-09-08 18:55:49 +08:00
liqinweng 9b4e75ee76 [RISCV][COST] Add cost model for mask vector select instruction when its condition is a scalar type
Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D132992
2022-09-08 18:55:49 +08:00
David Spickett e428baf001 [LLVM][ARM] Remove options for armv2, 2A, 3 and 3M
Fixes #57486

These pre-v4 architectures are not specifically supported by codegen, as
demonstrated in the linked issue.

GCC has not supported 3M since GCC 9, and presumably dropped 2 and 2A
even earlier, so we are aligned in that sense.

(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2abd6e34fcf3bd9f9ffafcaa47cdc3ed443f9add)

This removes the options and associated testing.

The Pre_v4 build attribute remains mainly because its absence
would be more confusing. It will not be used other than to
complete the list of build attributes as shown in the ABI.

https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#3352the-target-related-attributes

Reviewed By: nickdesaulniers, peter.smith, rengolin

Differential Revision: https://reviews.llvm.org/D133109
2022-09-08 09:49:48 +00:00
gonglingqin d5f7a2182d [LoongArch] Add codegen support for atomicrmw xchg operation on LA32
Depends on D131228

Differential Revision: https://reviews.llvm.org/D131229
2022-09-08 13:57:53 +08:00