Commit Graph

65097 Commits

Author SHA1 Message Date
James Farrell 5032467034 Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.
This reverts commit 40d5eeac6c.

Differential Revision: https://reviews.llvm.org/D114885
2021-12-06 14:57:47 +00:00
Kazushi (Jam) Marukawa 9d20fa09eb [VE] Support VE specific data directives in MC
Support VE specific data directives, .word/.long/.llong, in MC layer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D115120
2021-12-06 20:07:44 +09:00
Ties Stuij 0fbb17458a [ARM] Implement setjmp BTI placement for PACBTI-M
This patch intends to guard indirect branches performed by longjmp
by inserting BTI instructions after calls to setjmp.

Calls with 'returns-twice' are lowered to a new pseudo-instruction
named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Alexandros Lamprineas
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112427
2021-12-06 11:07:10 +00:00
Kazushi (Jam) Marukawa 6b41eb7f26 [VE] Change to use R_VE_SREL32
Change to use R_VE_SREL32 for relative branch instructions instead of
R_VE_PC_LO32 in order to check ranges of relative branch isntructions
at link time correctly.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D115097
2021-12-06 20:06:37 +09:00
David Green d8495e0352 [ARM] Add a vrinta.f16.f16 alias
The v8.1-m ARMARM uses the vrinta.f16.f16 names, as opposed to
vrinta.f16. This adds an alias for it in the same way that we have for
f32 and f64.

Differential Revision: https://reviews.llvm.org/D68127
2021-12-06 11:06:25 +00:00
Paulo Matos a96d828510 [WebAssembly] Implementation of intrinsic for ref.null and HeapType removal
This patch implements the intrinsic for ref.null.
In the process of implementing int_wasm_ref_null_func() and
int_wasm_ref_null_extern() intrinsics, it removes the redundant
HeapType.

This also causes the textual assembler syntax for ref.null to
change. Instead of receiving an argument: `func` or `extern`, the
instruction mnemonic is either ref.null_func or ref.null_extern,
without the need for a further operand.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D114979
2021-12-06 09:46:15 +01:00
Zi Xuan Wu bdd7c53dc5 [CSKY] Add compressed instruction mapping between 32-bit and 16-bit instruction
Add all CompressPat to map instructions between 16-bit and 32-bit with using the CompressInstEmitter infra.
Although it's only used in asm printer, also enable it in asm parser to debug mapping when -enable-csky-asm-compressed-inst is on.

Differential Revision: https://reviews.llvm.org/D115026
2021-12-06 14:04:54 +08:00
Qiu Chaofan e3c2694da9 [PowerPC] Implement general back2back fusion
Implement 'back-to-back' FX fusion according to Power10 User Manual
'19.1.5.4 Fusion', not enabled by default.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D114345
2021-12-06 10:15:05 +08:00
Jack Andersen f108c7f59d [GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.
Expanding on D109750.

Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a
substantial performance improvement.

The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vregs uses as valid.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D112852
2021-12-05 15:55:59 -05:00
Phoebe Wang f37d9b4112 [X86][FP16] Replace vXi16 to vXf16 instead of v8f16
Fixes pr52561

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D114304
2021-12-05 19:19:11 +08:00
David Green 57ff805a6d [DAG] Create fptoui.sat from clamped fptosi
As an extension to D111976, this converts clamp fptosi, clamped between
0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with
conversions that naturally saturate, such as Arm.

X86 disables the transform as some of the test cases increases in size.
A fptoui.sat necessitates a fp clamp without native support, so there is
little use in converting if the instruction is just going to be
expanded.

Differential Revision: https://reviews.llvm.org/D112428
2021-12-05 09:25:52 +00:00
Michael Liao 53fc971a4b Fix `-Wunused-variable` warning. NFC. 2021-12-04 23:34:55 -05:00
Matt Arsenault 729bf9b26b AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will no rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.
2021-12-04 10:49:18 -05:00
Matt Arsenault 2959e082e1 AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default
Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm.amdgcn.implicitarg.ptr. This violates the principle of being
correct by default. Some OpenMP testcases were broken recently since
it wasn't correctly setting this attribute, and no known frontends are
setting this to anything other than the maximum.

Most of the test changes are from load widening of argument loads
since there now more implied dereferenceable bytes.
2021-12-04 10:38:25 -05:00
Matt Arsenault ae0ba7dedd AMDGPU: Optimize out implicit kernarg argument allocation if unused
We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be
unused. Start using it to avoid allocating the implicit arguments if
unneeded.
2021-12-04 10:38:25 -05:00
Jay Foad 2774bad112 [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args
The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.

Don't change the API of the corresponding OpenCL builtins.

Differential Revision: https://reviews.llvm.org/D115032
2021-12-04 10:32:11 +00:00
Stanislav Mekhanoshin 3b17cb1506 [AMDGPU] Kill def when folding immediate in two-addr pass
Two-address pass works right before RA and if an immediate
was folded into an instruction there is nothing to remove
the dead def. We end up with something like:

	v_mov_b32_e32 v14, 0xc1700000
	v_mov_b32_e32 v14, 0x41200000
	v_fmaak_f32 v51, s67, v19, 0xc1700000
	v_fmaak_f32 v38, v51, v19, 0x4120000

The patch kills the dead move instruction right in the folding.

Differential Revision: https://reviews.llvm.org/D114999
2021-12-03 09:37:49 -08:00
David Green ab0c5cea0b [ARM] Use v2i1 for MVE and CDE intrinsics
This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal
type, to use a <2 x i1> as opposed to emulating the predicate with a
<4 x i1>. The v4i1 workarounds have been removed leaving the natural
v2i1 types, notably in vctp64 which now generates a v2i1 type.

AutoUpgrade code has been added to upgrade old IR, which needs to
convert the old v4i1 to a v2i1 be converting it back and forth to an
integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be
optimized away in the final assembly.

Differential Revision: https://reviews.llvm.org/D114455
2021-12-03 15:27:58 +00:00
Nemanja Ivanovic d6c0ef7887 [PowerPC] Handle base load with reservation mnemonic
The Power ISA defined l[bhwdq]arx as both base and
extended mnemonics. The base mnemonic takes the EH
bit as an operand and the extended mnemonic omits
it, making it implicitly zero. The existing
implementation only handles the base mnemonic when
EH is 1 and internally produces a different
instruction. There are historical reasons for this.
This patch simply removes the limitation introduced
by this implementation that disallows the base
mnemonic with EH = 0 in the ASM parser.

This resolves an issue that prevented some files
in the Linux kernel from being built with
-fintegrated-as.

Also fix a crash if the value is not an integer immediate.
2021-12-03 09:13:02 -06:00
David Green 255ad73424 [ARM] Make MVE v2i1 predicates legal
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.

This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.

The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.

Differential Revision: https://reviews.llvm.org/D114449
2021-12-03 14:05:41 +00:00
Petar Avramovic 0b34ffe4a6 AMDGPU/GlobalISel: Add clamp combine
Add clamp combine. Source is fminnum(fmaxnum(Val, 0.0), 1.0) or
fmaxnum(fminnum(Val, 1.0), 0.0) or fmed3 intrinsic with 0.0 and
1.0 as two out of three operands.

Differential Revision: https://reviews.llvm.org/D90052
2021-12-03 12:49:39 +01:00
Petar Avramovic ec54867d75 AMDGPU/GlobalISel: Add floating point med3 combine
Add floating point version of med3 combine.
Source is fminnum(fmaxnum(Val, K0), K1) or fmaxnum(fminnum(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1.

Differential Revision: https://reviews.llvm.org/D90051
2021-12-03 12:49:39 +01:00
Petar Avramovic ab01f4d264 AMDGPU/GlobalISel: Do not fcanonicalize const splat padded with undef
Recognize constant splat padded with undef in isCanonicalized.
Fcanonicalize will be removed by RemoveFcanonicalize in post-legalizer
combiner. We will treat undef as value that will result in a splat
in clamp combine after regbankselect.

Differential Revision: https://reviews.llvm.org/D104408
2021-12-03 12:49:38 +01:00
Victor Perez 9eb7322748 [RISCV][VP] Add RVV codegen for vp.select
Lower vp.select instrinsic to VSELECT_VL.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D114629
2021-12-03 11:02:20 +00:00
Jessica Clarke a3530dc199 [AArch64][NFC] Alter ComplexPattern types to be consistent with their uses
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review.

AArch64 currently has several ComplexPatterns that are used in contexts
where they're expected to be an iPTR. The cases that lead to type
contradictions are separated out in D108759, but there are additional
differences to the TableGen output when using my locally-patched
TableGen. None of these appear to matter, at least for passing all the
CodeGen tests, but it's safer to avoid such changes (and similar changes
were causing issues on some AMDGPU tests, causing failures to select).
Changing these additional ComplexPatterns to use iPTR rather than i64
ensures that the TableGen output remains bit-for-bit identical (compared
to without having this patch and my TableGen patch, as well as the
intermediate state of having this patch but not my TableGen patch), and
more accurately captures the higher-level meaning of these patterns.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D109034
2021-12-03 07:04:59 +00:00
Jessica Clarke 3ee56eed2f [AMDGPU][NFC] Alter ComplexPattern types to be consistent with their uses
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review.

AMDGPU currently has several ComplexPatterns that are used in contexts
where they're expected to be an iPTR, and where using an iPTR instead of
a fixed-width integer type matters. With my locally-patched TableGen,
none of these mismatches result in type contradictions, but do change
the patterns and cause various failures to select. These changes to the
ComplexPatterns' types reflect how they are actually used, result in
bit-for-bit identical TableGen output (without my local TableGen patch),
and ensure that with improved type inference AMDGPU's backend will
continue to work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109032
2021-12-03 07:04:59 +00:00
Jessica Clarke 0cb44cfbb7 [AArch64][NFC] Fix ComplexPattern types conflicting with uses
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review,
but these are all the type conflicts found (all legitimate) by my
locally-patched TableGen.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108759
2021-12-03 07:04:59 +00:00
Kirill Stoimenov 021ecbbb44 [ASan] Changed intrisic implemenation to use PLT safe registers.
Changed registers to R10 and R11 because PLT resolution clobbers them. Also changed the implementation to use R11 instead of RCX, which saves a push/pop.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D115002
2021-12-03 04:06:30 +00:00
Daniil Fukalov ab05ab59a7 [CostModel][AMDGPU] Fix instructions costs estimation for vector types.
1. Fixed vector instructions costs estimations incosistency - removed different
   logic for "not simple types" since it biases costs for these types.
2. Fixed legalization penalty for vectors too big for the target: changed from
   overwrite default legalization cost value estimation to added penalty.
3. Fixed few typos in tests.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114893
2021-12-03 03:08:08 +03:00
Matt Arsenault 0eebe2e36c AMDGPU: Sanitized functions require implicit arguments
Do not infer no-amdgpu-implicitarg-ptr for sanitized functions. If a
function is explicitly marked amdgpu-no-implicitarg-ptr and
sanitize_address, infer that it is required.
2021-12-02 17:55:43 -05:00
Amy Kwan c27734c183 [PowerPC] Fix load/store selection infrastructure when load/store intrinsics are used on P10.
The load/store infrastructure previously made an incorrect assumption that
whenever it is used with a load/store intrinsic on Power10 - those intrinsics
would automatically be the lxvp/stxvp intrinsics introduced in Power10.

However, this is obviously not the case as there are multiple instances of
pre-P10 intrinsics that use the refactored load/store implementation.
This patch corrects this assumption, and produces the expected intrinsic on pre-P10.

Differential Revision: https://reviews.llvm.org/D114978
2021-12-02 15:59:29 -06:00
Kazu Hirata 262dd1e42d [llvm] Use range-based for loops (NFC) 2021-12-02 09:27:47 -08:00
David Green b8f1ccb0ac [ARM] Introduce i8neg and i8pos addressing modes
Some instructions with i8 immediate ranges can only hold negative values
(like t2LDRHi8), only hold positive values (like t2STRT) or hold +/-
depending on the U bit (like the pre/post inc instructions. e.g
t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8,
AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.

This allows us to get the offset ranges of t2LDRHi8 correct in the
load/store optimizer, fixing issues where we could end up creating
instructions with positive offsets (which may then be encoded as ldrht).

Differential Revision: https://reviews.llvm.org/D114638
2021-12-02 17:10:26 +00:00
David Stuttard 0e8590f065 [AMDGPU] Add support for in-order bvh in waitcnt pass
bvh should be handled separately from vmem and vmem with sampler instructions
for waitcnt handling.

Differential Revision: https://reviews.llvm.org/D114794
2021-12-02 14:26:11 +00:00
Simon Moll e92429d99b [VE][NFC] Cleanup redundant namespace wrapper 2021-12-02 15:23:37 +01:00
Matt Devereau 4244f95cc6 [AArch64][SVE] Enable bf16 vector.insert
Allow passthrough bf16 registers for vector.insert

Differential revision: https://reviews.llvm.org/D114858
2021-12-02 12:59:19 +00:00
David Green e629302558 [ARM] Correct range in isLegalAddressImm
The ranges in isLegalAddressImm were off by one, not allowing the
maximum values for unscaled offsets.

Differential Revision: https://reviews.llvm.org/D114636
2021-12-02 11:33:40 +00:00
Simon Moll e37000f3bf [VE][NFC] Fix use-after-free in PVFMK expansion
There is custom expansion code for packed VFMK Pseudos in the VE
backend.  This code erased the Pseudo without telling
ExpandPostRAPseudos about it, causing the generic expansion function to
access the erased Pseudo.  This bug triggered in the
test/CodeGen/VE/VELIntrinsics/vfmk.ll test with asan-enabled builds.

Detected by:
sanitizer-x86_64-linux-fast
(https://lab.llvm.org/buildbot/#/builders/5/builds/15393)
2021-12-02 10:41:11 +01:00
David Green 646c872f9d [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Recommitted from 0e98659ea1 with a fix for APInt comparison.

Differential Revision: https://reviews.llvm.org/D114380
2021-12-02 07:56:27 +00:00
Austin Kerbow da067ed569 [AMDGPU] Set most sched model resource's BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retreaved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.

Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.

This type of change needs to be evaluated experimentally. Preliminary
results are positive.

Fixes: SWDEV-282962

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114777
2021-12-01 22:31:28 -08:00
Phoebe Wang f13b43d570 [X86][FP16] Only generate approximate rsqrt when Reciprocal is true for half type
We have reasonable fast sqrt and accurate rsqrt for half type due to the
limited fractions. So neither do we need multi steps refinement for
rsqrt nor replace sqrt by rsqrt.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114844
2021-12-02 13:52:45 +08:00
Phoebe Wang 4756a2f157 [X86] Insert FMUL for estimated non reciprocal SQRT when `RefinementSteps` = 0
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D114843
2021-12-02 13:52:45 +08:00
Christudasan Devadasan 399b7de0ea [AMDGPU] Add a regclass flag for scalar registers
Along with vector RC flags, this scalar flag will
make various regclass queries like `isVGPR` more
accurate.

Regclasses other than vectors are currently set
with the new flag even though certain unallocatable
classes aren't truly scalars. It would be ok as long
as they remain unallocatable.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110053
2021-12-01 23:31:07 -05:00
Petar Avramovic 641906da8d AMDGPU/GlobalISel: Fix constant bus restriction errors for med3
Detected on targets older then gfx10 (e.g. gfx9) for constants that are
too large to be inlined (constant are sgpr by default).
In med3 combine it is expected that regbankselect maps all operands of
min/max we try to match to vgpr. However constants are mapped to sgpr
and there will be a sgpr-to-vgpr copy. Matchers look through sgpr-to-vgpr
copies and return sgpr and these break constant bus restriction.
Build med3 with all vgpr operands. Use existing sgpr-to-vgpr copies for
matched sgprs. If there is no such copy (not expected) build one.

Differential Revision: https://reviews.llvm.org/D114700
2021-12-01 21:36:37 +01:00
Craig Topper 2f6beb7b0e [RISCV] Add inline expansion for vector ftrunc/fceil/ffloor.
This prevents scalarization of fixed vector operations or crashes
on scalable vectors.

We don't have direct support for these operations. To emulate
ftrunc we can convert to the same sized integer and back to fp using
round to zero. We don't need to do a convert if the value is large
enough to have no fractional bits or is a nan.

The ceil and floor lowering would be better if we changed FRM, but
we don't model FRM correctly yet. So I've used the trunc lowering
with a conditional add or subtract with 1.0 if the truncate rounded
in the wrong direction.

There are also missed opportunities to use masked instructions.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113543
2021-12-01 11:25:28 -08:00
Simon Moll 435d44bf8a [VE][NFC] Fix use-after-free in VEInstrInfo
First call getOperand, then erase the MachineInstr. Not the other way
round.

Expected to fix test/CodeGen/VE/VELIntrinsics/lvm.ll

Detected by asan buildbot:

  sanitizer-x86_64-linux-fast
  (https://lab.llvm.org/buildbot/#/builders/5/builds/15384)
2021-12-01 19:30:27 +01:00
Reid Kleckner c6fa4c481a [AArch64] Fix unused variable warning with NDEBUG, NFC 2021-12-01 09:28:22 -08:00
Simon Pilgrim 19d34f6e95 [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT
combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.

But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.

This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.

Differential Revision: https://reviews.llvm.org/D113371
2021-12-01 16:37:49 +00:00
Simon Pilgrim 1bd01defff [VE] Remove switch with only default case statement to fix MSVC warning. NFC. 2021-12-01 16:37:48 +00:00
Yousuf Ali 415e821a50 [PowerPC][AIX] Add toc-data support for 64-bit AIX small code model.
The patch expands the existing 32-bit toc-data attribute support to 64-bit.
In both 32-bit and 64-bit it is supported for small code model only.

Differential Revision: https://reviews.llvm.org/D114654
2021-12-01 10:56:21 -05:00