Commit Graph

69244 Commits

Author SHA1 Message Date
WANG Xuerui 3155f6c508 [LoongArch] Expand llvm.stacksave and llvm.stackrestore
As in commit bfb00d4c1c ("[RISCV] Allow lowering of dynamic_stackalloc, stacksave, stackrestore").

Differential Revision: https://reviews.llvm.org/D134435
2022-09-29 09:07:44 +08:00
wanglei 036b170c24 [LoongArch] Produce a R_LARCH_32_PCREL relocation
LoongArchELFObjectWriter::getRelocType check IsPCRel for FK_Data_4
(which we produce a R_LARCH_32_PCREL relocation for if IsPCRel).

R_LARCH_32_PCREL is required for FDE relocation.

Differential Revision: https://reviews.llvm.org/D134715
2022-09-29 09:04:44 +08:00
wanglei 7b1bdfbeb0 [LoongArch] Override TargetSubtargetInfo::getSelectionDAGInfo
The target selection DAG lowering information is needed for
SelectionDAGBuilder to lower a call like memcmp into an optimized
form.

Differential Revision: https://reviews.llvm.org/D134712
2022-09-29 08:46:53 +08:00
Jessica Paquette 95dabac7a5 [AArch64][GlobalISel] Make G_PTRTOINT only legal for s64 + p0
A few issues:

  1. There was no legalizer test for G_PTRTOINT
  2. Same clamping issue as in many other opcodes
  3. AArch64 pointers can only be 64b, so in reality we always have to trunc or
     extend with any size other than p0 anyway.

This seems to actually produce more correct selection for narrow types as well.

Differential Revision: https://reviews.llvm.org/D107588
2022-09-28 16:20:24 -07:00
Jessica Paquette a7aaafde2e [AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN
This is intended to be equivalent to the s32 + s64 cases in
AArch64TargetLowering::LowerFCOPYSIGN.

Widen everything and then use G_BIT + a mask to handle the actual copysign
operation. Then, narrow back down to s32/s64.

I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable
pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces
reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.)
If there's a better way to do this then I'm happy to change it.

We also have a couple codegen deficiencies with how we emit vector constants
right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff)

Differential Revision: https://reviews.llvm.org/D108725
2022-09-28 16:03:22 -07:00
Florian Mayer 0401dc2913 [MTE] [HWASan] unify isInterestingAlloca
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D134779
2022-09-28 15:52:34 -07:00
Jessica Paquette 4957ee6529 [AArch64][GlobalISel] Add a target-specific G_BIT opcode.
This is necessary for custom-legalizing G_FCOPYSIGN.

This is equivalent to the BIT instruction (bitwise insert if true).

Add selection testcases for imported patterns.

Differential Revision: https://reviews.llvm.org/D108714
2022-09-28 15:48:35 -07:00
Xiang Li 26129766df [DirectX backend] Support global ctor for DXILBitcodeWriter.
1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType.
   This will allow use GlobalVariable/Function as value.
2. Save target type for global ctors for Constant.
3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D133283
2022-09-28 13:23:56 -07:00
Stanislav Mekhanoshin 5a3fe9a039 [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-28 13:13:40 -07:00
Nico Weber 90f7f24b20 try to fix build yet more after 16544cbe64 2022-09-28 15:40:52 -04:00
Baptiste b556726ccc [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
One of the conditions to flush the vmcnt counter in loop preheaders is: The loop
contains a use of a vgpr that is defined out of the loop. The code currently
checks if a waitcnt is needed by looking at the score of that vgpr in the score
brackets. This is not enough and may cause the generation of an unnecessary
vmcnt flush. This patch fixes that case.

Differential Revision: https://reviews.llvm.org/D130313
2022-09-28 13:05:50 -04:00
Jon Chesterfield 35f2584ef9 [amdgpu] Error, instead of miscompile, anonymous kernels using lds
The association between kernel and struct is done by symbol name.
This doesn't work robustly for anonymous kernels as shown by the modified
test case.

An alternative association between function and struct can be constructed
if necessary, probably though metadata, but on the basis that we currently
miscompile anonymous kernels and that they are difficult to construct from
application code and difficult to call from the runtime, this patch makes
it a fatal error for now.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134741
2022-09-28 16:30:04 +01:00
Matt Arsenault 7a84624079 AMDGPU: Make various vector undefs legal
Surprisingly these were getting legalized to something
zero initialized.

This fixes an infinite loop when combining some vector types.
Also fixes zero initializing some undef values.

SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking
for the legality of the output undefs they are replacing unused
operations with. This resulted in turning vectors into undefs
that were later re-legalized back into zero vectors.
2022-09-28 10:48:52 -04:00
Matt Devereau 0a4771a7e8 [AArch64][SVE] Expand gather index to 32 bits instead of 64 bits
For gathers which load in 8 and 16 bit data then use that data
as an index, the index can be extended to 32 bits instead of
64 bits

Differential Revision: https://reviews.llvm.org/D130692
2022-09-28 14:42:12 +00:00
Florian Hahn eba84971ae
Revert "[AARCH64][CostModel] Modified the cost of mask vector load/store"
This reverts commit 1c62af3e23.

The commit causes the test below to fail. Revert for now to get the bots
back to green.

Failing test:
lvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll
2022-09-28 15:35:13 +01:00
Florian Hahn 2d3c260362
[AArch64] break non-temporal loads over 256 into 256-loads and a smaller load
Currently over 256 non-temporal loads are broken inefficently. For example, `v17i32` gets broken into 2 128-bit loads. It is better if we can use
256-bit loads instead.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133421
2022-09-28 15:20:26 +01:00
Jon Chesterfield 80ba432821 [amdgpu][nfc] Allocate kernel-specific LDS struct deterministically
A kernel may have an associated struct for laying out LDS variables.
This patch puts that instance, if present, at a deterministic address by
allocating it at the same time as the module scope instance.

This is relatively likely to be where the instance was allocated anyway (~NFC)
but will allow later patches to calculate where a given field can be found,
which means a function which is only reachable from a single kernel will be
able to access a LDS variable with zero overhead. That will be particularly
helpful for applications that instantiate a function template containing LDS
variables once per kernel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127052
2022-09-28 14:55:16 +01:00
Archibald Elliott ff4027d152 [ARM] Support fp16/bf16 using t constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "t"
constraint, which means "VFP floating-point registers s0-s31" - fp16 and
bf16 values are stored in S registers too.

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 't' constraint.

Fixes #57753

Differential Revision: https://reviews.llvm.org/D134553
2022-09-28 14:48:21 +01:00
liqinweng 1c62af3e23 [AARCH64][CostModel] Modified the cost of mask vector load/store
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D134413
2022-09-28 19:40:29 +08:00
Carl Ritson 266b5dbc5d [AMDGPU] Add MIMG NSA threshold configuration attribute
Make MIMG NSA minimum addresses threshold an attribute that can
be set on a function or configured via command line.
This enables frontend tuning which allows increased NSA usage
where beneficial.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134780
2022-09-28 20:03:18 +09:00
Simon Pilgrim 759bedade5 Fix MSVC "not all control paths return a value" warning. NFCI. 2022-09-28 10:56:37 +01:00
wanglei 983a0ae5cf [LoongArch] Specify registers used in DWARF exception handling
Defines LoongArch registers for getExceptionPointerRegister() and
getExceptionSelectorRegister().

Differential Revision: https://reviews.llvm.org/D134709
2022-09-28 17:53:16 +08:00
Cullen Rhodes 3918ef07c4 [AArch64][SVE] Remove redundant ptest after match/nmatch
These instructions are flag setting so the ptest is redundant, the
TableGen class wasn't setting the element size for the predicate causing
the checks in AArch64InstrInfo::optimizePTestInstr to fail.
2022-09-28 08:23:23 +00:00
eopXD 9677d70eb2 [VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134759
2022-09-27 19:45:58 -07:00
gonglingqin 95d2367647 [LoongArch] Expand FSIN/FCOS/FSINCOS/FPOW/FREM
Differential Revision: https://reviews.llvm.org/D134628
2022-09-28 09:42:41 +08:00
Florian Mayer 979db5343f [HWASan] [NFC] use auto* over auto& for pointers 2022-09-27 18:19:25 -07:00
Han-Kuan Chen c595c874cb [RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133688
2022-09-27 17:25:34 -07:00
Philip Reames f6d110e26f [LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc]
This is purely NFC restructure in advance of a change which actually exposes zero strides.  This is mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
Quentin Colombet ce35e8b426 [RISCV][ISel] Remove the commutative flag on SUB
I wasn't able to produce a testcase for that because right now VWSUB is
only generated from VWSUB_W and from there to trigger the commutative
bug we would need to grab VWSUB where the splat value is on the LHS,
which is currently not matched.

Differential Revision: https://reviews.llvm.org/D134701
2022-09-27 20:15:01 +00:00
Philip Reames b54c571a01 [RISCV] Extend strided load/store pattern matching to non-loop cases
The motivation here is to enable a change I'm exploring in vectorizer to prefer base + offset_vector addressing for scatter/gather. The form the vectorizer would end up emitting would be a gep whose vector operand is an add of the scalar IV (splated) and the index vector. This change makes sure we can recognize that pattern as well as the current code structure. As a side effect, it might improve scatter/gathers from other sources.

Differential Revision: https://reviews.llvm.org/D134755
2022-09-27 12:56:58 -07:00
eopXD 163cb33854 [VP][RISCV] Add vp.ceil and RISC-V support
Previous commit 8b00b24f85 missed to add `int_ceil` anchor for the
llvm.ceil.* section under LangRef.rst

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 12:04:09 -07:00
eopXD 384b8b3da7 Revert "[VP][RISCV] Add vp.ceil and RISC-V support"
This reverts commit 8b00b24f85.
2022-09-27 11:12:57 -07:00
eopXD 8b00b24f85 [VP][RISCV] Add vp.ceil and RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 11:08:27 -07:00
Krzysztof Parzyszek 7da2b91887 [Hexagon] Unify getSizeOfs in HexagonVectorCombine, NFC 2022-09-27 10:51:52 -07:00
Krzysztof Parzyszek 9c9e877b7e [Hexagon] Move function to a different class, NFC
"Sector" is a concept from AlignVectors, so the check for it
should be there.
2022-09-27 10:32:52 -07:00
Stefan Gränitz ed8409dfa0 [ObjC][ARC] Fix target register for call expanded from CALL_RVMARKER on Windows
Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI.

Reviewed By: rnk, fhahn

Differential Revision: https://reviews.llvm.org/D134441
2022-09-27 18:49:40 +02:00
David Green 401481daac [AArch64] Remove incorrect zero element insert-bitcast patterns
These two patterns are not working as intended, as shown in D134022.
They need to insert the value into the new register, not override it.
2022-09-27 17:08:17 +01:00
Philip Reames 77b202f974 [RISCV] Rename getVectorImmCost to getStoreImmCost [nfc]
My original intent had been to reuse this for arithmetic instructions as well, but due to the availability of a immediate splat encoding there, we will need different heuristics.  So specialize the existing code for the store case.
2022-09-27 08:22:13 -07:00
wanglei 823ce6ad18 [LoongArch] Add some comments for expand pseudo-inst pass. NFC
Differential Revision: https://reviews.llvm.org/D134708
2022-09-27 20:26:07 +08:00
Kazushi (Jam) Marukawa de8013201f [VE] Change to expand FPOW
VE doesn't have FPOW instruction, so this patch makes llvm expand it.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134695
2022-09-27 20:03:10 +09:00
WANG Xuerui c2a44b591e [LoongArch] Support lowering frames larger than 2048 bytes
Differential Revision: https://reviews.llvm.org/D134582
2022-09-27 18:58:33 +08:00
David Sherwood fbb119412f [AArch64] Add Neoverse V2 CPU support
Adds support for the Neoverse V2 CPU to the AArch64 backend.

Differential Revision: https://reviews.llvm.org/D134352
2022-09-27 07:56:08 +00:00
Paulo Matos 1bd1a44070 [WebAssembly] Use intrinsics for table.get/set instructions
Initial table.get/set implementation would match and lower combinations
of GEP+load/store to table.get/set instructions. However, this is error
prone due to potential combinations of GEP+load/store we don't implement,
and load/store optimizations. By changing the code to using intrinsics, we
 avoid both issues and simplify the code.

New builtins implemented:
* @llvm.wasm.table.get.externref
* @llvm.wasm.table.get.funcref
* @llvm.wasm.table.set.externref
* @llvm.wasm.table.set.funcref

Reviewed By: asb, tlively

Differential Revision: https://reviews.llvm.org/D134436
2022-09-27 09:16:30 +02:00
Yeting Kuo 04e1301f3d [VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support.
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134639
2022-09-27 13:36:45 +08:00
Vitaly Buka 20a80d60a8 Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI"
Break msan bots. Details in D134666.

This reverts commit 0ce96e06ee.
2022-09-26 22:22:09 -07:00
Paul Scoropan ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Stanislav Mekhanoshin 0ce96e06ee [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-26 13:20:24 -07:00
Krzysztof Parzyszek dfaf7a2846 [Hexagon] Avoid some unnecessary sign-extend instructions
Simplify (sext_inreg (extractu ...)) -> (extract ...) where appropriate.
2022-09-26 12:30:18 -07:00
Krzysztof Parzyszek d6c0a5be7f [Hexagon] Make sure we can still shift scalar vectors by non-splats 2022-09-26 11:25:06 -07:00
Changpeng Fang dee4bc4a4e AMDGPU: Handle new address pattern in LowerKernelAttributes introduced by opaque pointers
Summary:
  With opaque pointer support, the "ptr" type is introduced and thus BitCast is not necessary in some cases.
This work takes care of this change, and recognizes the new address patterns to do appropriate optimizations.

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D134596
2022-09-26 09:31:52 -07:00
Kazushi (Jam) Marukawa 1cef30b9d3 [VE] Disable automatic maxnum/minnum selection
Disable FMAX/FMIN selection from select_cc in VEInstrInfo.td because of
the lack of NaN consideration.  This patch removes such selection from
VEInstrInfo.td and lets llvm work on it in combineMinNumMaxNum.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134595
2022-09-26 22:04:02 +09:00
Kazushi (Jam) Marukawa 76c76e9ab4 [VE] Support smax/smin
Support smax/smin in VEInstrInfo.td.  Remove obsolete patterns for
smax/smin.  Add regression tests for smax/smin/umax/umin.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134583
2022-09-26 22:02:57 +09:00
Momchil Velikov 6602110152 [ARM] Enable and/cmp0 folding
The `CodeGenPrepare` pass can sink bitwise `and` used by compare to
zero into the basic blocks where the users are. This operation is
guarded by lowering hook, which is disabled for ARM.  In the ARM
architecture versions from v7-M up these two operations can be folded
into `tst rN, #imm` instruction. Sinking of `and` can also enable
the cmov-to-bfi DAG combiner.

This patch fixes some benchmark regressions caused
by https://reviews.llvm.org/D129370 as well scoring slightly better overall.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D134360
2022-09-26 11:31:23 +01:00
David Green bebc96956b [AArch64] Enable FeatureFuseAdrpAdd for all Arm cpus
The commit D120104 enabled FeatureFuseAdrpAdd for -mcpu=generic,
allowing the linker to relax adrp;add pairs where possible. D132075
extended that to neoverse-n1, this patch extends it to all other cortex
and neoverse cpus for the same reasons.

Differential Revision: https://reviews.llvm.org/D134521
2022-09-26 09:55:10 +01:00
gonglingqin a6d699b55d [LoongArch] Add codegen support for strict_fsetccs and any_fsetcc
Differential Revision: https://reviews.llvm.org/D134274
2022-09-26 13:05:36 +08:00
wanglei 75265c7f49 [LoongArch] Lower BlockAddress/JumpTable
This patch uses a unified interface for lower GlobalAddress ConstantPool
BlockAddress and JumpTable.

This patch allows lowering addresses by using PC-relative addressing
for DSO-local symbols, and accessing the address through the global
offset table for DSO-preemptable symbols.

Remove hardcoded `MininumJumpTableEntries` for test lower JumpTable.

Also updated some test cases using ConstantPool, due to the addition of
relocation information.

Differential Revision: https://reviews.llvm.org/D134431
2022-09-26 10:52:54 +08:00
Yeting Kuo 43c5fbdd3a [VP][RISCV] Add vp.sqrt intrinsic and RISC-V support.
The patch modeled vp.fabs patch D132793.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133690
2022-09-26 10:47:40 +08:00
WANG Xuerui ad6fe32032 [LoongArch] Support 'generic' as a valid CPU name
As the LoongArch port is largely modeled after RISCV it has the same
behavior of not accepting `generic` as a CPU name. For better
compatibility with consumers of LLVM (e.g. mesa) follow D121149's suit
and treat `generic` the same as an empty CPU name.

Differential Revision: https://reviews.llvm.org/D134412
2022-09-26 10:20:13 +08:00
Ruiling Song bf25a48985 Add override for runOnFunction()
Fix build-bot failure.
2022-09-26 10:19:35 +08:00
WANG Xuerui d2ac89b64e [LoongArch] Support fastcc and treat it as ccc
As explained in D68559 the `fastcc` calling convention may be requested
under certain conditions, hence the need for supporting it. But unlike
RISCV we actually treat it exactly like ccc, without actually inventing
any performance hack right here. And CSKY does the same thing.

This is going to fix a few more test cases with native LoongArch builds.

Differential Revision: https://reviews.llvm.org/D134443
2022-09-26 10:15:00 +08:00
WANG Xuerui f89f0990db [LoongArch] Support llvm.thread.pointer
For `__builtin_thread_pointer` to work, among other things.

Similar to D76828 for RISCV.

Differential Revision: https://reviews.llvm.org/D134368
2022-09-26 09:56:42 +08:00
Ruiling Song cf14c7caac AMDGPU: Add a pass to rewrite certain undef in PHI
For the pattern of IR (%if terminates with a divergent branch.),
divergence analysis will report %phi as uniform to help optimal code
generation.
```
  %if
  | \
  | %then
  | /
  %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ]
```
In the backend, %phi and %uniform will be assigned a scalar register.
But the %undef from %then will make the scalar register dead in %then.
This will likely cause the register being over-written in %then. To fix
the issue, we will rewrite %undef as %uniform. For details, please refer
the comment in AMDGPURewriteUndefForPHI.cpp. Currently there is no test
changes shown, but this is mandatory for later changes.

Reviewed by: sameerds

Differential Revision: https://reviews.llvm.org/D133840
2022-09-26 09:54:47 +08:00
Weining Lu 394f30919a [Clang][LoongArch] Add inline asm support for constraints f/l/I/K
This patch adds support for constraints `f`, `l`, `I`, `K` according
to [1]. The remain constraints (`k`, `m`, `ZB`, `ZC`) will be added
later as they are a little more complex than the others.
f: A floating-point register (if available).
l: A signed 16-bit constant.
I: A signed 12-bit constant (for arithmetic instructions).
K: An unsigned 12-bit constant (for logic instructions).

For now, no need to support register alias (e.g. `$a0`) in llvm as
clang will correctly decode the usage of register name aliases into
their official names. And AFAIK, the not yet upstreamed `rustc` for
LoongArch will always use official register names (e.g. `$r4`).

[1] https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html

Differential Revision: https://reviews.llvm.org/D134157
2022-09-26 08:49:58 +08:00
James Y Knight 4f188ef89c [AVR] Fix useDeprecatedPositionallyEncodedOperands errors.
This is a follow-on to https://reviews.llvm.org/D134073.

It renames a few fields to have consistent names, as well as renaming
operands to match the field names.

The encoder behavior is unchanged by this cleanup, but a few
instructions were previously being disassembled incorrectly, and have
been corrected by this change. All of the affected instructions were
missing disassembly tests, which are now added.

Differential Revision: https://reviews.llvm.org/D134185
2022-09-25 17:55:09 -04:00
James Y Knight a8c59bcc01 [AMDGPU] Fix useDeprecatedPositionallyEncodedOperands errors in R600.
This is a follow-on to https://reviews.llvm.org/D134073.

It renames a couple of fields to match their operands, as well as
introducing sub-operand names where required.

This change _only_ fixes the 'R600' half of the target, not the
'AMDGPU' half. Fixing the AMDGPU half will be a significantly more
difficult change (which I've not yet attempted.)

Differential Revision: https://reviews.llvm.org/D134078
2022-09-25 17:55:09 -04:00
James Y Knight 0f99958e79 [Lanai] Fix useDeprecatedPositionallyEncodedOperands errors.
This is a follow-on to https://reviews.llvm.org/D134073.

Lanai was almost clean: the only issue is that 'bit' behaves
differently than 'bits<1>', because only the 'bits' type preserves
unresolved references via 'keepUnsetBits()' in
TableGen/Record.h. Thus, use bits instead.

This issue _would_ have caused invalid instruction emission/decoding,
except that the PQ bits were being overriden after the fact by code in
'adjustPqBits' in MCTargetDesc/LanaiMCCodeEmitter.cpp, and
'PostOperandDecodeAdjust' in Disassembler/LanaiDisassembler.cpp.

Differential Revision: https://reviews.llvm.org/D134075
2022-09-25 17:55:09 -04:00
Simon Pilgrim 196f27bb56 [CostModel][X86] Add missing cost kinds for v2i64 icmp on SLM 2022-09-25 15:12:21 +01:00
Simon Pilgrim faff990e9b [X86] Fix Icelake VPMULLQ zmm pipes and adjust AVX512DQ v8i64 mul costs to match worse case
Icelake PMULLQ throughput regressed cf SkylakeServer as its Pipe0 only

Confirmed with Intel SOM, Agner and instlatx64
2022-09-25 14:18:08 +01:00
Petar Avramovic dcc756d03e [AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr
Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl
test_flat_add_local_f64 caused by D130579
Revert a3becb333d.

Differential Revision: https://reviews.llvm.org/D134568
2022-09-25 13:25:41 +02:00
Philip Reames 5358968e13 [RISCV] Pattern match scalable strided load/store
Very straight forward extension of the existing pattern matching pass to handle scalable types as well as fixed length types. The only extra bit beyond removing a bailout is recognizing stepvector.

Differential Revision: https://reviews.llvm.org/D134502
2022-09-24 17:41:58 -07:00
Philip Reames 6e7c54ecaf [RISCV] Add lowering for scalable @llvm.riscv.masked.strided.load/store
The code previously assumed fixed length vectors; make the relevant code conditional.

Having the lowering in place is neccessary for an upcoming change to generalize scatter/gather matching to scalable vectors.

Differential Revision: https://reviews.llvm.org/D134489
2022-09-24 17:41:57 -07:00
James Y Knight 5351878ba1 [TableGen] Add useDeprecatedPositionallyEncodedOperands option.
Summary:
The existing undefined-bitfield-to-operand matching behavior is very
hard to understand, due to the combination of positional and named
matching. This can make it difficult to track down a bug in a target's
instruction definitions.

Over the last decade, folks have tried to work-around this in various
ways, but it's time to finally ditch the positional matching. With
https://reviews.llvm.org/D131003, there are no longer cases that
_require_ positional matching, and it's time to start removing usage
and support for it.

Therefore: add a (default-false) option, and set it to true only in
those targets that require positional matching today. Subsequent
changes will start cleaning up additional in-tree targets.

NOTE TO OUT OF TREE TARGET MAINTAINERS:

If this change breaks your build, you may restore the previous
behavior simply by adding:
  let useDeprecatedPositionallyEncodedOperands = 1;
to your target's InstrInfo tablegen definition. However, this is
temporary -- the option will be removed in the future.

If your target does not set 'decodePositionallyEncodedOperands', you
may thus start migrating to named operands. However, if you _do_
currently set that option, I recommend waiting until a subsequent
change lands, which adds decoder support for named sub-operands.

Differential Revision: https://reviews.llvm.org/D134073
2022-09-24 09:40:45 -04:00
James Y Knight a538d1f13a [TableGen][CodeEmitterGen] Allow local names for sub-operands in a operand list.
These names can then be matched by name against 'bits' fields in a
record, to populate an instruction's encoding.

This does _not_ yet change DecoderEmitter to allow by-name matching of
sub-operands. Unlike the encoder, the decoder already defaulted to not
supporting positional matching, and backends had workarounds in place
for the missing decoding support.

Additionally, use this new capability to allow the ARM and AArch64
backends not to require any positional operand matching.

Differential Revision: https://reviews.llvm.org/D131003
2022-09-24 09:40:44 -04:00
Craig Topper 4f86c5cbb7 [RISCV] Rename RISCVScheduleB.td to RISCVScheduleZb.td. NFC 2022-09-23 21:38:42 -07:00
Craig Topper 3967abcc0b [RISCV] Add missing scheduler classes to Zbkb and Zbkx instructions. 2022-09-23 21:38:42 -07:00
Craig Topper cde3de5381 [RISCV] Remove a few remnants of Zbr I misssed. 2022-09-23 21:21:51 -07:00
Craig Topper 19850cc2d8 Revert "[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type."
This reverts commit dd53a0bb30.

We have seen crashes from this internally. Probably due to the use
of RoundingMode::Dynamic.
2022-09-23 18:41:41 -07:00
Craig Topper 90a5d8499a [RISCV] Promote f16 STRICT_FCEIL/FLOOR/TRUNC/NEARBYINT/RINT/ROUND,ROUNDEVEN to f32. 2022-09-23 14:01:51 -07:00
Jay Foad ddfa0f62d8 [AMDGPU] Add GFX11 feature for subtargets with more VGPRs
The full complement of physical VGPRs for GFX11 is 50% more than GFX10.
Some subtargets have this, others stay the same as GFX10. This affects
occupancy calculations.

Differential Revision: https://reviews.llvm.org/D134522
2022-09-23 20:18:23 +01:00
Josh Stone cb46ffdbf4 [X86] Use BuildStackAdjustment in stack probes
This has the advantage of dealing with live EFLAGS, using LEA instead of
SUB if needed to avoid clobbering. That also respects feature "lea-sp".

We could allow unrolled stack probing from blocks with live-EFLAGS, if
canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used.

Differential Revision: https://reviews.llvm.org/D134495
2022-09-23 09:30:32 -07:00
Josh Stone 26c37b461a [X86] Don't allow prologue stack probing with live EFLAGS
Fixes https://github.com/llvm/llvm-project/issues/49509

Differential Revision: https://reviews.llvm.org/D134494
2022-09-23 09:30:32 -07:00
Josh Stone 4dcfb09e40 [NFC][CodeGen] Use const MF in TargetLowering stack probe functions
This makes them callable from places like canUseAsPrologue.

Differential Revision: https://reviews.llvm.org/D134492
2022-09-23 09:30:32 -07:00
Petar Avramovic 6db7921b65 AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd
Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.

global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors).
Patterns now check addrspace of pointer, added patterns for flat intrinsic.
with global addrspace pointer that selects into global atomic instruction.

buffer atomic fadd patterns changes:
Rdit patterns to import into global-isel.
Remove gfx6/gfx7 _addr64 and _offset patterns.
Remove patterns that can't be reached (same pattern but different feature).

Differential Revision: https://reviews.llvm.org/D130579
2022-09-23 17:52:10 +02:00
Petar Avramovic 5cee9047d5 AMDGPU: Improve atomicrmw fadd selection
Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.

Differential Revision: https://reviews.llvm.org/D131560
2022-09-23 17:52:10 +02:00
Petar Avramovic e03d36d4ae [AMDGPU] Add FeatureFlatAtomicFaddF32Inst
Feature used by targets that have flat_atomic_add_f32 instruction
(gfx940 and gfx11). Remove isGFX940GFX11Plus.
Add hasFlatAtomicFaddF32Inst Subtarget check for codegen.

Differential Revision: https://reviews.llvm.org/D134532
2022-09-23 17:52:10 +02:00
Simon Pilgrim a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Hassnaa Hamdi 181f200a1c [NFC]: AArch64-SVE
modify some comments
2022-09-23 12:07:31 +00:00
Caroline Concatto 5431bf27bd [AArch64]Remove svget/svset/svcreate from llvm
This patch removes the aarch64 instrinsic svget/svset/svcreate from llvm.
It also implements the InstCombine for vector.extract that used to be in svget.

Depends on: D131547

Differential Revision: https://reviews.llvm.org/D131548
2022-09-23 10:48:43 +01:00
gonglingqin ac295597a8 [LoongArch] Add codegen support for atomicrmw add/sub/nand/and/or/xor operation
Differential Revision: https://reviews.llvm.org/D133755
2022-09-23 09:32:11 +08:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Craig Topper 52708be182 [RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions.
These extensions do not appear to be on their way to ratification.
2022-09-22 13:04:41 -07:00
Simon Pilgrim 98907f8685 [CostModel][X86] Tidyup sdiv/srem/udiv/urem by constant cost tables
Preparation for adding cost kinds handling

This is necessary to eventually unblock D111968
2022-09-22 20:46:33 +01:00
Chris Bieneman 4959bfa060 [NFC] Refactor dxil metadata code
DXIL relies on a whole bunch of IR metadata constructs being populated
in the right shape. Rather than just hard coding or using complicated
arrangements of constant data strings, let's make first-class objects
that reprensent the metadata and manage reading and writing the
metadata from the module.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D134397
2022-09-22 12:17:51 -05:00
Hassnaa Hamdi f2072e0ae0 [AArh64-SVE]: Improve cost model for div/udiv/mul 128-bit vector operations
Differential Revision: https://reviews.llvm.org/D132477
2022-09-22 16:50:55 +00:00
Simon Pilgrim dc93202b44 [CostModel][X86] Remove duplicate ashr v4i64 cost table entry. NFCI. 2022-09-22 17:27:26 +01:00
Fraser Cormack 92d71c615d [RISCV] Use structured bindings in common RVV lowering code
This patch uses structured bindings to simplify a couple of specific
cases when lowering RVV operations where we commonly declare two
SDValues and immediately 'tie' them to the mask and vector length.

There's also a couple places where we split vectors that structured
bindings make sense to use.

This patch tries to keep these sorts of changes minimal and to cases
where the returned types are commonly understood, rather than applying
this wholesale to the RISCV backend.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D134442
2022-09-22 16:38:40 +01:00
Philip Reames e41765aa4d [RISCV] Verify consistency of a couple TSFlags related to vector operands
Various bits of existing code assume the presence of one operand implies the presence of another.  Add verifier rules to catch violations.

Differential Revision: https://reviews.llvm.org/D133810
2022-09-22 08:35:17 -07:00
Craig Topper bf7c7696fe [RISCV] Improve support for vector fp_to_sint_sat/uint_sat.
The default fixed vector legalization is to unroll. The default
scalable vector legalization is to clamp in the FP domain. The
RVV vfcvt instructions have saturating behavior so we can use them
directly. The only difference is that RVV instruction turn nan into
the max value, but the _SAT intrinsics want 0.

I'm only supporting 1 step of narrowing for now. I think we can
support more steps by using VNCLIP to saturate and narrower.

The only case that needs 2 steps of widening is f16->i64 which we can
do as f16->f32->i64.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D134400
2022-09-22 08:13:48 -07:00
Craig Topper d6cb8f85bf [RISCV] Formatting fixes to RISCV.td NFC
Improve indentation. Fix the worst of the 80 column violations.
2022-09-22 08:12:59 -07:00
Tim Northover 677da09d02 AArch64: add support for newer Apple CPUs
They're roughly ARMv8.6. This works in the .td file, but in
AArch64TargetParser.def, marking them v8.6 brings in support for the SM4
cryptographic hash and we don't actually have that. So TargetParser side
they're marked as v8.5, with the extra features (BF16 and I8MM added manually).

Finally, A16 supports the HCX extension in addition to v8.6. This has no
TargetParser implications.
2022-09-22 11:58:51 +01:00
Simon Pilgrim e030be64d8 [CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates
This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad.

This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 11:24:11 +01:00
Simon Pilgrim b2cd8118d0 [CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 10:19:23 +01:00
Ilia Diachkov 4421b24fe2 [SPIRV] fix build with clang and use PoisonValue instead of UndefValue
The patch fixes the SPIRV backend build using clang. It also replaces
UndefValue with PoisonValue in SPIRVRegularizer.cpp.

Fixes: #57773

Differential Revision: https://reviews.llvm.org/D134071
2022-09-22 11:49:54 +03:00
Craig Topper 8b8e18e11f [RISCV] Replace RISCVISD::GREV/GORC/SHFL/UNSHFL with BREV8/ORC_B/ZIP/UNZIP.
With Zbp removed, we no longer need the generalized forms.

The computeKnownBitsForTargetNode code brev8/orc.b is still based
on the general form with the shift amount forced to 7.
2022-09-21 21:57:59 -07:00
Craig Topper 182aa0cbe0 [RISCV] Remove support for the unratified Zbp extension.
This extension does not appear to be on its way to ratification.

Still need some follow up to simplify the RISCVISD nodes.
2022-09-21 21:22:42 -07:00
Fanchen Kong 8a2729fea7 [WebAssembly] Improve codegen for loading scalars from memory to v128
Use load32_zero instead of load32_splat to load the low 32 bits from memory to
v128. Test cases are added to cover this change.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D134257
2022-09-21 21:05:44 -07:00
Craig Topper 3b8ec0fde5 [RISCV] Remove some unused Predicates from tablegen. NFC
Specifically predicates for extensions that are subsets of other
extensions. These predicates should never be used. Should always
check the superset extension or the superset ORed with the sub extendsion.
2022-09-21 18:26:43 -07:00
Chris Bieneman e77c40ffbd [NFC] Make dxil namespace consistent
We have namespaces `DXIL` and `dxil`, which is just confusing. This
renames `DXIL` -> `dxil` making everything consistent.

While the LLVM coding standards don't have a clear direction here, I
chose lower case because by my current unscientific count there are
more places where we had the lowercase namespace than the uppercase.
2022-09-21 17:48:13 -05:00
Craig Topper 1d8a7adca6 [RISCV] Rename RISCVISD::SINT_TO_FP_VL/UINT_TO_FP_VL. NFC
Name them after the instructions VFCVT_RTZ_X(U)_F_VL to make it
clear that the ISD nodes don't have the poison semantics of
ISD::SINT_TO_FP/UINT_TO_FP.

I play to reuse this node for a FP_TO_SINT_SAT/FP_TO_UINT_SAT
patch and need the instruction semantics.
2022-09-21 15:33:04 -07:00
Fangrui Song 8805e5d1b7 [Hexagon] Fix -Wunused-variable in non-assertion builds after f6e7ad5604 2022-09-21 14:14:45 -07:00
Jay Foad 5c7ee894f8 AMDGPU: Stop validating earlyclobber operands in assembler
This validation was introduced in D34003 for v_qsad/v_mqsad instructions
but it applies to all instructions with earlyclobber operands, which now
includes v_mad_i64/v_mad_u64.

In all these cases I do not think there is documentation saying that the
destination must not overlap the sources. Rather there are *some* cases
where the instruction may not function correctly if there is an overlap,
and we are using earlyclobber as a conservative way of preventing
codegen from generating those cases.

I think it is unhelpful for the assembler to enforce the earlyclobber
restriction because it prevents assembling cases where the programmer
knows that in fact the overlap is safe.

See also: https://github.com/llvm/llvm-project/issues/57610

Differential Revision: https://reviews.llvm.org/D134272
2022-09-21 21:46:59 +01:00
Scott Linder 552539bdac Revert "[NFC][AMDGPU] Refactor AMDGPUDisassembler"
This reverts commit f583151461.
2022-09-21 18:48:42 +00:00
Krzysztof Parzyszek f6e7ad5604 [Hexagon] Revamp type legalization of ext/trunc/sat in HVX
Resizing operations (e.g. sign extension) in DAG can go from any width
to any other width, e.g. i8 -> i32. If the input and the result differ
by a factor larger than 2, the operation cannot be legal in HVX, since
the only two legal vector sizes in HVX are a single vector and a pair
of vectors.
To simplify the legalization, such operations are expanded into steps
that only double/halve the type size, so that each such step can be fully
legalized on its own. The complication is that DAG will automatically
fold these steps back into one, e.g. sext(sext) -> sext. To prevent that
new HexagonISD nodes are introduced: TL_EXTEND and TL_TRUNCATE. Once
legalized, these nodes are replaced with the original opcodes.

The type legalization is now common to aext/sext/zext/trunc and Hexagon-
specific ssat/usat nodes.
2022-09-21 11:25:27 -07:00
Florian Hahn ac434afed8
[AArch64] Try to fold shuffle (tbl2, tbl2) to tbl4.
shuffle (tbl2, tbl2) can be folded into a single tbl4 if the mask for
the selected elements is constant.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133491
2022-09-21 19:15:56 +01:00
Simon Pilgrim 839ba13c3e [CostModel][X86] Add vbmi2 costs for funnelshift/rotate intrinsics
Add costs for the funnel shift instructions - fixes some discrepancies I was hitting with costs numbers from the 'cost-tables vs llvm-mca' script D103695
2022-09-21 13:48:22 +01:00
Kazushi (Jam) Marukawa eaa263485d [VE] Remove obsolete ANDrm patterns
Remove obsolete ANDrm patterns for MIMM operands.  We add these
translations to optimize commonly used cast operations before
we support MIMM operands directly by each isntruction.  Such
translations are obsolete now.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134341
2022-09-21 19:23:34 +09:00
David Green 4f78e022ee [AArch64] Lower scalar sqxtn intrinsics to use fp registers
The llvm.aarch64.neon.scalar.sqxtn.i32.i64 intrinsics take and return
integer types, but operate on fp registers. This can create some
inefficiencies in their lowering, where the registers are converted to
fp a little too late. This patch adds lowering for the intrinsics,
creating bitcasts to/from fp types to allow nicer folding later when the
instructions are selected, especially around insert/extracts.

Differential Revision: https://reviews.llvm.org/D134024
2022-09-21 10:46:43 +01:00
Kazushi (Jam) Marukawa 021d05a1ab [VE][NFC] Change to use l2i/i2l to simplify code
We previously added l2i/i2l macros to simpily EXTRACT_SUBREG/INSERT_SUBREG
conversions.  This patch changes VEInstrInfo.td to use such macros to
simplify existing code.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134118
2022-09-21 18:04:29 +09:00
Kazushi (Jam) Marukawa 337e54ec95 [VE] Add maxnum and minnum
Add maxnum and minnum for float and double.  Lowering is already
implemented, so this patch changes them legal and adds regression
tests.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134108
2022-09-21 18:03:49 +09:00
Kazushi (Jam) Marukawa 3ee64ea5cf [VE] Change to expand FMA
VE has fused multiply-add instruction for only vector calculations.  This
patch forces to expand scalar FMA to multiply and add instructions.
This patch also adds regression test.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134107
2022-09-21 18:02:55 +09:00
David Green 9a20596f48 [AArch64] Insert/Extract of bitcast patterns
This adds some quick tablegen patterns for vector_insert(bitcast(..))
and bitcast(vector_extract(..)), allowing us to avoid a round-trip
through GPRs.

Differential Revision: https://reviews.llvm.org/D134022
2022-09-21 09:54:17 +01:00
David Sherwood 64bef3d568 [AArch64][SME] Disable inlining when SME attributes require smstart/smstop or lazy-save.
Inlining must be disabled when the call-site needs to toggle PSTATE.SM or
when the callee's function body is executed in a different streaming mode than
its caller. This is needed because function calls are the boundaries for
streaming mode changes.

More details about the SME attributes and design can be found
in D131562.

Differential Revision: https://reviews.llvm.org/D131581
2022-09-21 09:35:47 +01:00
Craig Topper 70a64fe7b1 [RISCV] Remove support for the unratified Zbt extension.
This extension does not appear to be on its way to ratification.

Out of the unratified bitmanip extensions, this one had the
largest impact on the compiler.

Posting this patch to start a discussion about whether we should
remove these extensions. We'll talk more at the RISC-V sync meeting this
Thursday.

Reviewed By: asb, reames

Differential Revision: https://reviews.llvm.org/D133834
2022-09-20 20:26:48 -07:00
jacquesguan 1cbf44bd50 [RISCV] Support peephole optimization to fold vmerge.vvm that has tail agnostic policy and unmasked intrinsics.
This patch supports the tail agnostic part of D130442.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132923
2022-09-21 10:56:37 +08:00
Changpeng Fang 3ae4c3589e AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true
Summary:
   Under code object version 5, ockl_get_local_size returns the value computed by the expression:
workgroup_id < hidden_block_count ? hidden_group_size : hidden_remainder
For functions with the attribute uniform-work-group-size=true. we can evaluate workgroup_id < hidden_block_count
as true, and thus hidden_group_size is returned for ockl_get_local_size.
  With uniform-workgroup-size=true, this work also set all remainders to zero, and if there
is reqd_work_group_size, we also set work-group-size to the required value from the metadata.

Reviewers:
  arsenm and bcahoon

Differential Revision:
  https://reviews.llvm.org/D131276
2022-09-20 17:25:52 -07:00
Scott Linder f583151461 [NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.

Use lit.local.cfg substitutions and more idiomatic use of split-file to
simplify and extend existing kernel-descriptor disassembly tests.

Add a comment to AMDHSAKernelDescriptor.h, as at least one small set
towards keeping all kernel-descriptor sensitive code in sync.

Reviewed By: kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D130105
2022-09-20 20:37:19 +00:00
Anshil Gandhi a0c53524a5 [AMDGPU] Fix size of SOPK instructions to 4 bytes
Instructions in SOPK format may not have 32-bit
literal constants following the instruction.

Differential Revision: https://reviews.llvm.org/D133972
2022-09-20 14:27:09 -06:00
Matt Arsenault 28e03692ae AMDGPU: Fix expansion of 16-bit atomicrmw
Fixes issue 57830
2022-09-20 14:47:40 -04:00
Anton Sidorenko 3cd503f181 [NFC][RISCV] Move calculations of SDNode policy operand idx to a separate function
Since there is no guaranteed correspondence of SDNode and MI operands, we need
getters simular to RISCVII::get*OpNum for SDNodes.

More uses of getVecPolicyOpIdx will be added in D130895.

Reviewed By: craig.topper, arcbbb

Differential Revision: https://reviews.llvm.org/D134179
2022-09-20 10:36:47 -07:00
Philip Reames eda2af575f [RISCV][MC] Add support for experimental Zawrs extension
This implements experimental support for the Zawrs extension as specified here: https://github.com/riscv/riscv-zawrs/releases/download/V1.0-rc3/Zawrs.pdf. Despite the 1.0 version name, this has not been ratified and there was a major change to proposed specification between rc2 and rc3.  Once this is ratified, it'll move out of experimental status.

This change adds assembly support, but does not include C language or IR intrinsics. We can decide if we want them, and handle that in a separate patch.

Differential Revision: https://reviews.llvm.org/D133443
2022-09-20 10:15:11 -07:00
Jay Foad f19cc793d2 [AMDGPU] Disable fp atomic to s_denorm_mode hazard for GFX11
This hazard only exists on GFX10.

Differential Revision: https://reviews.llvm.org/D134276
2022-09-20 17:40:49 +01:00
David Green cb375e8c1f [AArch64] Enable LSLFast for modern OoO cpus
This patch enables the LSLFast feature for Cortex-A76, Cortex-A77,
Cortex-A78, Cortex-A78C, Cortex-A710, Cortex-X1, Cortex-X2, Neoverse N1,
Neoverse N2, Neoverse V1 and the Neoverse 512TB pseudo-cpu, in-line with
the software optimization guides for those CPUs.

Differntial revision: https://reviews.llvm.org/D134273
2022-09-20 17:09:14 +01:00
Joe Nash b982ba2a6e [AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that
    disables the use of VGPRs above 128. This patch removes the need for
    that hack.

    We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
    operands of VOP1, VOP2, and VOPC instructions. This register class only has the
    low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
    VOP2, and VOPC instructions are correctly limited to use the first 128
    VGPRs, while the other instructions can freely use all 256.

    We introduce new pseduo-instructions used on GFX11 which have the suffix
    t16 (True 16) to use the VGPR_32_Lo128 register class.

Reviewed By: foad, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D133723
2022-09-20 09:56:28 -04:00
Simon Pilgrim 0015edeefd Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. 2022-09-20 14:24:07 +01:00
Caroline Concatto d32b8fdbdb [LLVM][AArch64] Replace aarch64.sve.ld by aarch64.sve.ldN.sret
This patch removes the intrinsic aarch64.sve.ldN from tablegen in favour of
using arch64.sve.ldN.sret.

Depends on: D133023

Differential Revision: https://reviews.llvm.org/D133025
2022-09-20 13:15:07 +01:00
gonglingqin 7328ff75ba [LoongArch] Add codegen support for fmaxnum_ieee and fminnum_ieee
Thanks for @xry111's previous bug fixes.
See https://github.com/loongson/llvm-project/pull/1 for more details.

Differential Revision: https://reviews.llvm.org/D133478
2022-09-20 19:22:32 +08:00
Simon Pilgrim 70582bc4d3 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warnings. NFCI. 2022-09-20 10:35:32 +01:00
Serge Pavlov 181279ffcd [X86][GlobalISel] Add support for sret demotion
The change add support for the cases when return value is passed in
memory rathen than in registers.

Differential Revision: https://reviews.llvm.org/D134181
2022-09-20 11:47:53 +07:00
Craig Topper 94049db913 [RISCV] Make computeIncomingVLVTYPE more conservative when merging predecessor state.
If we have already calculated the incoming state before, use that
as our starting point to ensure we are conservative.

This fixes an infinite loop found in our downstream where we
we allowed two waves of updates to propagate through a loop and
the merge points allowed us to toggle back and forth between states.
No small reproducer right now.

Differential Revision: https://reviews.llvm.org/D134229
2022-09-19 15:57:55 -07:00
Alexander Timofeev 2e8817b90a [AMDGPU] SIFixSGPRCopies reworking to use one pass over the MIR for analysis and lowering.
This change finalizes the series of patches aiming to replace the old strategy of VGPR to SGPR copy lowering.

  # Following the https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367 code parts that are no longer used were removed.
  # The first pass over the MachineFunctoin collects all the necessary information.
  # Lowering is done in 3 phases:
     - VGPR to SGPR copies analysis  lowering
     - REG_SEQUENCE, PHIs, and SGPR to VGPR copies lowering
     - SCC copies lowering is done in a separate pass over the Machine Function

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D131246
2022-09-19 23:31:45 +02:00
Craig Topper 0cec96ab25 [RISCV] Manage the InQueue flag in insertvli correctly.
We were only setting this flag the first time we added the blocks
not when we mark them for revisiting.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D134193
2022-09-19 14:28:22 -07:00
Haojian Wu eec19987c0 Fix one more unused warning in release build, NFC 2022-09-19 20:56:39 +02:00
Haojian Wu 20822e2d42 Fix an unused warning in release build, NFC 2022-09-19 20:45:51 +02:00
Krzysztof Parzyszek 94a71361d6 [Hexagon] Implement [SU]INT_TO_FP and FP_TO_[SU]INT for HVX 2022-09-19 11:11:20 -07:00
Krzysztof Parzyszek ec51e38062 [Hexagon] Add HVX patterns for ISD::ABS 2022-09-19 10:12:15 -07:00
Krzysztof Parzyszek 3eee45cdc8 [Hexagon] Rework SplitHvxPairOp to be a general vector splitting utiity
Enable creating an idiom: V -> opJoin(SplitVectorOp(V))
2022-09-19 09:42:13 -07:00
Simon Pilgrim 6b4d409f69 [CostModel][X86] Add CostKinds handling for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 17:37:58 +01:00
Krzysztof Parzyszek e5844462f6 [Hexagon] Use proper output chain when widening HVX loads 2022-09-19 09:04:13 -07:00
Simon Pilgrim 135c9b2c4b [CostModel][X86] Add CostKinds handling for vector ctlz instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 16:44:09 +01:00
Simon Pilgrim 2538adde5c [CostModel][X86] Add CostKinds handling for cttz
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 15:57:03 +01:00
Simon Pilgrim d90a42d64c [CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling
Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.
2022-09-19 14:06:33 +01:00
David Green 908b3b6ccb [AArch64] Use fast-math-flags in isAssociativeAndCommutative
Previously only using the UnsafeFPMath option, this now looks for the
fast moth flags on the instructions, using the same flag flags as other
backends.
2022-09-19 11:34:00 +01:00
LiaoChunyu 2e74157ad4 [RISCV]Preserve (and X, 0xffff) in targetShrinkDemandedConstant
shrinkdemandedconstant does some optimizations, but is not very friendly to riscv, targetShrinkDemandedConstant to limit the damage.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134155
2022-09-19 14:19:38 +08:00
LiaoChunyu 8fee91c435 [RISCV][NFC]Remove outdated comment from targetShrinkDemandedConstant
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134154
2022-09-19 10:23:06 +08:00
Kazu Hirata cf07277fb4 [X86] Fix the LEA optimization pass
The LEA optimization pass visits each basic block of a given machine
function.  In each basic block, for each pair of LEAs that differ only
in their displacement fields, we replace all uses of the second LEA
with the first LEA while adjusting the displacement.

Now, without this patch, after all the replacements are made, the
following assert triggers:

        assert(MRI->use_empty(LastVReg) &&
               "The LEA's def register must have no uses");

The replacement loop uses:

  for (MachineOperand &MO :
       llvm::make_early_inc_range(MRI->use_operands(LastVReg))) {

which is equivalent to:

  for (auto UI = MRI->use_begin(LastVReg), UE = MRI->use_end();
       UI != UE;) {
    MachineOperand &MO = *UI++;  // <-- Look!

That is, immediately after the post increment, make_early_inc_range
already has the iterator for the next iteration in its mind.

The problem is that in one iteration of the loop, we could replace two
uses in a debug instruction like:

  DBG_VALUE_LIST !"r", !DIExpression(DW_OP_LLVM_arg, 0), %0:gr64, %0:gr64, ...

So, the iterator for the next iteration becomes invalid.  We end up
traversing a garbage use list from that point on.  In turn, we don't
get to visit remaining uses.

The patch fixes the problem by switching to a "draining" while loop:

  while (!MRI->use_empty(LastVReg)) {
    MachineOperand &MO = *MRI->use_begin(LastVReg);
    MachineInstr &MI = *MO.getParent();

The credit goes to Simon Pilgrim for reducing the test case.

Fixes https://github.com/llvm/llvm-project/issues/57673

Differential Revision: https://reviews.llvm.org/D133631
2022-09-18 17:50:17 -07:00
Carl Ritson 930315f6aa [AMDGPU] Fix isSGPRReg for special registers
Special registers, e.g. MODE, do not have register classes so
will cause null pointer exception if passed to isSGPRReg.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134025
2022-09-19 08:49:43 +09:00
Kazu Hirata 20d764aff0 [llvm] Don't including SetVector.h (NFC)
llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without
including SetVector.h, so this patch adds an appropriate #include
there.
2022-09-17 12:36:43 -07:00
Sander de Smalen bed214cf0f [AArch64][SME] Add intrinsics for enabling/disabling ZA.
This adds the intrinsics:
* void @llvm.aarch64.sme.za.enable() -> smstart za
* void @llvm.aarch64.sme.za.disable()  -> smstop za

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D133894
2022-09-17 16:41:42 +00:00
Sander de Smalen 5fae000f36 [AArch64][SME] Disable tail-call optimization when streaming mode change or lazy-save may be required.
When a streaming mode change is (or may be) required for a call, it will
need to restore the original mode after the call, which prevents the use of
tail-call optimization. The same holds true for a call that requires the lazy-save
mechanism to be set up before the call, and possibly restored after.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131579
2022-09-17 16:15:07 +00:00
Jessica Paquette 1076b31da8 [GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum
This is a partial port of the code used by the SelectionDAGBuilder to
translate selects.

In particular, see matchSelectPattern in ValueTracking.cpp. This is a
GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum.

I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler.
On the AArch64-end, it seems like the FP cases are more important for perf
right now, so I bit the bullet and went at the more complicated problem. :)

I elected to do this as a post-legalize combine rather than in the
IRTranslator because

Deciding which fmax/fmin to use can depend on legalization rules
Philosophically-speaking (TM), putting it in a combine just feels cleaner

Being able to enable/disable the combine is handy
Another option would be to use the ValueTracking code in the IRTranslator and
match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat
annoying since we'd need to write lowerings back into the selects in the
legalizer. I'm not strongly opposed to the approach.

We'd also want to be careful with vector selects once that's implemented,
which explicitly check if a vector select is legal on the target. That'd
probably need a hook.

From what I can tell, doing this as a combine is probably a cleaner option
long-term.

Differential Revision: https://reviews.llvm.org/D116702
2022-09-16 13:35:46 -07:00
Craig Topper 61595c45af [RISCV] Simplify some code in vector fp<->int handling. NFC
We changed the way container types are selected since this code
was written. We no longer need to use the largest type.
2022-09-16 12:56:42 -07:00
David Majnemer 8a868d8859 Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView""
This reverts commit cd20a18286 and adds a
"let Heading" to NoStackProtectorDocs.
2022-09-16 19:39:48 +00:00
Simon Pilgrim 23cb1c42cd [CostModel][X86] Update throughput costs for CTLZ ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models)

Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first
2022-09-16 16:56:49 +01:00
Dmitry Preobrazhensky ef8feb6359 [AMDGPU][MC][NFC] Correct error message
Differential Revision: https://reviews.llvm.org/D134028
2022-09-16 18:22:08 +03:00
Sander de Smalen bd4935c175 [AArch64][SME] Implement ABI for calls from streaming-compatible functions.
When a function is streaming-compatible and calls a function with a normal or streaming
interface, it may need to enable/disable stremaing mode before the call, and
needs to restore PSTATE.SM after the call.

This patch implements this with a Pseudo node that gets expanded to a
conditional branch and smstart/smstop node.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131578
2022-09-16 14:48:37 +00:00
Simon Pilgrim 89e4cb603d [X86] Add missing (unsupported) zmm vector move classes
Although unsupported on HSW, we reuse this model for KNL which does require them

Noticed when running the cost model fuzz script from D103695 with -mcpu=knl
2022-09-16 15:31:26 +01:00
Sander de Smalen b00c36c295 [AArch64][SME] Implement ABI for calls to/from streaming functions.
This patch implements the ABI for calls from:

  Normal -> Streaming
  Normal -> Streaming-compatible
  Streaming -> Normal
  Streaming -> Streaming-compatible
  Streaming -> Streaming

The compiler inserts SMSTART/SMSTOP instructions before and after the call,
depending on the required transition.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131576
2022-09-16 14:07:47 +00:00
Florian Hahn 6b86b481e3
[AArch64] Use tbl for truncating vector FPtoUI conversions.
On AArch64, doing the vector truncate separately after the fptoui
conversion can be lowered more efficiently using tbl.4, building on
D133495.

https://alive2.llvm.org/ce/z/T538CC

Depends on D133495

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133496
2022-09-16 14:57:43 +01:00
Simon Pilgrim f8fa04295f [CostModel][X86] Add CostKinds handling for vector integer comparisons
These were based off a mixture of vector integer add/sub costs and the numbers from the 'cost-tables vs llvm-mca' script from D103695 - the extra costs for different predicates are still proving tricky to implement, but I've gotten most costs to within +/1 now - the AVX512 are tricky as we still don't handle predicate results properly, so most of these were done by hand.
2022-09-16 13:03:41 +01:00
Florian Hahn 8491d01cc3
[AArch64] Lower vector trunc using tbl.
Similar to using tbl to lower vector ZExts, tbl4 can be used to lower
vector truncates.

The initial version support i32->i8 conversions.

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133495
2022-09-16 12:42:49 +01:00
Florian Hahn 5871f18827
[AArch64] Lower extending uitofp using tbl.
On AArch64, doing the zero-extend separately first can be lowered more
efficiently using tbl, building on D120571.

https://alive2.llvm.org/ce/z/8Je595

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133494
2022-09-16 10:20:25 +01:00
Philip Reames fdff1bb103 [RISCV] Verify merge operand is tied properly
Differential Revision: https://reviews.llvm.org/D133957
2022-09-15 13:06:52 -07:00
Philip Reames 32cfafddb1 [RISCV] Verify VL operand on instructions if present
These should only be immediate values or GPR registers.

Differential Revision: https://reviews.llvm.org/D133953
2022-09-15 13:06:52 -07:00
Alexander Timofeev fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Florian Hahn 81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle  creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no way other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Simon Pilgrim 94620e4fc3 [CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts
These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 16:51:58 +01:00
Jay Foad 3822a01e0b [AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction
Differential Revision: https://reviews.llvm.org/D133928
2022-09-15 16:46:14 +01:00
Matt Arsenault 69153d6c0a AMDGPU: Use GlobalPriority for largest register tuples
Only do this for 16 and 32 register tuples, although we might want to
extend to 8 tuples.

It's incredibly expensive to spill these, and doing so majorly
interferes with the ability to allocate anything else in the function.

The lit tests show mostly sizeable improvements with a handful of tiny
regressions with large vectors.
2022-09-15 11:45:02 -04:00
Sander de Smalen 45d28779c5 [AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm()
A thread may not have access to SME or TPIDR2_EL0, so in order to
safely query PSTATE.SM in a streaming-compatible function, the
code should call `__arm_sme_state()`, as described in the ABI:

  c2bb09c4d4

This means that the value of pstate.sm is:
* 0 if the function is non-streaming.
* 1 if the function has `arm_streaming` or `arm_locally_streaming`.
* evaluated at runtime by a call to __arm_sme_state() otherwise.

This patch also adds a calling convention for calls to SME support routines.

At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic
and use function calls (with the corresponding cc) directly instead.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131571
2022-09-15 15:14:13 +00:00
Matt Arsenault 63d1d37d35 RegAllocGreedy: Avoid overflowing priority bitfields
The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.

AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principal we would have wanted this, but in the
cases I've looked at, it had the counter intuitive effect and
de-prioritized the large register tuple.

Avoid using weird bit hack PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing this to 6-bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
2022-09-15 10:38:40 -04:00
Dmitry Preobrazhensky 0e868aff43 [AMDGPU][MC][GFX11] Add validation of constant bus limitations for VOPD
Differential Revision: https://reviews.llvm.org/D133881
2022-09-15 16:36:19 +03:00
Dmitry Preobrazhensky c89e60bf1f [AMDGPU][MC][GFX11] Add VOPD literals validation
Differential Revision: https://reviews.llvm.org/D133864
2022-09-15 16:29:53 +03:00
Dmitry Preobrazhensky 8bb5c89205 [AMDGPU][MC][NFC] Refactor AMDGPUAsmParser::validateVOPLiteral
Differential Revision: https://reviews.llvm.org/D133861
2022-09-15 16:26:14 +03:00
Simon Pilgrim 0ec028fe10 [CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops
Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 14:05:30 +01:00
wanglei a65557d4b3 [LoongArch] Fixup value adjustment in applyFixup
A complete implementation of `applyFixup` for D132323.

Makes `LoongArchAsmBackend::shouldForceRelocation` to determine
if the relocation types must be forced.

This patch also adds range and alignment checks for `b*` instructions'
operands, at which point the offset to a label is known.

Differential Revision: https://reviews.llvm.org/D132818
2022-09-15 21:00:22 +08:00
Ivan Kosarev 693f816288 [AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.
Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D133787
2022-09-15 13:48:51 +01:00
Ilia Diachkov 3544d200d9 [SPIRV] add IR regularization pass
The patch adds the regularization pass that prepare LLVM IR for
the IR translation. It also contains following changes:
- reduce indentation, make getNonParametrizedType, getSamplerType,
getPipeType, getImageType, getSampledImageType static in SPIRVBuiltins,
- rename mayBeOclOrSpirvBuiltin to getOclOrSpirvBuiltinDemangledName,
- move isOpenCLBuiltinType, isSPIRVBuiltinType, isSpecialType from
SPIRVGlobalRegistry.cpp to SPIRVUtils.cpp, renaming isSpecialType to
isSpecialOpaqueType,
- implment getTgtMemIntrinsic() in SPIRVISelLowering,
- add hasSideEffects = 0 in Pseudo (SPIRVInstrFormats.td),
- add legalization rule for G_MEMSET, correct G_BRCOND rule,
- add capability processing for OpBuildNDRange in SPIRVModuleAnalysis,
- don't correct types of registers holding constants and used in
G_ADDRSPACE_CAST (SPIRVPreLegalizer.cpp),
- lower memset/bswap intrinsics to functions in SPIRVPrepareFunctions,
- change TargetLoweringObjectFileELF to SPIRVTargetObjectFile
in SPIRVTargetMachine.cpp,
- correct comments.
5 LIT tests are added to show the improvement.

Differential Revision: https://reviews.llvm.org/D133253

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
2022-09-15 15:53:44 +03:00
esmeyi 6e0e926c2f [PowerPC] Converts to comparison against zero even when the optimization
doesn't happened in peephole optimizer.

Summary: Converting a comparison against 1 or -1 into a comparison
against 0 can exploit record-form instructions for comparison optimization.
The conversion will happen only when a record-form instruction can be used
to replace the comparison during the peephole optimizer (see function optimizeCompareInstr).

In post-RA, we also want to optimize the comparison by using the record
form (see D131873) and it requires additional dataflow analysis to reliably
find uses of the CR register set.

It's reasonable to common the conversion for both peephole optimizer and
post-RA optimizer.

Converting to comparison against zero even when the optimization doesn't
happened in peephole optimizer may create additional opportunities for the
post-RA optimization.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D131374
2022-09-15 06:06:25 -04:00
Marco Elver 72e7575ffe [GlobalISel][AArch64] Fix pcsections for expanded atomics and add more tests
Add fix for propagation of !pcsections metadata for expanded atomics,
together with more tests for interesting atomic instructions (based on
llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133710
2022-09-15 10:36:11 +02:00
Sheng bea33f75e2 [M68k] Fix the crash of fast register allocator
`MOVEM` is used to spill the register, which will cause problem with 1 byte data, since it only supports word (2 bytes) and long (4 bytes) size.

We change to use the normal `move` instruction to spill 1 byte data.

Fixes #57660

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D133636
2022-09-15 09:24:22 +08:00
Craig Topper 5888c157a7 [RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI
This code was written as if it lived in the MC layer instead of
the CodeGen layer. We get the MCInstrDesc directly from MachineInstr.
And we can use RISCVSubtarget::is64Bit instead of going to the
Triple.

Differential Revision: https://reviews.llvm.org/D133905
2022-09-14 17:07:21 -07:00
Philip Reames e395915ac0 [RISCV] Verify SEW/VecPolicy immediate values
Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see 0a14551.

Differential Revision: https://reviews.llvm.org/D133869
2022-09-14 14:45:16 -07:00
Philip Reames 0a145516a2 [RISCV] Fix a silent miscompile in copyPhysReg
Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively.

Oddly, this appears to be a silent in practice. There's no test change despite verifier changes proving that we definitely hit this in existing tests.

Differential Revision: https://reviews.llvm.org/D133868
2022-09-14 14:45:01 -07:00
Piotr Sobczak abd927e5a8 [AMDGPU] Check for num elts in SelectVOP3PMods
The rest of the code section assumes there are exactly two elements
in the vector (Lo, Hi), so add the check before entering the section.

Differential Revision: https://reviews.llvm.org/D133852
2022-09-14 20:00:19 +02:00
David Spickett 3acaf04033 [LLVM][AArch64] Don't warn about clobbering X16 when Speculative Load Hardening is used
SLH will fall back to a different technique if X16 is being used,
so there is no need to warn for inline asm use. Only prevent other codegen
from using it.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D133766
2022-09-14 15:19:53 +00:00
Zain Jaffal d1dec04d76
[AArch64] Disable nontemproal load for Big Endian
The current code for generating nontemporal load outputs the wrong assembly for big endian architecture.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133789
2022-09-14 14:49:55 +01:00
Simon Pilgrim 854a4595b6 [CostModel][X86] getArithmeticInstrCost - move GLM/SLM custom costs AFTER constant shift -> multiply canonicalization
Corrects the shift by constant costs to better account for them being converted to multiples for lowering - which demonstrates that we should probably be trying harder NOT to convert these to multiplies for some CPUs (v4i32 in particular).
2022-09-14 11:46:26 +01:00
Simon Pilgrim 40ab7875f8 [CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts
Fixes regression from a931dbfbd3
2022-09-14 11:18:23 +01:00
Jon Chesterfield cdb9738963 [amdgpu] Expand all ConstantExpr users of LDS variables in instructions
Bug noted in D112717 can be sidestepped with this change.

Expanding all ConstantExpr involved with LDS up front makes the variable specialisation simpler. Excludes ConstantExpr that don't access LDS to avoid disturbing codegen elsewhere.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133422
2022-09-14 07:55:46 +01:00
Ruiling Song 0404aafbe3 AMDGPU: Factor out hasDivergentBranch(). NFC
This is helpful for detecting whether a block ends with divergent branch
in passes before lowering the pseudo control flow instructions.

Differential Revision: https://reviews.llvm.org/D133184
2022-09-14 13:27:21 +08:00
Antonio Frighetto c63e05dc07 [AArch64InstPrinter] Introduce register markup tags emission
AArch64 assembly syntax emission now leverages markup tags for registers, if enabled.

Reviewed By: MaskRay, david-arm

Differential Revision: https://reviews.llvm.org/D129870
2022-09-13 20:52:02 -07:00
jacquesguan ecf327f154 [RISCV] Add cost model for vector insert/extract element.
This patch adds cost model for vector insert/extract element instructions. In RVV, we could use vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for mask vector or i64 vector in i32 target, we need special instructions to make it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133007
2022-09-14 11:10:18 +08:00
Shivam Gupta e2632fbcdd [NVPTX] Use MBB.begin() instead MBB.front() in NVPTXFrameLowering::emitPrologue
The second argument of `NVPTXFrameLowering::emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB)` is the first MBB of the MF. In that function, it assumes the first MBB always contains instructions, so it gets the first instruction by MachineInstr *MI = &MBB.front();. However, with the reproducer/test case attached, all instructions in the first MBB is cleared in a previous pass for stack coloring. As a consequence, MBB.front() triggers the assertion that the first node is actually a sentinel node. Hence we are using MachineBasicBlock::iterator to iterate over MBB.

Fix #52623.

Differential Revision: https://reviews.llvm.org/D132663
2022-09-14 08:30:55 +05:30
Yeting Kuo 1b56b2b267 [RISCV] Transform VMERGE_VVM_<LMUL>_TU with all ones mask to VADD_VI_<LMUL>_TU.
The transformation is benefit because vmerge.vvm always needs mask operand but
vadd.vi may not.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133255
2022-09-14 10:01:37 +08:00
Han-Kuan Chen dd53a0bb30 [RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type.
Differential Revision: https://reviews.llvm.org/D133688
2022-09-13 18:50:20 -07:00
gonglingqin 7fd7d48b4b [LoongArch] Categorize code by function. NFC.
Differential Revision: https://reviews.llvm.org/D133754
2022-09-14 09:48:26 +08:00
Fangrui Song ab1c259613 [RISCV] Assemble `call foo` to R_RISCV_CALL_PLT
R_RISCV_CALL/R_RISCV_CALL_PLT distinction isn't necessary. R_RISCV_CALL has been
deprecated as a resolution to
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/98 .

ld.lld and mold treat the two relocation types the same. GNU ld has a custom
handling for undefined weak functions which is unnecessary: calling an
unresolved undefined weak function is UB and GNU ld can handle the case without
a relocation error (such a function call is usually guarded by a zero value
check and should be allowed).

This patch assembles `call foo` to use R_RISCV_CALL_PLT instead of the
deprecated R_RISCV_CALL.

Note: the code generator still differentiates `call foo` and (maybe preemptible)
`call foo@plt`, but the difference is purely aesthetic.

Note: D105429 does not support R_RISCV_CALL_PLT correctly. Changed the test to
force R_RISCV_CALL for now.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D132530
2022-09-13 18:47:55 -07:00
Fanchen Kong 28557e8c98 [WebAssembly] Improve codegen for shuffles with undefined lane indices
For undefined lane indices, fill the mask with {0..N} instead of zeros to allow
further reduction to word/dword shuffle on the VM.

Reviewed By: tlively, penzn

Differential Revision: https://reviews.llvm.org/D133473
2022-09-13 16:03:18 -07:00
Philip Reames 09d73fe8cd [RISCV] Add MIR comments for VecPolicy operands
Analogous to what we already do for SEW operands, aimed at making the resulting MIR readable by a human.
2022-09-13 15:36:33 -07:00
Philip Reames cc45687e1c [RISCV] Simpify operand index calculation in createMIROperandComment [nfc] 2022-09-13 15:06:40 -07:00
Xiang Li 1e3d4c4344 [DirectX backend] Remove Attribute not for DXIL on CallInst
Remove Attribute on CallInst which is not for DXIL when prepare for DXIL.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D133279
2022-09-13 13:45:19 -07:00
Jay Foad 2e8863b6a1 [AMDGPU] Don't shrink VOP3 instructions pre-RA on GFX10+
In GFX10, there is no advantage to shrinking these instructions pre-RA,
so this just saves a bit of work.

In GFX11 there is an advantage to *not* shrinking them pre-RA, because
the register classes for 16-bit operands are less restrictive in the
VOP3 form than in the shrunk form. This patch is a prerequisite for
actually setting up those register classes correctly for 16-bit vs
non-16-bit operands.

Differential Revision: https://reviews.llvm.org/D133769
2022-09-13 20:26:08 +01:00
Craig Topper 8d7e73effe [RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl.
Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15>
where half the elements are returned, can be lowered using vnsrl.

SelectionDAGBuilder lowers such shuffles as a build_vector of
extract_elements since the mask has less elements than the source.
To fix this, I've enable the extractSubvectorIsCheapHook to allow
DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding
the shufffle.

I've gone very conservative on extractSubvectorIsCheapHook to minimize
test impact and match what we have test coverage for. This can be
improved in the future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133736
2022-09-13 11:07:11 -07:00
Alex Bradbury c44c1e9d3e [RISCV] Implement isMaskAndCmp0FoldingBeneficial hook
This hook is currently only used by CodeGenPrepare, which will sink *and
duplicate* an 'and' into a block that has an 'icmp 0' user of it if the
hook returns true.

This hook is less useful for RISC-V than for targets like AArch64 that
have a TBZ (test bit and branch if zero instruction), but may still be
profitable if Zbs is available and a BEXTI can be selected.

Conservatively, we return false even if Zbs is enabled for any masks
that fit in the ANDI immediate because it's possible the only use is a
branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable
transformation.

Differential Revision: https://reviews.llvm.org/D131492
2022-09-13 18:54:00 +01:00
Alex Bradbury 547160848c [RISCV] Return true in hasBitTest when Zbs is enabled and update BEXTI pattern for resulting canonicalisation
As the Zbs extension includes bext[i] for bit extract, we can
unconditionally return true from this hook. This hook causes the DAG
combiner to perform the following canonicalisation:

  and (not (srl X, C)), 1 --> (and X, 1<<C) == 0
  and (srl (not X), C)), 1 --> (and X, 1<<C) == 0

As simply changing the hook causes a codegen regression, this patch also
modifies a BEXTI pattern to match this canonicalised form.

As BSETINVMask is now used for BEXT as well as BSET and BINV, it has
been renamed to the more generic SingleBitSetMask.

There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI
rather than NOT+SRLIW). This is neutral in terms of code quality.

Differential Revision: https://reviews.llvm.org/D131482
2022-09-13 16:51:47 +01:00
Craig Topper 5224bae613 [RISCV] Fix a bug in i32 FP_TO_UINT_SAT lowering on RV64.
We use the saturating behavior of fcvt.wu.h/s/d but forgot to
take into account that fcvt.wu will sign extend the saturated
result. According to computeKnownBits a promoted FP_TO_UINT_SAT
is expected to zero extend the saturated value.

In many case the upper bits aren't be demanded so this wouldn't
be an issue. But if we computeKnownBits caused an AND to be removed
it would be a bug.

This patch inserts an AND during to zero the upper bits.

Unfortunately, this pessimizes code if we aren't able to tell if
the upper bits are demanded. To fix that we could custom type
promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll
leave that for future work.

I haven't found a failure from this, I was revisiting the code to
add vector support and spotted it.

Differential Revision: https://reviews.llvm.org/D133746
2022-09-13 08:41:32 -07:00
David Green 993b203b6a [AArch64] Sink splat(s/zext(..)) to uses
If the Shuffle is a splat and the operand is a zext/sext, sinking the
operand and the s/zext can help create indexed s/umull. This is
especially useful to prevent i64 mul being scalarized.

Differential Revision: https://reviews.llvm.org/D133355
2022-09-13 15:47:41 +01:00
David Spickett 0b8a44388e [llvm][AArch64] Explain why certain registers are reserved on Arm64EC
This extends 4658366d95 to add a note
explaining why the register is reserved.

note: x13 is clobbered by asynchronous signals when using Arm64EC.

I've added testing for w/x registers and v/q/s/d and h floating point
registers.

llvm will accept, but silently do nothing with, b registers. So they
are not tested here (clang rejects them so at least for C you're safe anyway).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D133701
2022-09-13 10:13:06 +00:00
Dmitry Preobrazhensky a80116efec [AMDGPU][MC][GFX11] Add a helper function for identification of VOPD instructions
Differential Revision: https://reviews.llvm.org/D133608
2022-09-13 12:41:39 +03:00
Dmitry Preobrazhensky 815ba49068 [AMDGPU][MC] Add detection of mandatory literals in parser
Differential Revision: https://reviews.llvm.org/D133606
2022-09-13 12:37:30 +03:00
Sylvestre Ledru cd20a18286 Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"
Causing:
https://github.com/llvm/llvm-project/issues/57709

This reverts commit ab56719acd.
2022-09-13 10:53:59 +02:00
Zi Xuan Wu (Zeson) 955e6ac499 [CSKY] Fix the Predicates of instruction selection
Some select node Pattern with register cmp instruction should be guarded
by iHas2E3.
2022-09-13 15:02:22 +08:00
Haojian Wu 7ed68182d7 Fix a -Wswitch warning. 2022-09-13 08:57:43 +02:00
jacquesguan b98b4fae75 [RISCV] Add cost model for compare and select instructions.
This patch adds cost model for vector compare and select instructions. For vector FP compare instruction, it only add the comparisions supported natively.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132296
2022-09-13 14:44:46 +08:00
Yeting Kuo 5fcb5d7759 [RISCV] Add assertion of hasVecPolicyOp to catch masked intrinsic without policy operand.
The original code may have incorrect result if there is a masked instruction
without policy operand to make us set its policy to TUMU. The patch adds an
assertion to catch the instruction.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133302
2022-09-13 10:09:49 +08:00
gonglingqin 3de3439bd7 [LoongArch] Add codegen support for ISD::FMA
Differential Revision: https://reviews.llvm.org/D133281
2022-09-13 10:04:41 +08:00
David Majnemer ab56719acd [clang, llvm] Add __declspec(safebuffers), support it in CodeView
__declspec(safebuffers) is equivalent to
__attribute__((no_stack_protector)).  This information is recorded in
CodeView.

While we are here, add support for strict_gs_check.
2022-09-12 21:15:34 +00:00
Kazu Hirata 9606608474 [llvm] Use x.empty() instead of llvm::empty(x) (NFC)
I'm planning to deprecate and eventually remove llvm::empty.

I thought about replacing llvm::empty(x) with std::empty(x), but it
turns out that all uses can be converted to x.empty().  That is, no
use requires the ability of std::empty to accept C arrays and
std::initializer_list.

Differential Revision: https://reviews.llvm.org/D133677
2022-09-12 13:34:35 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Craig Topper d49280e0a4 [RISCV] Rename WriteFALU* and ReadFALU* to WriteFAdd*/ReadFAdd*.
ALU seems a little vague. FAdd felt more precise even though it
also include FSUB instructions.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D133632
2022-09-12 09:37:28 -07:00
Craig Topper 4186a49d79 [RISCV] Custom type legalize i32 loads by sign extending.
The default is to use extload which can become a zextload or
sextload if it is followed by an 'and' or sext_inreg.

Sometimes type legalization will introduce an 'and' from promoting
something like 'srl X, C' and a sext_inreg from from a setcc. The
'and' could be freely folded with the promoted 'srl' by using srliw,
but the sext_inreg can't be folded into a compare. DAG combiner
will see both of these choices and may decide to fold the 'and'
instead of the 'sext_inreg'. This forces the sext_inreg to become
a sext.w.

By picking sextload in the type legalizer we take this choice away.
Looking at spec2006 compiled with Zba and Zbb this appeared to be
net reduction in lines of code in the objdump disassembly output.

This is similar to what we do with i32 add/sub/mul/shl in
type legalization where we always emit a sext_inreg.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130397
2022-09-12 09:13:07 -07:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Simon Pilgrim 20ad05f9b4 [CostModel][X86] Add CostKinds handling for abs ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695
2022-09-12 16:34:37 +01:00
Matt Arsenault 7834194837 TableGen: Introduce generated getSubRegisterClass function
Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.
2022-09-12 09:03:37 -04:00
Sander de Smalen cf72dddaef [AArch64][SME] Add utility class for handling SME attributes.
This patch adds a utility class that will be used in subsequent patches
for parsing the function/callsite attributes and determining whether
changes to PSTATE.SM are needed, or whether a lazy-save mechanism is
required.

It also implements some of the restrictions on the SME attributes
in the IR Verifier pass.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: david-arm, aemerson

Differential Revision: https://reviews.llvm.org/D131570
2022-09-12 12:41:30 +00:00
Simon Pilgrim bd0109f392 [CostModel][X86] Move AVX512/AVX2 uniform shift costs into the generic uniform cost tables
They shouldn't be happening after XOP shift costs - AVX2 shift supports takes preference over XOP for everything but vXi8 shifts - the improvement is pretty limited as it only affects bdver4 targets but it does help clean up a fraction of the messy shift cost logic....
2022-09-12 12:08:42 +01:00
David Spickett 739b69e655 [LLVM][AArch64] Explain that X19 is used as the frame base pointer register
Fixes #50098

LLVM uses X19 as the frame base pointer, if it needs to. Meaning you
can get warnings if you clobber that with inline asm.

However, it doesn't explain why. The frame base register is not part
of the ABI so it's pretty confusing why you get that warning out of the blue.

This adds a method to explain a reserved register with X19 as the first one.
The logic is the same as getReservedRegs.

I could have added a return parameter to isASMClobberable and friends
but found that there's a lot of things that call isReservedReg in various
ways.

So while one more method on the pile isn't great design, it is simpler
right now to do it this way and only pay the cost if you are actually using
a reserved register.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D133213
2022-09-12 09:18:09 +00:00
Johannes Doerfert c922cac868 Revert "[Attributor] AAPointerInfo should allow "harmless" uses"
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"

This reverts commit 844f6c5d03 and
4ed0a88cd8 as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
2022-09-11 21:37:54 -07:00
Johannes Doerfert 4ed0a88cd8 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-09-11 20:16:11 -07:00
Simon Pilgrim a931dbfbd3 [CostModel][X86] Merge AVX512BW vXi8/vXi16 shifts into default AVX512BW cost table
We only need to handle the uniform cases early
2022-09-10 18:18:42 +01:00
Simon Pilgrim 10edf88458 [CostModel][X86] Update CTPOP costs
With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont
2022-09-10 17:57:20 +01:00
Simon Pilgrim 4994f87ca1 [X86] Fix bdver2 128-bit shuffles throughputs
Noticed while trying to get vector ctpop/ctlz/cttz costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2

Matches AMD 15h SoG, Agner and instlatx64
2022-09-10 17:34:40 +01:00
Simon Pilgrim 7785bd34e7 [X86] Fix bdver2 128-bit ALU/logic/shift throughputs
Noticed while trying to get vector shifts costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2

Matches AMD 15h SoG, Agner and instlatx64
2022-09-10 16:23:29 +01:00
Mingming Liu 8aa800614b [AArch64][CostModel] Detects that {extract,insert}-element at lane 0 has the same cost as the other lane for vector instructions in the IR.
Currently, {extract,insert}-element has zero cost at lane 0 [1]. However, there is a cost (by fmov instruction [2], or ext/ins instruction) to move values from SIMD registers to GPR registers, when the element is used explicitly as integers.

See https://godbolt.org/z/faPE1nTn8, when fmov is generated for d* register -> x* register conversion.

Implementation-wise, add a private method `AArch64TTIImpl::getVectorInstrCostHelper` as a helper function. This way, instruction-based method could share the core logic (e.g.,
returning zero cost if type is legalized to scalar).

[1] 2cf320d41e/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (L1853)
[2] 2cf320d41e/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L8150-L8157)

Differential Revision: https://reviews.llvm.org/D128302
2022-09-09 09:47:30 -07:00
zhongyunde 7a81782585 [AArch64][CodeGen]Fold the mov and lsl into ubfiz
Fix the issue exposed by D132322, depand on D132939
Reviewed By: efriedma, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D132325
2022-09-09 23:50:29 +08:00
Jay Foad 8901f7cebc [AMDGPU] Fix crash legalizing G_EXTRACT_VECTOR_ELT with negative index
Fixes https://github.com/llvm/llvm-project/issues/57408

Differential Revision: https://reviews.llvm.org/D132938
2022-09-09 15:53:34 +01:00
Simon Pilgrim 05f56f10ed [X86] Fix VPPERM load folding latency
Noticed while investigating BITREVERSE cost numbers with the D103695 script - VPPERM folded loads was using the WriteVarShuffleX defaults and was missing an override like the VPPERM reg-reg variants
2022-09-09 13:57:39 +01:00
Dmitry Preobrazhensky 6b79610fd5 [AMDGPU][MC][GFX11][NFC] Correct VOPD parsing
Differential Revision: https://reviews.llvm.org/D133492
2022-09-09 13:03:29 +03:00
Simon Pilgrim 55b78e28d8 [CostModel][X86] Add missing i8 throughput cost 2022-09-09 10:58:51 +01:00
gonglingqin da8c9521ee [LoongArch] Add codegen support for frint
According to the revised description in `LoongArch Reference Manual v1.02`,
frint.[s/d] does not judge whether floating-point inexact exceptions are
allowed indicated by FCSR, i.e. always executes roundToIntegralExact(x).
What's more, the manual also specifically defines that frint.s/d is only
necessary to be defined in LA64. So ISD::FRINT is legal for LA64.

Differential Revision: https://reviews.llvm.org/D133337
2022-09-09 14:25:34 +08:00
Sheng 88bdc4687d [NFC][M68k] Correct debug message. 2022-09-09 10:58:37 +08:00
Alex Bradbury 51ae462447 [RISCV] Add the GlobalMerge pass (disabled by default)
Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and //doesn't enable it by default//.

Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag.

Reviewed By: craig.topper, reames, hiraditya

Differential Revision: https://reviews.llvm.org/D130481
2022-09-08 18:40:38 -07:00
Fangrui Song f9b5924975 [AArch64] Fix -Wunused-variable. NFC 2022-09-08 18:27:16 -07:00
zhongyunde b6655333c2 [Peephole] rewrite INSERT_SUBREG to SUBREG_TO_REG if upper bits zero
Restrict the 32-bit form of an instruction of integer as too many test cases
will be clobber as the register number updated.
    From %reg = INSERT_SUBREG %reg, %subreg, subidx
    To   %reg:subidx =  SUBREG_TO_REG 0, %subreg, subidx
Try to prefix the redundant mov instruction at D132325 as the SUBREG_TO_REG should not generate code.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D132939
2022-09-09 09:00:54 +08:00
Craig Topper 5f3a8b585b [RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133511
2022-09-08 12:34:59 -07:00
David Green 3875c38adf [AArch64] Fix formatting of the Shuffle Cost tables. NFC 2022-09-08 19:54:12 +01:00
Jay Foad afa0ed33df [AMDGPU] Fix shrinking of F16 FMA on newer subtargets
D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in
SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have
a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of
the opcode which is used on GFX9+.

Differential Revision: https://reviews.llvm.org/D133489
2022-09-08 16:41:04 +01:00
Jonas Paulsson de0e3117d4 [SystemZ] Improve handling of vector alignments.
Make the DataLayout string always hold a vector alignment of 8 bytes,
regardless of the vector ABI. This makes the datalayout depend only on the
target triple which is the general expectation (in assertions).

On older architectures where vectors use the natural alignment (16 bytes),
the front end will maintain the same behavior and produce an overalignment
compared to the datalayout.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D131158
2022-09-08 17:33:05 +02:00
Thomas Lively ac3b8df8f2 [WebAssembly] Prototype `f32x4.relaxed_dot_bf16x8_add_f32`
As proposed in https://github.com/WebAssembly/relaxed-simd/issues/77. Only an
LLVM intrinsic and a clang builtin are implemented. Since there is no bfloat16
type, use u16 to represent the bfloats in the builtin function arguments.

Differential Revision: https://reviews.llvm.org/D133428
2022-09-08 08:07:49 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Krzysztof Parzyszek 3c817574c2 [Hexagon] Handle shifts of short vectors of i8 2022-09-08 07:52:16 -07:00
Ivan Kosarev 57c943d581 [AMDGPU] Only raise wave priority if there is a long enough sequence of VALU instructions.
Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D124671
2022-09-08 15:21:30 +01:00
liqinweng 723245bfac [AARCH64][COST] Improve cost of reverse shuffles for AArch64
Update the comments for reverse shuffles and add tests

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132730
2022-09-08 18:55:49 +08:00
liqinweng 9b4e75ee76 [RISCV][COST] Add cost model for mask vector select instruction when its condition is a scalar type
Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D132992
2022-09-08 18:55:49 +08:00
David Spickett e428baf001 [LLVM][ARM] Remove options for armv2, 2A, 3 and 3M
Fixes #57486

These pre v4 architectures are not specifically supported
by codegen. As demonstrated in the linked issue.

GCC has not supported 3M since GCC 9 and presumably
2 and 2A earlier than that. So we are aligned in that sense.

(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2abd6e34fcf3bd9f9ffafcaa47cdc3ed443f9add)

This removes the options and associated testing.

The Pre_v4 build attribute remains mainly because its absence
would be more confusing. It will not be used other than to
complete the list of build attributes as shown in the ABI.

https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#3352the-target-related-attributes

Reviewed By: nickdesaulniers, peter.smith, rengolin

Differential Revision: https://reviews.llvm.org/D133109
2022-09-08 09:49:48 +00:00
gonglingqin d5f7a2182d [LoongArch] Add codegen support for atomicrmw xchg operation on LA32
Depends on D131228

Differential Revision: https://reviews.llvm.org/D131229
2022-09-08 13:57:53 +08:00
gonglingqin b60f801607 [LoongArch] Add codegen support for atomicrmw xchg operation on LA64
In order to avoid the patch being too large, the atomicrmw xchg operation
on LA32 will be added later

Differential Revision: https://reviews.llvm.org/D131228
2022-09-08 13:57:26 +08:00
Justin Bogner a81c7dbf0d [AMDGPU] Drop _oneuse checks from med3 patterns
We use _oneuse checks to make sure combines won't accidentally
increase code size, but this prevents the optimization in cases where
we happen to want to clamp multiple values to the same range

It's safe to drop these checks for two reasons:

1. The pattern of max/min operations for med3 is complicated enough
   it's unlikely to come up by accident, so this will still only fire
   when appropriate to do so
2. Even if every intermediate is used and we don't save a single
   operation, we still won't end up with more operations since the
   med3 replaces the final max/min.

In pathological cases we could potentially end up with a larger
encoding size or possibly slightly increased vgpr pressure, but the
risk of that is low, especially considering the upside.

Differential Revision: https://reviews.llvm.org/D132621
2022-09-07 16:31:49 -07:00
Stanislav Mekhanoshin fb28bf3fb4 [AMDGPU] Fix liveness verifier error in hazard recognizer
After D133067 we are inserting swaps to use a new physical
register. I have noticed verifier errors about undefined
physical register uses if we are tracking liveness post RA.

We have no access to LIS at this point, so mark new register
uses as undef to calm down the verifier. Liveness should not
matter at this point anyway.

Note the description of the RegState::Undef: "Value of the
register doesn't matter." I.e. it does not say it is strictly
undefined. In fact that is what we really need: this value
does not matter.

I also had to modify the test a bit since with tracking enabled
it does not pass verification even before the recognizer.

Differential Revision: https://reviews.llvm.org/D133459
2022-09-07 16:30:36 -07:00
Krzysztof Parzyszek c37acb6426 [Hexagon] Move vectorization checks from subtarget to TTI 2022-09-07 14:47:24 -07:00
Stanislav Mekhanoshin 95d497ff2a [AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
In this case gfx90a uses v0 instead of the correct register. Swap
the value temporarily with a lower register and then swap it back.

Unfortunately hazard recognizer works after wait count insertion,
so we cannot simply reuse an arbitrary register, hence w/a also
includes a full waitcount. This can be avoided if we run it from
expandPostRAPseudo, but that is a complete misplacement.

Differential Revision: https://reviews.llvm.org/D133067
2022-09-07 14:23:49 -07:00
Jon Chesterfield 23f6c8d635 [amdgpu] Always, instead of mostly, remove unused LDS symbols
Currently LDS variables are removed by the lower module pass
if they have a use which is caught by the replace with struct control flow.
This makes tests brittle to changes to that control flow which induces
noise when trying to improve lowering. Some tests already check that
variables are removed, while others checked that they are not removed.

LDS variables are not (currently) externally accessible, and if that
changes the machinery which makes them externally accessible will look
like a use. This change therefore breaks no applications.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133028
2022-09-07 18:28:16 +01:00
Philip Reames a4a29438f4 [RISCV][MC] Add minimal support for Ztso extension
This is a minimalist implementation which simply adds the extension (in the experimental namespace since its not ratified), and wires up the setting of the required ELF header flag. Future changes will include codegen changes to exploit the stronger memory model.

This is intended to implement v0.1 of the proposed specification which can be found in Chapter 25 of https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf.

Differential Revision: https://reviews.llvm.org/D133239
2022-09-07 09:30:57 -07:00
Simon Pilgrim e74102a963 [CostModel][X86] Merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost
For the few non type based intrinsic cases we can just check for !isTypeBasedOnly() to access the args directly.

I don't think we have a need to keep getTypeBasedIntrinsicInstrCost in BasicTTIImpl.h any more and can do a similar merge there as well - but it's a messier refactor and will take a while.
2022-09-07 12:04:09 +01:00
Jay Foad 96dfa523c2 [AMDGPU] Refactor SIFoldOperands. NFC.
Refactor static functions into class methods so they have access to TII, MRI
etc.
2022-09-07 11:05:01 +01:00
Marco Elver 31a548021b [GlobalISel] Propagate PCSections metadata to MachineInstr
Propagate (most) PC sections metadata to MachineInstr when GlobalISel is
doing instruction selection.

This change results in support for architectures using GlobalISel (such
as -O0 with AArch64). Not all instructions may be supported yet, and
requires further target-specific handling (such as done for AArch64
pseudo-atomics). Expanding supported instructions is planned on a
case-by-case basis and new use cases for PC sections metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130886
2022-09-07 11:36:02 +02:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Jay Foad 5291c3dd36 [AMDGPU] Simplify mad/mac patterns. NFC.
Simplify instruction selection patterns for mad/mac:
- Use any_fmad consistently to make it clear that all patterns treat
  fmad and AMDGPUfmad_ftz identically.
- For mad, put the patterns on the instruction definitions. For mac the
  patterns are still out-of-line because we want to set AddedComplexity
  and to have special handling of the source modifiers.

Differential Revision: https://reviews.llvm.org/D133305
2022-09-07 09:58:28 +01:00
Zi Xuan Wu (Zeson) 162131257f [CSKY] Fix the compiling error about missing Log2 function with Log2_64 2022-09-07 14:49:40 +08:00
Xiang1 Zhang c836ddaf72 [X86][NFC] Refine load/store reg to StackSlot for extensibility
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D133078
2022-09-07 14:35:42 +08:00
Simon Pilgrim 648e182d92 [CostModel][X86] getIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers

This should allow us to merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost and remove all remaining references
2022-09-06 22:05:32 +01:00
raghavmedicherla 57f01fee1e [AMDGPU/Metadata] Rename HSAMD::MetadataStreamer classes
Renamed all HSAMD::MetadataStreamer classes to improve readability of the code.

Differential Revision: https://reviews.llvm.org/D133156
2022-09-06 16:46:37 -04:00
Alexander Shaposhnikov 6a2442e9be [AArch64] Increase AddedComplexity of BIC
This diff adjusts AddedComplexity of BIC to bump its position
in the list of patterns to make LLVM pick it instead of MVN + AND.
MVN + AND requires 2 cycles, so does e.g. MOV + BIC, but the latter
outperforms the former if the instructions producing the operands of
BIC can be issued in parallel.

One may consider the following example:

ldur x15, [x0, #2] # 4 cycles
mvn x10, x15 # 1 cycle (depends on ldur)
and x9, x10, #0x8080808080808080

vs.

ldur x15, [x0, #2] # 4 cycles
mov x9, #0x8080808080808080 # 1 cycle (can be executed in parallel with ldur)
bic x9, x9, x15. # 1 cycle

Test plan: ninja check-all

Differential revision: https://reviews.llvm.org/D133345
2022-09-06 20:31:24 +00:00
Markus Böck f049b2c3fc [MC] Emit Stackmaps before debug info
This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment.

The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections.

This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op.

Differential Revision: https://reviews.llvm.org/D132708
2022-09-06 20:20:56 +02:00
Guozhi Wei 3cf4ab5447 [AArch64] Add an option to reserve physical registers from RA
This patch adds an option --reserve-regs-for-regalloc, so we can reserve a list
of physical registers. These registers will not be used by register allocator,
but can still be used as ABI requests such as passing arguments to function
call.

Its main purpose is simulating high register pressure by reserving many physical
registers. So it will be much easier to test and debug register allocation
changes.

Differential Revision: https://reviews.llvm.org/D132717
2022-09-06 17:18:01 +00:00
Craig Topper 5d30565d80 [RISCV] Improve vector fround lowering by changing FRM.
This is a follow up to D133238 which did this for ceil/floor.

Reviewed By: arcbbb, frasercrmck

Differential Revision: https://reviews.llvm.org/D133335
2022-09-06 09:33:13 -07:00
Simon Pilgrim 10e0f3e948 [CostModel][X86] Add CostKinds handling for ctpop ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

Some of the pre-AVX values still aren't great - atom/slm worst case numbers for ctpop expansion really affect these (especially throughput/latency), so we need to clean them up in a more consistent way - its a pity we don't have models for more older cpus (merom/nehalem etc.) as other examples.
2022-09-06 17:27:24 +01:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Simon Pilgrim 83552e8c72 [CostModel][X86] Add CostKinds handling for SSE FCMP_ONE/FCMP_UEQ predicates
These require special handling to account for their expansion in lowering.

I'm trying very hard not to have to add predicate specific costs - but it might be inevitable.....
2022-09-06 12:05:22 +01:00
John Brawn e26cadcc32 [ARM] Constant pools need 4-byte alignment if we only have tADR
When the only ADR instruction we have is the 16-bit thumb one then all
constant pool entries need to be 4-byte aligned, as tADR has an offset
that's a multiple of 4.

It looks like previously there happened to be no situations in which
we encountered a constant pool entry with alignment less than 4, so
failing to do this didn't cause any problems, but the expansion of
cttz to a table added by D128911 does use a constant pool with
alignment 1, so we now need to handle it correctly.

Differential Revision: https://reviews.llvm.org/D133199
2022-09-06 11:36:12 +01:00
Simon Pilgrim c1b5e36d74 [CostModel][X86] Add CostKinds handling for fcmp ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

SSE numbers are still too low for FCMP_ONE/FCMP_UEQ cases which expand to a more complex sequence than the existing 'ExtraCost' system can manage.
2022-09-06 10:34:53 +01:00
Freddy Ye d5fa8b1c2c [X86] Support SAE for VCVTPS2PH from intrinsic.
For now, clang and gcc both failed to generate sae version from _mm512_cvt_roundps_ph:
https://godbolt.org/z/oh7eTGY5z. Intrinsic guide description is also wrong, which will be
update soon.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D132641
2022-09-06 11:28:12 +08:00
Craig Topper f0332d12ae [RISCV] Improve vector fceil/ffloor lowering by changing FRM.
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.

Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.

A followup patch will use this approach for FROUND too.

Still need to fix the cost model.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D133238
2022-09-05 19:03:44 -07:00
gonglingqin 067aab0a85 [LoongArch] Fix annotations not matching predicates. NFC. 2022-09-06 09:14:20 +08:00
Eli Friedman 2b9cec6244 [ARM64EC 5/?] Fix names of __chkstk and __security_check_cookie.
Part of initial Arm64EC patchset.

Arm64EC code needs to use functions with a different name, to avoid
using the x64 versions.

Differential Revision: https://reviews.llvm.org/D125417
2022-09-05 13:19:54 -07:00
Eli Friedman 5637ec0983 [ARM64EC 4/?] Add LLVM support for varargs calling convention.
Part of patchset to add initial support for ARM64EC.

The ARM64EC calling convention is the same as ARM64 for non-varargs
functions, but for varargs, the convention is significantly different.
Basically, only x0-x3 registers are used for passing arguments, and x4
and x5 describe the address/size of the arguments passed in memory. (See
https://docs.microsoft.com/en-us/windows/uwp/porting/arm64ec-abi for
more details; see
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention for
the x64 calling convention rules, which this convention needs to match.)

Note that this currently doesn't handle i128 arguments correctly; as
noted in review, that's sort of complicated to handle, so I'm leaving it
for a followup.

Differential Revision: https://reviews.llvm.org/D125415
2022-09-05 13:05:48 -07:00
Eli Friedman 4658366d95 [ARM64EC 3/?] Mark reserved registers specific to ARM64EC ABI.
Part of patchset to add initial support for ARM64EC.

I'm not completely sure I understand the reason for this restriction,
but Microsoft documentation says that asynchronous signals clobber these
registers, so we can't ever use them.

As far as I know, none of these registers have any hardcoded meaning, so
reserving them shouldn't have any significant side-effects.

Differental Revision: https://reviews.llvm.org/D125413
2022-09-05 12:59:39 -07:00
Eli Friedman 63335afb4e [ARM64EC 2/?] Add target triple, and allow targeting it.
Part of patchset to add initial support for ARM64EC.

Per discussion on review, using the triple arm64ec-pc-windows-msvc. The
parsing works the same way as Apple's alternate Arm ABI "arm64e".

Differential Revision: https://reviews.llvm.org/D125412
2022-09-05 12:27:10 -07:00
David Green 3c6edc0b2f [AArch64][GlobalISel] Recognise some CCMPri
This is a simple addition to emitConditionalComparison, to match CCMP
with immediates using getIConstantVRegValWithLookThrough, letting it
select the CCMPri variants of the instructions.

Differential Revision: https://reviews.llvm.org/D131073
2022-09-05 19:43:23 +01:00