llvm-project

Commit Graph

Author	SHA1	Message	Date
Nico Weber	90f7f24b20	try to fix build yet more after `16544cbe64`	2022-09-28 15:40:52 -04:00
Baptiste	b556726ccc	[AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary One of the conditions to flush the vmcnt counter in loop preheaders is: The loop contains a use of a vgpr that is defined out of the loop. The code currently checks if a waitcnt is needed by looking at the score of that vgpr in the score brackets. This is not enough and may cause the generation of an unnecessary vmcnt flush. This patch fixes that case. Differential Revision: https://reviews.llvm.org/D130313	2022-09-28 13:05:50 -04:00
Jon Chesterfield	35f2584ef9	[amdgpu] Error, instead of miscompile, anonymous kernels using lds The association between kernel and struct is done by symbol name. This doesn't work robustly for anonymous kernels as shown by the modified test case. An alternative association between function and struct can be constructed if necessary, probably though metadata, but on the basis that we currently miscompile anonymous kernels and that they are difficult to construct from application code and difficult to call from the runtime, this patch makes it a fatal error for now. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134741	2022-09-28 16:30:04 +01:00
Matt Arsenault	7a84624079	AMDGPU: Make various vector undefs legal Surprisingly these were getting legalized to something zero initialized. This fixes an infinite loop when combining some vector types. Also fixes zero initializing some undef values. SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking for the legality of the output undefs they are replacing unused operations with. This resulted in turning vectors into undefs that were later re-legalized back into zero vectors.	2022-09-28 10:48:52 -04:00
Matt Devereau	0a4771a7e8	[AArch64][SVE] Expand gather index to 32 bits instead of 64 bits For gathers which load in 8 and 16 bit data then use that data as an index, the index can be extended to 32 bits instead of 64 bits Differential Revision: https://reviews.llvm.org/D130692	2022-09-28 14:42:12 +00:00
Florian Hahn	eba84971ae	Revert "[AARCH64][CostModel] Modified the cost of mask vector load/store" This reverts commit `1c62af3e23`. The commit causes the test below to fail. Revert for now to get the bots back to green. Failing test: lvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll	2022-09-28 15:35:13 +01:00
Florian Hahn	2d3c260362	[AArch64] break non-temporal loads over 256 into 256-loads and a smaller load Currently over 256 non-temporal loads are broken inefficently. For example, `v17i32` gets broken into 2 128-bit loads. It is better if we can use 256-bit loads instead. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133421	2022-09-28 15:20:26 +01:00
Jon Chesterfield	80ba432821	[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic address by allocating it at the same time as the module scope instance. This is relatively likely to be where the instance was allocated anyway (~NFC) but will allow later patches to calculate where a given field can be found, which means a function which is only reachable from a single kernel will be able to access a LDS variable with zero overhead. That will be particularly helpful for applications that instantiate a function template containing LDS variables once per kernel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127052	2022-09-28 14:55:16 +01:00
Archibald Elliott	ff4027d152	[ARM] Support fp16/bf16 using t constraint fp16 and bf16 values can be used in GCC's inline assembly using the "t" constraint, which means "VFP floating-point registers s0-s31" - fp16 and bf16 values are stored in S registers too. This change ensures that LLVM is compatible with GCC for programs that use fp16 and the 't' constraint. Fixes #57753 Differential Revision: https://reviews.llvm.org/D134553	2022-09-28 14:48:21 +01:00
liqinweng	1c62af3e23	[AARCH64][CostModel] Modified the cost of mask vector load/store Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D134413	2022-09-28 19:40:29 +08:00
Carl Ritson	266b5dbc5d	[AMDGPU] Add MIMG NSA threshold configuration attribute Make MIMG NSA minimum addresses threshold an attribute that can be set on a function or configured via command line. This enables frontend tuning which allows increased NSA usage where beneficial. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D134780	2022-09-28 20:03:18 +09:00
Simon Pilgrim	759bedade5	Fix MSVC "not all control paths return a value" warning. NFCI.	2022-09-28 10:56:37 +01:00
wanglei	983a0ae5cf	[LoongArch] Specify registers used in DWARF exception handling Defines LoongArch registers for getExceptionPointerRegister() and getExceptionSelectorRegister(). Differential Revision: https://reviews.llvm.org/D134709	2022-09-28 17:53:16 +08:00
Cullen Rhodes	3918ef07c4	[AArch64][SVE] Remove redundant ptest after match/nmatch These instructions are flag setting so the ptest is redundant, the TableGen class wasn't setting the element size for the predicate causing the checks in AArch64InstrInfo::optimizePTestInstr to fail.	2022-09-28 08:23:23 +00:00
eopXD	9677d70eb2	[VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134759	2022-09-27 19:45:58 -07:00
gonglingqin	95d2367647	[LoongArch] Expand FSIN/FCOS/FSINCOS/FPOW/FREM Differential Revision: https://reviews.llvm.org/D134628	2022-09-28 09:42:41 +08:00
Florian Mayer	979db5343f	[HWASan] [NFC] use auto* over auto& for pointers	2022-09-27 18:19:25 -07:00
Han-Kuan Chen	c595c874cb	[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133688	2022-09-27 17:25:34 -07:00
Philip Reames	f6d110e26f	[LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc] This is purely NFC restructure in advance of a change which actually exposes zero strides. This is mostly because I find this interface confusing each time I look at it.	2022-09-27 15:55:44 -07:00
Quentin Colombet	ce35e8b426	[RISCV][ISel] Remove the commutative flag on SUB I wasn't able to produce a testcase for that because right now VWSUB is only generated from VWSUB_W and from there to trigger the commutative bug we would need to grab VWSUB where the splat value is on the LHS, which is currently not matched. Differential Revision: https://reviews.llvm.org/D134701	2022-09-27 20:15:01 +00:00
Philip Reames	b54c571a01	[RISCV] Extend strided load/store pattern matching to non-loop cases The motivation here is to enable a change I'm exploring in vectorizer to prefer base + offset_vector addressing for scatter/gather. The form the vectorizer would end up emitting would be a gep whose vector operand is an add of the scalar IV (splated) and the index vector. This change makes sure we can recognize that pattern as well as the current code structure. As a side effect, it might improve scatter/gathers from other sources. Differential Revision: https://reviews.llvm.org/D134755	2022-09-27 12:56:58 -07:00
eopXD	163cb33854	[VP][RISCV] Add vp.ceil and RISC-V support Previous commit `8b00b24f85` missed to add `int_ceil` anchor for the llvm.ceil.* section under LangRef.rst Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 12:04:09 -07:00
eopXD	384b8b3da7	Revert "[VP][RISCV] Add vp.ceil and RISC-V support" This reverts commit `8b00b24f85`.	2022-09-27 11:12:57 -07:00
eopXD	8b00b24f85	[VP][RISCV] Add vp.ceil and RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 11:08:27 -07:00
Krzysztof Parzyszek	7da2b91887	[Hexagon] Unify getSizeOfs in HexagonVectorCombine, NFC	2022-09-27 10:51:52 -07:00
Krzysztof Parzyszek	9c9e877b7e	[Hexagon] Move function to a different class, NFC "Sector" is a concept from AlignVectors, so the check for it should be there.	2022-09-27 10:32:52 -07:00
Stefan Gränitz	ed8409dfa0	[ObjC][ARC] Fix target register for call expanded from CALL_RVMARKER on Windows Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI. Reviewed By: rnk, fhahn Differential Revision: https://reviews.llvm.org/D134441	2022-09-27 18:49:40 +02:00
David Green	401481daac	[AArch64] Remove incorrect zero element insert-bitcast patterns These two patterns are not working as intended, as shown in D134022. They need to insert the value into the new register, not override it.	2022-09-27 17:08:17 +01:00
Philip Reames	77b202f974	[RISCV] Rename getVectorImmCost to getStoreImmCost [nfc] My original intent had been to reuse this for arithmetic instructions as well, but due to the availability of a immediate splat encoding there, we will need different heuristics. So specialize the existing code for the store case.	2022-09-27 08:22:13 -07:00
wanglei	823ce6ad18	[LoongArch] Add some comments for expand pseudo-inst pass. NFC Differential Revision: https://reviews.llvm.org/D134708	2022-09-27 20:26:07 +08:00
Kazushi (Jam) Marukawa	de8013201f	[VE] Change to expand FPOW VE doesn't have FPOW instruction, so this patch makes llvm expand it. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134695	2022-09-27 20:03:10 +09:00
WANG Xuerui	c2a44b591e	[LoongArch] Support lowering frames larger than 2048 bytes Differential Revision: https://reviews.llvm.org/D134582	2022-09-27 18:58:33 +08:00
David Sherwood	fbb119412f	[AArch64] Add Neoverse V2 CPU support Adds support for the Neoverse V2 CPU to the AArch64 backend. Differential Revision: https://reviews.llvm.org/D134352	2022-09-27 07:56:08 +00:00
Paulo Matos	1bd1a44070	[WebAssembly] Use intrinsics for table.get/set instructions Initial table.get/set implementation would match and lower combinations of GEP+load/store to table.get/set instructions. However, this is error prone due to potential combinations of GEP+load/store we don't implement, and load/store optimizations. By changing the code to using intrinsics, we avoid both issues and simplify the code. New builtins implemented: * @llvm.wasm.table.get.externref * @llvm.wasm.table.get.funcref * @llvm.wasm.table.set.externref * @llvm.wasm.table.set.funcref Reviewed By: asb, tlively Differential Revision: https://reviews.llvm.org/D134436	2022-09-27 09:16:30 +02:00
Yeting Kuo	04e1301f3d	[VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support. Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum and llvm.minnum. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134639	2022-09-27 13:36:45 +08:00
Vitaly Buka	20a80d60a8	Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI" Break msan bots. Details in D134666. This reverts commit `0ce96e06ee`.	2022-09-26 22:22:09 -07:00
Paul Scoropan	ce004fb4f2	[PowerPC] XCOFF exception section support on the direct assembler path This feature implements support for making entries in the exception section on XCOFF on the direct assembly path using the ".except" pseudo-op. It also provides functionality to lower entries (comprised of language and reason codes) into the exception section through the use of annotation metadata attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler support will be provided in another review. https://reviews.llvm.org/D133030 needs to merge first for LIT tests Reviewed By: shchenz, RKSimon Differential Revision: https://reviews.llvm.org/D132146	2022-09-26 22:24:20 -04:00
Stanislav Mekhanoshin	0ce96e06ee	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-26 13:20:24 -07:00
Krzysztof Parzyszek	dfaf7a2846	[Hexagon] Avoid some unnecessary sign-extend instructions Simplify (sext_inreg (extractu ...)) -> (extract ...) where appropriate.	2022-09-26 12:30:18 -07:00
Krzysztof Parzyszek	d6c0a5be7f	[Hexagon] Make sure we can still shift scalar vectors by non-splats	2022-09-26 11:25:06 -07:00
Changpeng Fang	dee4bc4a4e	AMDGPU: Handle new address pattern in LowerKernelAttributes introduced by opaque pointers Summary: With opaque pointer support, the "ptr" type is introduced and thus BitCast is not necessary in some cases. This work takes care of this change, and recognizes the new address patterns to do appropriate optimizations. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D134596	2022-09-26 09:31:52 -07:00
Kazushi (Jam) Marukawa	1cef30b9d3	[VE] Disable automatic maxnum/minnum selection Disable FMAX/FMIN selection from select_cc in VEInstrInfo.td because of the lack of NaN consideration. This patch removes such selection from VEInstrInfo.td and lets llvm work on it in combineMinNumMaxNum. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134595	2022-09-26 22:04:02 +09:00
Kazushi (Jam) Marukawa	76c76e9ab4	[VE] Support smax/smin Support smax/smin in VEInstrInfo.td. Remove obsolete patterns for smax/smin. Add regression tests for smax/smin/umax/umin. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134583	2022-09-26 22:02:57 +09:00
Momchil Velikov	6602110152	[ARM] Enable and/cmp0 folding The `CodeGenPrepare` pass can sink bitwise `and` used by compare to zero into the basic blocks where the users are. This operation is guarded by lowering hook, which is disabled for ARM. In the ARM architecture versions from v7-M up these two operations can be folded into `tst rN, #imm` instruction. Sinking of `and` can also enable the cmov-to-bfi DAG combiner. This patch fixes some benchmark regressions caused by https://reviews.llvm.org/D129370 as well scoring slightly better overall. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D134360	2022-09-26 11:31:23 +01:00
David Green	bebc96956b	[AArch64] Enable FeatureFuseAdrpAdd for all Arm cpus The commit D120104 enabled FeatureFuseAdrpAdd for -mcpu=generic, allowing the linker to relax adrp;add pairs where possible. D132075 extended that to neoverse-n1, this patch extends it to all other cortex and neoverse cpus for the same reasons. Differential Revision: https://reviews.llvm.org/D134521	2022-09-26 09:55:10 +01:00
gonglingqin	a6d699b55d	[LoongArch] Add codegen support for strict_fsetccs and any_fsetcc Differential Revision: https://reviews.llvm.org/D134274	2022-09-26 13:05:36 +08:00
wanglei	75265c7f49	[LoongArch] Lower BlockAddress/JumpTable This patch uses a unified interface for lower GlobalAddress ConstantPool BlockAddress and JumpTable. This patch allows lowering addresses by using PC-relative addressing for DSO-local symbols, and accessing the address through the global offset table for DSO-preemptable symbols. Remove hardcoded `MininumJumpTableEntries` for test lower JumpTable. Also updated some test cases using ConstantPool, due to the addition of relocation information. Differential Revision: https://reviews.llvm.org/D134431	2022-09-26 10:52:54 +08:00
Yeting Kuo	43c5fbdd3a	[VP][RISCV] Add vp.sqrt intrinsic and RISC-V support. The patch modeled vp.fabs patch D132793. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133690	2022-09-26 10:47:40 +08:00
WANG Xuerui	ad6fe32032	[LoongArch] Support 'generic' as a valid CPU name As the LoongArch port is largely modeled after RISCV it has the same behavior of not accepting `generic` as a CPU name. For better compatibility with consumers of LLVM (e.g. mesa) follow D121149's suit and treat `generic` the same as an empty CPU name. Differential Revision: https://reviews.llvm.org/D134412	2022-09-26 10:20:13 +08:00
Ruiling Song	bf25a48985	Add override for runOnFunction() Fix build-bot failure.	2022-09-26 10:19:35 +08:00
WANG Xuerui	d2ac89b64e	[LoongArch] Support fastcc and treat it as ccc As explained in D68559 the `fastcc` calling convention may be requested under certain conditions, hence the need for supporting it. But unlike RISCV we actually treat it exactly like ccc, without actually inventing any performance hack right here. And CSKY does the same thing. This is going to fix a few more test cases with native LoongArch builds. Differential Revision: https://reviews.llvm.org/D134443	2022-09-26 10:15:00 +08:00
WANG Xuerui	f89f0990db	[LoongArch] Support llvm.thread.pointer For `__builtin_thread_pointer` to work, among other things. Similar to D76828 for RISCV. Differential Revision: https://reviews.llvm.org/D134368	2022-09-26 09:56:42 +08:00
Ruiling Song	cf14c7caac	AMDGPU: Add a pass to rewrite certain undef in PHI For the pattern of IR (%if terminates with a divergent branch.), divergence analysis will report %phi as uniform to help optimal code generation. ``` %if \| \ \| %then \| / %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ] ``` In the backend, %phi and %uniform will be assigned a scalar register. But the %undef from %then will make the scalar register dead in %then. This will likely cause the register being over-written in %then. To fix the issue, we will rewrite %undef as %uniform. For details, please refer the comment in AMDGPURewriteUndefForPHI.cpp. Currently there is no test changes shown, but this is mandatory for later changes. Reviewed by: sameerds Differential Revision: https://reviews.llvm.org/D133840	2022-09-26 09:54:47 +08:00
Weining Lu	394f30919a	[Clang][LoongArch] Add inline asm support for constraints f/l/I/K This patch adds support for constraints `f`, `l`, `I`, `K` according to [1]. The remain constraints (`k`, `m`, `ZB`, `ZC`) will be added later as they are a little more complex than the others. f: A floating-point register (if available). l: A signed 16-bit constant. I: A signed 12-bit constant (for arithmetic instructions). K: An unsigned 12-bit constant (for logic instructions). For now, no need to support register alias (e.g. `$a0`) in llvm as clang will correctly decode the usage of register name aliases into their official names. And AFAIK, the not yet upstreamed `rustc` for LoongArch will always use official register names (e.g. `$r4`). [1] https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html Differential Revision: https://reviews.llvm.org/D134157	2022-09-26 08:49:58 +08:00
James Y Knight	4f188ef89c	[AVR] Fix useDeprecatedPositionallyEncodedOperands errors. This is a follow-on to https://reviews.llvm.org/D134073. It renames a few fields to have consistent names, as well as renaming operands to match the field names. The encoder behavior is unchanged by this cleanup, but a few instructions were previously being disassembled incorrectly, and have been corrected by this change. All of the affected instructions were missing disassembly tests, which are now added. Differential Revision: https://reviews.llvm.org/D134185	2022-09-25 17:55:09 -04:00
James Y Knight	a8c59bcc01	[AMDGPU] Fix useDeprecatedPositionallyEncodedOperands errors in R600. This is a follow-on to https://reviews.llvm.org/D134073. It renames a couple of fields to match their operands, as well as introducing sub-operand names where required. This change _only_ fixes the 'R600' half of the target, not the 'AMDGPU' half. Fixing the AMDGPU half will be a significantly more difficult change (which I've not yet attempted.) Differential Revision: https://reviews.llvm.org/D134078	2022-09-25 17:55:09 -04:00
James Y Knight	0f99958e79	[Lanai] Fix useDeprecatedPositionallyEncodedOperands errors. This is a follow-on to https://reviews.llvm.org/D134073. Lanai was almost clean: the only issue is that 'bit' behaves differently than 'bits<1>', because only the 'bits' type preserves unresolved references via 'keepUnsetBits()' in TableGen/Record.h. Thus, use bits instead. This issue _would_ have caused invalid instruction emission/decoding, except that the PQ bits were being overriden after the fact by code in 'adjustPqBits' in MCTargetDesc/LanaiMCCodeEmitter.cpp, and 'PostOperandDecodeAdjust' in Disassembler/LanaiDisassembler.cpp. Differential Revision: https://reviews.llvm.org/D134075	2022-09-25 17:55:09 -04:00
Simon Pilgrim	196f27bb56	[CostModel][X86] Add missing cost kinds for v2i64 icmp on SLM	2022-09-25 15:12:21 +01:00
Simon Pilgrim	faff990e9b	[X86] Fix Icelake VPMULLQ zmm pipes and adjust AVX512DQ v8i64 mul costs to match worse case Icelake PMULLQ throughput regressed cf SkylakeServer as its Pipe0 only Confirmed with Intel SOM, Agner and instlatx64	2022-09-25 14:18:08 +01:00
Petar Avramovic	dcc756d03e	[AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl test_flat_add_local_f64 caused by D130579 Revert `a3becb333d`. Differential Revision: https://reviews.llvm.org/D134568	2022-09-25 13:25:41 +02:00
Philip Reames	5358968e13	[RISCV] Pattern match scalable strided load/store Very straight forward extension of the existing pattern matching pass to handle scalable types as well as fixed length types. The only extra bit beyond removing a bailout is recognizing stepvector. Differential Revision: https://reviews.llvm.org/D134502	2022-09-24 17:41:58 -07:00
Philip Reames	6e7c54ecaf	[RISCV] Add lowering for scalable @llvm.riscv.masked.strided.load/store The code previously assumed fixed length vectors; make the relevant code conditional. Having the lowering in place is neccessary for an upcoming change to generalize scatter/gather matching to scalable vectors. Differential Revision: https://reviews.llvm.org/D134489	2022-09-24 17:41:57 -07:00
James Y Knight	5351878ba1	[TableGen] Add useDeprecatedPositionallyEncodedOperands option. Summary: The existing undefined-bitfield-to-operand matching behavior is very hard to understand, due to the combination of positional and named matching. This can make it difficult to track down a bug in a target's instruction definitions. Over the last decade, folks have tried to work-around this in various ways, but it's time to finally ditch the positional matching. With https://reviews.llvm.org/D131003, there are no longer cases that _require_ positional matching, and it's time to start removing usage and support for it. Therefore: add a (default-false) option, and set it to true only in those targets that require positional matching today. Subsequent changes will start cleaning up additional in-tree targets. NOTE TO OUT OF TREE TARGET MAINTAINERS: If this change breaks your build, you may restore the previous behavior simply by adding: let useDeprecatedPositionallyEncodedOperands = 1; to your target's InstrInfo tablegen definition. However, this is temporary -- the option will be removed in the future. If your target does not set 'decodePositionallyEncodedOperands', you may thus start migrating to named operands. However, if you _do_ currently set that option, I recommend waiting until a subsequent change lands, which adds decoder support for named sub-operands. Differential Revision: https://reviews.llvm.org/D134073	2022-09-24 09:40:45 -04:00
James Y Knight	a538d1f13a	[TableGen][CodeEmitterGen] Allow local names for sub-operands in a operand list. These names can then be matched by name against 'bits' fields in a record, to populate an instruction's encoding. This does _not_ yet change DecoderEmitter to allow by-name matching of sub-operands. Unlike the encoder, the decoder already defaulted to not supporting positional matching, and backends had workarounds in place for the missing decoding support. Additionally, use this new capability to allow the ARM and AArch64 backends not to require any positional operand matching. Differential Revision: https://reviews.llvm.org/D131003	2022-09-24 09:40:44 -04:00
Craig Topper	4f86c5cbb7	[RISCV] Rename RISCVScheduleB.td to RISCVScheduleZb.td. NFC	2022-09-23 21:38:42 -07:00
Craig Topper	3967abcc0b	[RISCV] Add missing scheduler classes to Zbkb and Zbkx instructions.	2022-09-23 21:38:42 -07:00
Craig Topper	cde3de5381	[RISCV] Remove a few remnants of Zbr I misssed.	2022-09-23 21:21:51 -07:00
Craig Topper	19850cc2d8	Revert "[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type." This reverts commit `dd53a0bb30`. We have seen crashes from this internally. Probably due to the use of RoundingMode::Dynamic.	2022-09-23 18:41:41 -07:00
Craig Topper	90a5d8499a	[RISCV] Promote f16 STRICT_FCEIL/FLOOR/TRUNC/NEARBYINT/RINT/ROUND,ROUNDEVEN to f32.	2022-09-23 14:01:51 -07:00
Jay Foad	ddfa0f62d8	[AMDGPU] Add GFX11 feature for subtargets with more VGPRs The full complement of physical VGPRs for GFX11 is 50% more than GFX10. Some subtargets have this, others stay the same as GFX10. This affects occupancy calculations. Differential Revision: https://reviews.llvm.org/D134522	2022-09-23 20:18:23 +01:00
Josh Stone	cb46ffdbf4	[X86] Use BuildStackAdjustment in stack probes This has the advantage of dealing with live EFLAGS, using LEA instead of SUB if needed to avoid clobbering. That also respects feature "lea-sp". We could allow unrolled stack probing from blocks with live-EFLAGS, if canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used. Differential Revision: https://reviews.llvm.org/D134495	2022-09-23 09:30:32 -07:00
Josh Stone	26c37b461a	[X86] Don't allow prologue stack probing with live EFLAGS Fixes https://github.com/llvm/llvm-project/issues/49509 Differential Revision: https://reviews.llvm.org/D134494	2022-09-23 09:30:32 -07:00
Josh Stone	4dcfb09e40	[NFC][CodeGen] Use const MF in TargetLowering stack probe functions This makes them callable from places like canUseAsPrologue. Differential Revision: https://reviews.llvm.org/D134492	2022-09-23 09:30:32 -07:00
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD which corresponds to llvm-ir's atomicrmw fadd instruction. global and flat atomic fadd patterns changes: Split rtn/no-rtn patterns Add missing patterns or fix predicates Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors). Patterns now check addrspace of pointer, added patterns for flat intrinsic. with global addrspace pointer that selects into global atomic instruction. buffer atomic fadd patterns changes: Rdit patterns to import into global-isel. Remove gfx6/gfx7 _addr64 and _offset patterns. Remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Petar Avramovic	5cee9047d5	AMDGPU: Improve atomicrmw fadd selection Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11 as for gfx90a. Add missing globalisel legalizer support for flat atomicrmw fadd f32 on gfx940 and gfx11. Isel support for gfx11 will be added in D130579. Differential Revision: https://reviews.llvm.org/D131560	2022-09-23 17:52:10 +02:00
Petar Avramovic	e03d36d4ae	[AMDGPU] Add FeatureFlatAtomicFaddF32Inst Feature used by targets that have flat_atomic_add_f32 instruction (gfx940 and gfx11). Remove isGFX940GFX11Plus. Add hasFlatAtomicFaddF32Inst Subtarget check for codegen. Differential Revision: https://reviews.llvm.org/D134532	2022-09-23 17:52:10 +02:00
Simon Pilgrim	a6e9141505	[TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436) The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both. This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling. Fixes #50778 Differential Revision: https://reviews.llvm.org/D111968	2022-09-23 14:03:18 +01:00
Hassnaa Hamdi	181f200a1c	[NFC]: AArch64-SVE modify some comments	2022-09-23 12:07:31 +00:00
Caroline Concatto	5431bf27bd	[AArch64]Remove svget/svset/svcreate from llvm This patch removes the aarch64 instrinsic svget/svset/svcreate from llvm. It also implements the InstCombine for vector.extract that used to be in svget. Depends on: D131547 Differential Revision: https://reviews.llvm.org/D131548	2022-09-23 10:48:43 +01:00
gonglingqin	ac295597a8	[LoongArch] Add codegen support for atomicrmw add/sub/nand/and/or/xor operation Differential Revision: https://reviews.llvm.org/D133755	2022-09-23 09:32:11 +08:00
Philip Reames	60c91fd364	[RISCV] Disallow scale for scatter/gather RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely. This has two effects: * First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to. * Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.) The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics. Differential Revision: https://reviews.llvm.org/D134382	2022-09-22 15:31:26 -07:00
Craig Topper	52708be182	[RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions. These extensions do not appear to be on their way to ratification.	2022-09-22 13:04:41 -07:00
Simon Pilgrim	98907f8685	[CostModel][X86] Tidyup sdiv/srem/udiv/urem by constant cost tables Preparation for adding cost kinds handling This is necessary to eventually unblock D111968	2022-09-22 20:46:33 +01:00
Chris Bieneman	4959bfa060	[NFC] Refactor dxil metadata code DXIL relies on a whole bunch of IR metadata constructs being populated in the right shape. Rather than just hard coding or using complicated arrangements of constant data strings, let's make first-class objects that reprensent the metadata and manage reading and writing the metadata from the module. Reviewed By: python3kgae Differential Revision: https://reviews.llvm.org/D134397	2022-09-22 12:17:51 -05:00
Hassnaa Hamdi	f2072e0ae0	[AArh64-SVE]: Improve cost model for div/udiv/mul 128-bit vector operations Differential Revision: https://reviews.llvm.org/D132477	2022-09-22 16:50:55 +00:00
Simon Pilgrim	dc93202b44	[CostModel][X86] Remove duplicate ashr v4i64 cost table entry. NFCI.	2022-09-22 17:27:26 +01:00
Fraser Cormack	92d71c615d	[RISCV] Use structured bindings in common RVV lowering code This patch uses structured bindings to simplify a couple of specific cases when lowering RVV operations where we commonly declare two SDValues and immediately 'tie' them to the mask and vector length. There's also a couple places where we split vectors that structured bindings make sense to use. This patch tries to keep these sorts of changes minimal and to cases where the returned types are commonly understood, rather than applying this wholesale to the RISCV backend. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D134442	2022-09-22 16:38:40 +01:00
Philip Reames	e41765aa4d	[RISCV] Verify consistency of a couple TSFlags related to vector operands Various bits of existing code assume the presence of one operand implies the presence of another. Add verifier rules to catch violations. Differential Revision: https://reviews.llvm.org/D133810	2022-09-22 08:35:17 -07:00
Craig Topper	bf7c7696fe	[RISCV] Improve support for vector fp_to_sint_sat/uint_sat. The default fixed vector legalization is to unroll. The default scalable vector legalization is to clamp in the FP domain. The RVV vfcvt instructions have saturating behavior so we can use them directly. The only difference is that RVV instruction turn nan into the max value, but the _SAT intrinsics want 0. I'm only supporting 1 step of narrowing for now. I think we can support more steps by using VNCLIP to saturate and narrower. The only case that needs 2 steps of widening is f16->i64 which we can do as f16->f32->i64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D134400	2022-09-22 08:13:48 -07:00
Craig Topper	d6cb8f85bf	[RISCV] Formatting fixes to RISCV.td NFC Improve indentation. Fix the worst of the 80 column violations.	2022-09-22 08:12:59 -07:00
Tim Northover	677da09d02	AArch64: add support for newer Apple CPUs They're roughly ARMv8.6. This works in the .td file, but in AArch64TargetParser.def, marking them v8.6 brings in support for the SM4 cryptographic hash and we don't actually have that. So TargetParser side they're marked as v8.5, with the extra features (BF16 and I8MM added manually). Finally, A16 supports the HCX extension in addition to v8.6. This has no TargetParser implications.	2022-09-22 11:58:51 +01:00
Simon Pilgrim	e030be64d8	[CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad. This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 11:24:11 +01:00
Simon Pilgrim	b2cd8118d0	[CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 10:19:23 +01:00
Ilia Diachkov	4421b24fe2	[SPIRV] fix build with clang and use PoisonValue instead of UndefValue The patch fixes the SPIRV backend build using clang. It also replaces UndefValue with PoisonValue in SPIRVRegularizer.cpp. Fixes: #57773 Differential Revision: https://reviews.llvm.org/D134071	2022-09-22 11:49:54 +03:00
Craig Topper	8b8e18e11f	[RISCV] Replace RISCVISD::GREV/GORC/SHFL/UNSHFL with BREV8/ORC_B/ZIP/UNZIP. With Zbp removed, we no longer need the generalized forms. The computeKnownBitsForTargetNode code brev8/orc.b is still based on the general form with the shift amount forced to 7.	2022-09-21 21:57:59 -07:00
Craig Topper	182aa0cbe0	[RISCV] Remove support for the unratified Zbp extension. This extension does not appear to be on its way to ratification. Still need some follow up to simplify the RISCVISD nodes.	2022-09-21 21:22:42 -07:00
Fanchen Kong	8a2729fea7	[WebAssembly] Improve codegen for loading scalars from memory to v128 Use load32_zero instead of load32_splat to load the low 32 bits from memory to v128. Test cases are added to cover this change. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D134257	2022-09-21 21:05:44 -07:00
Craig Topper	3b8ec0fde5	[RISCV] Remove some unused Predicates from tablegen. NFC Specifically predicates for extensions that are subsets of other extensions. These predicates should never be used. Should always check the superset extension or the superset ORed with the sub extendsion.	2022-09-21 18:26:43 -07:00
Chris Bieneman	e77c40ffbd	[NFC] Make dxil namespace consistent We have namespaces `DXIL` and `dxil`, which is just confusing. This renames `DXIL` -> `dxil` making everything consistent. While the LLVM coding standards don't have a clear direction here, I chose lower case because by my current unscientific count there are more places where we had the lowercase namespace than the uppercase.	2022-09-21 17:48:13 -05:00
Craig Topper	1d8a7adca6	[RISCV] Rename RISCVISD::SINT_TO_FP_VL/UINT_TO_FP_VL. NFC Name them after the instructions VFCVT_RTZ_X(U)_F_VL to make it clear that the ISD nodes don't have the poison semantics of ISD::SINT_TO_FP/UINT_TO_FP. I play to reuse this node for a FP_TO_SINT_SAT/FP_TO_UINT_SAT patch and need the instruction semantics.	2022-09-21 15:33:04 -07:00

1 2 3 4 5 ...

69035 Commits