llvm-project

Commit Graph

Author	SHA1	Message	Date
OverMighty	232953f996	[AArch64] Add pattern for SQDML*Lv1i32_indexed There was no pattern to fold into these instructions. This patch adds the pattern obtained from the following ACLE intrinsics so that they generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and sqadd/sqsub instructions: - vqdmlalh_s16, vqdmlslh_s16 - vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16, vqdmlslh_laneq_s16 (when the lane index is 0) It also modifies the result of the existing pattern for the latter, when the lane index is not 0, to use the v1i32_indexed instructions instead of the v4i16_indexed ones. Fixes #49997. Differential Revision: https://reviews.llvm.org/D131700	2022-08-17 12:00:47 +01:00
Rainer Orth	d9993484ee	[Sparc] Don't use SunStyleELFSectionSwitchSyntax As discussed in D85414 <https://reviews.llvm.org/D85414>, two tests currently `FAIL` on Sparc since that backend uses the Sun assembler syntax for the `.section` directive, controlled by `SunStyleELFSectionSwitchSyntax`. Instead of adapting the affected tests, this patch changes that default. The internal assembler still accepts both forms as input, only the output syntax is affected. Current support for the Sun syntax is cursory at best: the built-in assembler cannot even assemble some of the directives emitted by GCC, and the set supported by the Solaris assembler is even larger: SPARC Assembly Language Reference Manual, 3.4 Pseudo-Op Attributes <https://docs.oracle.com/cd/E37838_01/html/E61063/gmabi.html#scrolltoc>. A few Sparc test cases need to be adjusted. At the same time, the patch fixes the failures from D85414 <https://reviews.llvm.org/D85414>. Tested on `sparcv9-sun-solaris2.11`. Differential Revision: https://reviews.llvm.org/D85415	2022-08-17 12:59:29 +02:00
Craig Topper	d27c147aaa	[RISCV] Allow lowerSELECT to fold integer setcc with FP select. We'd pick it up in DAG combine later even if we didn't handle it here. No test changes because we get it in DAG combine anyway.	2022-08-16 21:28:54 -07:00
Vitaly Buka	16fecdfa70	Revert "[AArch64] Add `foldCSELOfCSEl` DAG combine" Breaks ubsan on buildbot, details in D125504 This reverts commit `6f9423ef06`.	2022-08-16 20:29:37 -07:00
Craig Topper	ba1fb54821	[RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC	2022-08-16 19:56:55 -07:00
Monk Chiang	0af4651c0f	[RISCV] Add scheduling class for vector pseudo segment instructions. Add scheduling resource for vector segment load/store instructions in D128886. I missed adding the scheduling resource for pseudo segment instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130222	2022-08-16 17:54:47 -07:00
Eli Friedman	cfd2c5ce58	Untangle the mess which is MachineBasicBlock::hasAddressTaken(). There are two different senses in which a block can be "address-taken". There can be a BlockAddress involved, which means we need to map the IR-level value to some specific block of machine code. Or there can be constructs inside a function which involve using the address of a basic block to implement certain kinds of control flow. Mixing these together causes a problem: if target-specific passes are marking random blocks "address-taken", if we have a BlockAddress, we can't actually tell which MachineBasicBlock corresponds to the BlockAddress. So split this into two separate bits: one for BlockAddress, and one for the machine-specific bits. Discovered while trying to sort out related stuff on D102817. Differential Revision: https://reviews.llvm.org/D124697	2022-08-16 16:15:44 -07:00
Craig Topper	53ce22e429	Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This time using N1 instead of N0 since N1 points to the original setcc. This now affects scheduling as I expected. Original commit message: We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:51:07 -07:00
Craig Topper	2dfa4b6475	Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This reverts commit `1380b21ceb`. I mixed up N0 and N1 and didn't do what I intended.	2022-08-16 15:47:01 -07:00
Craig Topper	1380b21ceb	[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine. We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:40:09 -07:00
Craig Topper	b5a18de651	[RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is still preferable. (addi X, -1) can be compressed; sub with x0 on the LHS is never compressible.	2022-08-16 14:49:52 -07:00
Craig Topper	de6fd16971	[RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12. We still need to materialize the constant in a register and we may not be removing all uses of the original constant so it may increase code size.	2022-08-16 14:11:31 -07:00
Craig Topper	4184edc691	[RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc. This introduces an xori in some cases. I don't believe it was the intention of the original patch. This was an accident because nonan FP equality compares also use SETEQ/SETNE. Also pass the correct type to getSetCCInverse.	2022-08-16 13:00:36 -07:00
Craig Topper	c7e58836e8	[RISCV] Minor cleanups to performSUBCombine. NFC -Rename variable NnzC -> N0C. -Use SelectionDAG::getSetCC to reduce code. -Use SDValue::getOperand instead of operator-> and SDNode::getOperand. Initial steps to add another similar combine to this code.	2022-08-16 12:59:16 -07:00
Karl Meakin	6f9423ef06	[AArch64] Add `foldCSELOfCSEl` DAG combine Differential Revision: https://reviews.llvm.org/D125504	2022-08-16 12:49:11 +01:00
Zain Jaffal	7155ed4289	[AArch64] Add support for 256-bit non temporal loads Currently all non-temporal loads are mapped to `LDP` or `LDR`. This patch will map all the non-temporal 256-bit loads into `LDNP`. Future patches should address other non-temporal loads. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D131773	2022-08-16 12:19:36 +01:00
Victor Campos	784da8a722	[ARM] Simplify the creation of escaped build attribute values There is an existing mechanism to escape strings, therefore the functions created to escape Tag_also_compatible_with values are not really needed. We can simply use the pre-existing utilities. Reviewed By: pratlucas Differential Revision: https://reviews.llvm.org/D131680	2022-08-16 11:49:33 +01:00
Bing1 Yu	807b8cb06c	[X86] Fix a lowering issue of mask.compress which has undef float passthrough Previously, LegalizeDAG didn't check that mask.compress's passthrough might be a float, and this led to a getConstant crash since it doesn't support fp Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131947	2022-08-16 17:54:45 +08:00
gonglingqin	a9d46d9af3	[LoongArch] Add codegen support for fabs Differential Revision: https://reviews.llvm.org/D131871	2022-08-16 14:41:27 +08:00
Weining Lu	d1f36da9e0	[LoongArch] Encode LoongArch specific ELF e_flags to binary by LoongArchTargetStreamer Reference: https://github.com/loongson/LoongArch-Documentation The last commit hash (main branch) is: 99016636af64d02dee05e39974d4c1e55875c45b Note: There are several PRs [1][2][3] that may affect the e_flags. After they get closed or merged, we should update the implementation here accordingly. [1] https://github.com/loongson/LoongArch-Documentation/pull/33 [2] https://github.com/loongson/LoongArch-Documentation/pull/47 [3] https://github.com/loongson/LoongArch-Documentation/pull/61 Differential Revision: https://reviews.llvm.org/D130239	2022-08-16 13:41:50 +08:00
Vitaly Buka	e0e960923f	[AArch64] Fix signed integer overflow in CSINC case Followup to D131815, which overflows on different values.	2022-08-15 15:04:20 -07:00
Craig Topper	7a73ab5818	[RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64. We have a good selection of W instructions, so promoting a truncated value back to i64 is often free. This appears to be a net code size reduction on SPECINT2006. This has been split from D130397 as one of the patches needed to complete that. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131819	2022-08-15 08:32:51 -07:00
Simon Pilgrim	a7b85e4c0c	[X86] Freeze shl(x,1) -> add(x,x) vector fold (PR50468) Vector fold shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468 Differential Revision: https://reviews.llvm.org/D106675	2022-08-15 16:17:21 +01:00
Simon Pilgrim	41bdb8cd36	[X86] Fold insert_vector_elt(undef, elt, 0) --> scalar_to_vector(elt) I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first. Fixes regressions from D106675.	2022-08-15 14:56:30 +01:00
Ayke van Laethem	a560e57a7e	[AVR] Only push and clear R1 in interrupts when necessary R1 is a reserved register, but LLVM gives the APIs to know when it is used or not. So this patch uses these APIs to only save/clear/restore R1 in interrupts when necessary. The main issue here was getting inline assembly to work. One could argue that this is the job of Clang, but for consistency I've made sure that R1 is always usable in inline assembly even if that means clearing it when it might not be needed. Information on inline assembly in AVR can be found here: https://www.nongnu.org/avr-libc/user-manual/inline_asm.html#asm_code Essentially, this seems to suggest that r1 can be freely used in avr-gcc inline assembly, even without specifying it as an input operand. Differential Revision: https://reviews.llvm.org/D117426	2022-08-15 14:29:38 +02:00
Ayke van Laethem	43a8dbc5be	[AVR] Use @earlyclobber instead of register scavenging The code to support the case when the register allocator has assigned the same register to the src and the dst register operand isn't actually needed: * LDWRdPtr and LDDWRdPtrQ have an @earlyclobber on the output register, so the register allocator will make sure to allocate a different register for the output register. * LDDWRdYQ does not have an @earlyclobber, but the pointer register is the fixed Y register which is reserved. The register allocator won't use reserved registers for the output value. This removes a special case in the code that makes the pseudo instruction expansion pass more complicated than it needs to be. Differential Revision: https://reviews.llvm.org/D131844	2022-08-15 14:29:38 +02:00
Kazu Hirata	f5a68feab3	Use llvm::none_of (NFC)	2022-08-14 16:25:39 -07:00
Kazu Hirata	6d9cd9199a	Use llvm::all_of (NFC)	2022-08-14 16:25:36 -07:00
Krzysztof Parzyszek	40ba78679d	[Hexagon] Distribute disjoint intervals at the end of expand-condsets This fixes https://github.com/llvm/llvm-project/issues/56050.	2022-08-14 16:15:23 -05:00
Krzysztof Parzyszek	98bd252432	[Hexagon] Make some loops in HexagonExpandCondsets.cpp range-based, NFC Plus some readability changes.	2022-08-14 16:15:06 -05:00
Simon Pilgrim	cc6d3f07f4	[M68k] Fix MSVC llvm::Optional<> deprecation warnings Use has_value()/value() instead of hasValue()/getValue()	2022-08-14 18:54:41 +01:00
Simon Pilgrim	8b47e29fa0	[X86] combineVectorShiftImm - fold (shl (add X, X), C) -> (shl X, (C + 1)) Noticed while investigating the regressions in D106675	2022-08-14 17:42:02 +01:00
Phoebe Wang	8b69549dc5	[X86][FP16] Promote FP16->[U]INT to FP16->FP32->[U]INT This is to avoid f16->i64 being lowered to `__fixhfdi/__fixunshfdi` on 32-bits since neither libgcc nor compiler-rt provide them. https://godbolt.org/z/cjWEsea5v It also helps to improve the performance by promoting the vector type. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D131828	2022-08-14 09:37:33 +08:00
Vitaly Buka	f1596952f9	[AArch64] Fix signed integer overflow in CSINC case https://lab.llvm.org/staging/#/builders/224/builds/2/steps/16/logs/stdio Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D131815	2022-08-13 13:12:09 -07:00
Kazu Hirata	109df7f9a4	[llvm] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-13 12:55:42 -07:00
Florian Hahn	c2af37dcdb	Revert "[AArch64][GlobalISel] Recognise some CCMPri" This reverts commit `38c2366b3f`. This patch seems to break bootstrapping LLVM with `-fglobal-isel -O3` on AArch64 hardware. Without the revert, there are 500+ test failures for the `check-llvm-codegen-x86` target.	2022-08-13 17:44:41 +01:00
Liqin.Weng	8a12606a7e	[AVR] Remove debug location of spill/reload instructions Reviewed By: MatzeB, benshi001 Differential Revision: https://reviews.llvm.org/D129262	2022-08-13 20:58:12 +08:00
LiaoChunyu	99ef0ddea3	[RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq)) (setcc x, y, eq/neq) are seqz, snez that set rd = 0/1. addi is used to process immediate, which can save instructions for load immediate. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131471	2022-08-13 20:37:57 +08:00
Craig Topper	37db283362	[RISCV] isImpliedByDomCondition returns an Optional<bool> not a bool. We were incorrectly checking that it returned an implication result, not that the implication result itself was true.	2022-08-12 22:21:05 -07:00
jacquesguan	0fe5f03eeb	[RISCV][NFC] Use nested namespace definitions. Since we use C++17 now, we could use nested namespace definitions to simplify code. Differential Revision: https://reviews.llvm.org/D131751	2022-08-13 09:56:59 +08:00
James Y Knight	4d7f9b7489	X86: Don't fold TEST into ADD ...@GOTTPOFF/GOTNTPOFF/INDNTPOFF The linker may convert such an ADD into a LEA, so we must not use the EFLAGS output. This causes miscompiles with -fsanitize=null after `bacdf80f42` added llvm.threadlocal.address -- previously, global variables were known to be non-null, but the intrinsic is not currently known to return nonnull. (That should be corrected, but it shouldn't've caused miscompiles!) Differential Revision: https://reviews.llvm.org/D131716	2022-08-12 20:52:00 +00:00
Ilia Diachkov	df8713079b	[SPIRV] support capabilities and extensions This patch supports SPIR-V capabilities and extensions. In addition, it inserts decorations related to MIFlags and improves support of switches. Five tests are included to demonstrate the improvement. Differential Revision: https://reviews.llvm.org/D131221 Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com> Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com> Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>	2022-08-12 23:33:15 +03:00
James Y Knight	59351fe340	SPIRV: Fix compilation in NDEBUG.	2022-08-12 14:00:39 +00:00
gonglingqin	9e09c3186e	[LoongArch] Add codegen support for ISD::CTPOP, ISD::CTTZ and ISD::CTLZ Differential Revision: https://reviews.llvm.org/D131550	2022-08-12 14:15:30 +08:00
Ting Wang	12e1936f64	[PowerPC] Add XXEVAL TD pattern Add xxeval TD pattern for P10 on: eqv, nor, or, xor. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D131654	2022-08-12 01:27:24 -04:00
Weining Lu	40f1f9b357	[LoongArch] Return null SDValue by default in LowerOperation. NFC Differential Revision: https://reviews.llvm.org/D131546	2022-08-12 12:09:08 +08:00
Craig Topper	e493944f5f	[RISCV] Use SLTIU X, -1 for (setne X, -1). Since -1 is the maximum unsigned value, all values less than it are not equal to it.	2022-08-11 15:36:04 -07:00
Martin Storsjö	2c2fb0c737	[llvm] Use hidden visibility when building for MinGW with Clang Since `c5b3de6745` (git main, August 11th), Clang does generate working hidden visibility on MinGW targets. Using that reduces the number of exports from a dylib build of LLVM significantly, which is vital for fitting within the limit of 64k exported symbols from a DLL. It's essential that if we set CMAKE_CXX_VISIBILITY_PRESET=hidden (which passes -fvisibility=hidden on the command line), we also must define LLVM_EXTERNAL_VISIBILITY consistently to override it. (If there are mismatches, e.g. setting hidden visibility generally but never overriding it back to default for the symbols that do need to be exported, we'd get broken builds in such configurations.) We don't want to be using __attribute__((visibility("hidden"))) on MinGW with GCC, because GCC produces a warning about it. (GCC hasn't warned about the command line options that set hidden visibility though.) Clang has historically not warned about either of them, so it is harmless to use the hidden visibility when building with older Clang (so we don't need to detect the exact version of Clang/LLVM where it has an effect). This reduces the number of exported symbols for a dylib build of LLVM; previously libLLVM exported around 64650 symbols (when the maximum is 65536) when the ARM, AArch64 and X86 targets were enabled. If enabling more targets (or if building with e.g. assertions enabled), it would exceed the limit. Now with visibility flags in use, the same build with ARM, AArch64 and X86 ends up at around 35k exported symbols. Differential Revision: https://reviews.llvm.org/D131661	2022-08-12 00:57:05 +03:00
Craig Topper	2c79801a0e	[RISCV] Add more ineg+setcc isel patterns to avoid creating neg+xori+slti(u). Including patterns to select addiw if only the lower 32 bits are used. I'm not excited about adding this many patterns. I'm looking at whether we can create the xori during lowering and move the ineg patterns to DAGCombiner.	2022-08-11 14:24:09 -07:00
Simon Pilgrim	6ba5fc2dee	[X86] lowerShuffleWithVPMOV - support direct lowering to VPMOV on VLX targets lowerShuffleWithVPMOV currently only matches shuffle(truncate(x)) patterns, but on VLX targets the truncate isn't usually necessary to make the VPMOV node worthwhile (as we're only targeting v16i8/v8i16 shuffles we're almost always ending up with a PSHUFB node instead). PACKSS/PACKUS are still preferred vs VPMOV due to their lower uop count. Fixes the remaining regression from the fixes in rG293899c64b75	2022-08-11 17:40:07 +01:00
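
Several of the folds listed above rest on plain integer identities: the RISCV `(sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq))` combine, `SLTIU X, -1` for `(setne X, -1)`, and the X86 `shl(x,1) -> add(x,x)` and `(shl (add X, X), C) -> (shl X, (C + 1))` folds. The sketch below only checks that arithmetic on ordinary wrapping unsigned integers. It is not LLVM code: every function name is hypothetical, and it cannot model `undef` or `freeze`, which is the issue PR50468 is about.

```cpp
#include <cassert>   // for the assert in the usage note below
#include <cstdint>

// Hypothetical helpers; none of these names exist in LLVM.

// (sub C, (setcc x, y, eq)): the setcc result is 0 or 1.
constexpr uint64_t subOfSetEq(uint64_t c, uint64_t x, uint64_t y) {
  return c - static_cast<uint64_t>(x == y);
}

// (add C-1, (setcc x, y, ne)): same value, with the condition inverted.
constexpr uint64_t addOfSetNe(uint64_t c, uint64_t x, uint64_t y) {
  return (c - 1) + static_cast<uint64_t>(x != y);
}

// (setne X, -1) versus an unsigned "less than all-ones" compare,
// which is what SLTIU X, -1 computes.
constexpr bool setNeAllOnes(uint64_t x) { return x != UINT64_MAX; }
constexpr bool sltiuAllOnes(uint64_t x) { return x < UINT64_MAX; }

// shl(x, 1) versus add(x, x).
constexpr uint32_t shlByOne(uint32_t x) { return x << 1; }
constexpr uint32_t addSelf(uint32_t x) { return x + x; }

// (shl (add X, X), C) versus (shl X, C + 1); only meaningful while C + 1 < 32.
constexpr uint32_t shlOfAdd(uint32_t x, unsigned c) { return (x + x) << c; }
constexpr uint32_t shlPlusOne(uint32_t x, unsigned c) { return x << (c + 1); }

// Returns true if every identity holds on a small set of sample values,
// including the wrap-around cases 0 and all-ones.
inline bool checkFolds() {
  const uint64_t samples[] = {0, 1, 2, 41, UINT64_MAX - 1, UINT64_MAX};
  for (uint64_t x : samples) {
    if (setNeAllOnes(x) != sltiuAllOnes(x)) return false;
    const uint32_t lo = static_cast<uint32_t>(x);
    if (shlByOne(lo) != addSelf(lo)) return false;
    for (unsigned c = 0; c + 1 < 32; ++c)
      if (shlOfAdd(lo, c) != shlPlusOne(lo, c)) return false;
    for (uint64_t y : samples)
      for (uint64_t c : samples)
        if (subOfSetEq(c, x, y) != addOfSetNe(c, x, y)) return false;
  }
  return true;
}
```

Calling `assert(checkFolds());` from a test or from `main` exercises all four identities. The sample set is deliberately small and includes `C = 0`, the case whose restriction commit `b5a18de651` lifts.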

68457 Commits