llvm-project

Commit Graph

Author	SHA1	Message	Date
Chris Bieneman	1652c4f2fe	[NFC] Fixing test requirements I broke I broke these in `7a0cbe11fb`, thanks @ikudrin for catching it!	2022-02-09 09:11:34 -06:00
Simon Pilgrim	6d68ece61f	[X86] Refresh funnel/rotate AVX512 VBMI tests Assume that if VBMI2 is enabled then VBMI is as well - correctly shows the fallback and widening code that real world targets will actually see	2022-02-09 15:04:40 +00:00
Sander de Smalen	bcbad75a7c	[AArch64][SVE] NFC: Add test file for predicate vector reductions. This adds some tests for vector reductions which can and should be implemented with ptest as opposed to promoted ANDV/ORV reduction.	2022-02-09 15:00:08 +00:00
Sander de Smalen	381767a274	[AArch64] NFC: Autogen check lines for sve-setcc.ll	2022-02-09 14:32:50 +00:00
Sander de Smalen	ec46232517	[DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)` This seems like an obvious fold, which leads to a few improvements. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118920	2022-02-09 14:30:01 +00:00
Tong Zhang	2fe315162e	[X86] TCRETURNmi fix for 32bit platform This fix is similar to 3cf3ffce240e("Fix the TCRETURNmi64 bug differently.") after allocating register for index+base, we will only have one register left This bug affects linux kernel compilation for x86 target. Error happens when compiling kmod_si476x_core. clang complains: error: ran out of registers during register allocation The full command is: clang -Wp,-MMD,drivers/mfd/.si476x-cmd.o.d -nostdinc -isystem /opt/toolchain/main/lib/clang/14.0.0/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -no-integrated-as --prefix=/usr/bin/ -Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mstack-alignment=4 -march=atom -mtune=atom -mtune=generic -Wa,-mtune=generic32 -ffreestanding -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=1024 -fno-stack-protector -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -ftrivial-auto-var-init=pattern -fno-stack-clash-protection -falign-functions=32 -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -DKBUILD_MODFILE='"drivers/mfd/si476x-core"' -DKBUILD_BASENAME='"si476x_cmd"' -DKBUILD_MODNAME='"si476x_core"' -D__KBUILD_MODNAME=kmod_si476x_core -c -o drivers/mfd/si476x-cmd.o drivers/mfd/si476x-cmd.c ------------- LLVM cannot compile the following code for x86 32bit target, the reason is tail call(TCRETURNmi) is using 2 registers for index+base and we want to use more than one registers for passing function args and that is impossible. This fix is similar to 3cf3ffce240e("Fix the TCRETURNmi64 bug differently."). We will only use tail call when it is using <=1 registers for passing args. ``` struct BIG_PARM { int ver; }; static struct { int (foo) (struct BIG_PARM a, void b); int (bar) (struct BIG_PARM* a); int (zoo0) (void); int (zoo1) (void); int (zoo2) (void); int (zoo3) (void); int (zoo4) (void); } vtable[] = { [0] = { .foo = (int ()(struct BIG_PARM* a, void b))0xdeadbeef, }, }; int something(struct BIG_PARM a, void* b) { return vtable[a->ver].foo(a,b); } ``` ``` $ clang -std=gnu89 -m32 -mregparm=3 -mtune=generic -fno-strict-overflow -O2 -c t0.c -o t0.c.o error: ran out of registers during register allocation 1 error generated. ``` Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D118312	2022-02-09 20:34:04 +08:00
Tim Northover	00e372137c	AArch64: do not use xzr for ldxp -> stxp dataflow. If the result of a cmpxchg is unused, regalloc chooses `xzr` for the defs of CMP_SWAP_128*. However, on the failure path this gets expanded to a LDXP -> STXP to store the original value (to ensure no tearing occurred). This unintentionally nulls out half of the value. So instead use GPR64common for these defs, so regalloc has to choose a real one.	2022-02-09 12:29:16 +00:00
jacquesguan	5e71bbfb6c	[RISCV] Add patterns for vector widening floating-point fused multiply-add instructions Add patterns for vector widening floating-point fused multiply-add instructions. Differential Revision: https://reviews.llvm.org/D117546	2022-02-09 10:34:39 +08:00
Bill Wendling	deaf22bc0e	[X86] Implement -fzero-call-used-regs option The "-fzero-call-used-regs" option tells the compiler to zero out certain registers before the function returns. It's also available as a function attribute: zero_call_used_regs. The two upper categories are: - "used": Zero out used registers. - "all": Zero out all registers, whether used or not. The individual options are: - "skip": Don't zero out any registers. This is the default. - "used": Zero out all used registers. - "used-arg": Zero out used registers that are used for arguments. - "used-gpr": Zero out used registers that are GPRs. - "used-gpr-arg": Zero out used GPRs that are used as arguments. - "all": Zero out all registers. - "all-arg": Zero out all registers used for arguments. - "all-gpr": Zero out all GPRs. - "all-gpr-arg": Zero out all GPRs used for arguments. This is used to help mitigate Return-Oriented Programming exploits. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110869	2022-02-08 17:42:54 -08:00
Matt Arsenault	5af0f097ba	GlobalISel: Constant fold G_PTR_ADD Some globals lower to literal addresses on AMDGPU. This may be wrong for non-integral address spaces. I'm wondering if we should just allow regular G_ADD to use pointer types, and reserve G_PTR_ADD for non-integral address spaces.	2022-02-08 19:21:06 -05:00
Matt Arsenault	2af4a554fe	GlobalISel: Constant fold FP bin ops in MIRBuilder Might as well handle these if we're going to handle the integer ops here.	2022-02-08 18:51:10 -05:00
Matt Arsenault	930f2498d4	GlobalISel: Constant fold integer min/max opcodes	2022-02-08 18:50:35 -05:00
Krzysztof Parzyszek	0792161c00	[Hexagon] Fix operation actions for v128f16 There were more cases of operations that should have been "Custom" for v128f16, but ended up "Legal" (e.g. load and store).	2022-02-08 15:28:37 -08:00
Matt Arsenault	0877fbcc16	GlobalISel: Add FoldBinOpIntoSelect combine This will do the combine in cases that should fold, but don't now. e.g. we're relying on the CSEMIRBuilder's incomplete constant folding. For instance it doesn't handle FP operations or vectors (and we don't have separate constant folding combines either to catch them).	2022-02-08 18:17:21 -05:00
Matt Arsenault	5847d5fb24	AMDGPU/GlobalISel: Add baseline test for binop fold into select combine	2022-02-08 18:17:21 -05:00
Krzysztof Parzyszek	7403c02f06	[Hexagon] Fix crash with shuffle_vector of v128f16	2022-02-08 13:05:22 -08:00
Stanislav Mekhanoshin	aeaf85b9c2	[AMDGPU] Select VGPR versions of MFMA if possible We can select _vgprcd versions of MAI instructions and have no AGPRs with the whole budget left for VGPRs if: 1. This is a kernel; 2. It has no calls; 3. It runs at least on 2 waves thus having not more that 256 VGPRs. 4. There is no inline asm requesting AGPRs. Differential Revision: https://reviews.llvm.org/D117253	2022-02-08 10:19:41 -08:00
Matt Arsenault	f2c99ea47d	AMDGPU: Use reserved VGPR for AGPR spills to memory Previously would reuse the VGPR used for large frame offsets with the one needed for copying from the AGPR. Fix this by reusing the register we already reserved for handling AGPR to AGPR copies.	2022-02-08 11:26:59 -05:00
Matt Arsenault	8b2ca766f0	AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908 We need to guarantee cheap copies between AGPRs, and unfortunately gfx908 cannot directly do this. Theoretically we could set the scavenger up with an emergency spill slot, but it also feels unreasonable to pay that cost for what was assumed to be a simple and cheap copy. Pick a register that doesn't conflict with any ABI registers. This does not address the same issue when copying from SGPR to AGPR for gfx90a (this coincidentally fixes it for gfx908), but that's less interesting since the register allocator shouldn't be proactively introducing such copies. One edge case I'm worried about is respecting the VGPR budget implied by amdgpu-waves-per-eu. If the theoretical upper bound of a function is 32 VGPRs, this will force the actual count to be 33. This is also broken if inline assembly uses/defs something in v32. The coalescer will eliminate the intermediate vreg between the def and use, and the introduced copy will clobber the user value. (cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)	2022-02-08 11:14:52 -05:00
Matt Arsenault	a7f60bfdf6	AMDGPU: Regenerate mir test checks to include -NEXT	2022-02-08 11:14:52 -05:00
Sanjay Patel	905abc5b7d	[SDAG] enable binop identity constant folds for fmul/fdiv The test diffs are identical to D119111. This only affects x86 currently because no other target has an override for the TLI hook that controls this transform.	2022-02-08 10:52:28 -05:00
David Sherwood	eabae1b017	[AArch64][CodeGen] Always use SVE (when enabled) to lower 64-bit vector multiplies This patch adds custom lowering support for ISD::MUL with v1i64 and v2i64 types when SVE is enabled, regardless of the minimum SVE vector length. We do this because NEON simply does not have 64-bit vector multiplies, so we want to take advantage of these instructions in SVE. I've updated the 128-bit min SVE vector bits tests here: CodeGen/AArch64/sve-fixed-length-int-arith.ll CodeGen/AArch64/sve-fixed-length-int-mulh.ll CodeGen/AArch64/sve-fixed-length-int-rem.ll Differential Revision: https://reviews.llvm.org/D118802	2022-02-08 15:37:52 +00:00
Nikita Popov	997027347d	[AMDGPURewriteOutArguments] Don't use pointer element type Instead of using the pointer element type, look at how the pointer is actually being used in store instructions, while looking through bitcasts. This makes the transform compatible with opaque pointers and a bit more general. It's worth noting that I have dropped the 3-vector to 4-vector shufflevector special case, because this is now handled in a different way: If the value is actually used as a 4-vector, then we're directly going to use that type, instead of shuffling to a 3-vector in between. Differential Revision: https://reviews.llvm.org/D119237	2022-02-08 16:10:41 +01:00
Simon Pilgrim	0b00cd19e6	[X86] selectLEAAddr - relax heuristic to only require one operand to be a MathWithFlags op (PR46809) As suggested by @craig.topper, relaxing LEA matching to only require the ADD to be fed from a single op with EFLAGS helps avoid duplication when the EFLAGS are consumed in a later, dependent instruction. There was some concern about whether the heuristic is too simple, not taking into account lost loads that can't fold by using a LEA, but some basic tests (included in select-lea.ll) don't suggest that's really a problem. Differential Revision: https://reviews.llvm.org/D118128	2022-02-08 15:09:22 +00:00
Sheng	76c83e747f	[GlobalISel] Add big endian support in CallLowering When splitting values, CallLowering assumes Lo part goes first. But in big endian ISA such as M68k, Hi part goes first. This patch fixes this. Differential Revision: https://reviews.llvm.org/D116877	2022-02-08 14:43:38 +00:00
Nikita Popov	9cc83bfd6c	[AMDGPU] Regenerate test checks (NFC) Use --include-generated-funcs checks. Unfortunately this places all the functions at the end of the file rather than interleaving them, but at least makes it feasible to update these tests.	2022-02-08 14:30:18 +01:00
Simon Moll	ae1bb44ed8	[VE] v256.32\|64 setcc isel and tests Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D119223	2022-02-08 13:20:55 +01:00
David Green	fdce239ae9	[AArch64] Attempt to emitConjunction from brcond We currently use emitConjunction to create CCMP conjunctions from the conditions of selects, helping turning and/ors into more optimal ccmp sequences that don't need to go through csels. This extends that to also be used whilst lowering brcond, giving more opportunity for better condition generation. Differential Revision: https://reviews.llvm.org/D118650	2022-02-08 11:27:10 +00:00
Mubashar Ahmad	95b8a3e520	[AArch64] FeaturePerfMon Added to CPUs FeaturePerfMon has been enabled for CPUs in AArch64. Differential Revision: https://reviews.llvm.org/D118705	2022-02-08 11:19:26 +00:00
Fraser Cormack	62c4ac764b	[RISCV] Optimize splats of extracted vector elements This patch adds an optimization to splat-like operations where the splatted value is extracted from a identically-sized vector. On RVV we can splat that via vrgather.vx/vrgather.vi without dropping to scalar beforehand. We do have a similar VECTOR_SHUFFLE-specific optimization but that only works on fixed-length vector types and for those with a constant splat lane. This patch extends this optimization to make it work on scalable-vector types and on unknown extract indices. It is performed during fixed-vector BUILD_VECTOR lowering and during a new DAGCombine on SPLAT_VECTOR for scalable vectors. Reviewed By: craig.topper, khchen Differential Revision: https://reviews.llvm.org/D118456	2022-02-08 10:35:25 +00:00
David Green	f21dd70f68	[AArch64] Add some additional tests for conditions of branches. NFC	2022-02-08 10:28:33 +00:00
Rainer Orth	541171f02f	[CodeGen][test] XFAIL CodeGen/Generic/ForceStackAlign.ll on SPARC `CodeGen/Generic/ForceStackAlign.ll` `FAIL`s on SPARC like this: LLVM ERROR: Function "g" required stack re-alignment, but LLVM couldn't handle it (probably because it has a dynamic alloca). According to the comments in `llvm/lib/Target/Sparc/SparcFrameLowering.cpp` (`SparcFrameLowering::emitPrologue`) and `SparcRegisterInfo.cpp` (`SparcRegisterInfo::canRealignStack`) this isn't going to change any time soon, so this patch `XFAIL`s the test. Tested on `sparcv9-sun-solaris2.11`. Differential Revision: https://reviews.llvm.org/D119119	2022-02-08 08:57:59 +01:00
wangpc	c53d99c37d	[RISCV] Split f64 undef into two i32 undefs So that no store instruction will be generated. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D118222	2022-02-08 13:42:15 +08:00
wangpc	cb0fff4397	[RISCV] Pre-commit test for D118222 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119212	2022-02-08 12:52:13 +08:00
Carl Ritson	a1fb307b4b	[AMDGPU] Allow hoisting of some VALU compare instructions Conversatively allow hoisting/sinking of VALU comparisons. If the result of a comparison is masked with exec, narrowing the set of active lanes, then it is safe to hoist it as the masking instruction will never by hoisted. Heuristically this is also true for sinking, as we do not expect the result of a sunk comparison that is masked with exec to be used outside of the loop. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D118975	2022-02-08 11:27:23 +09:00
Sheng	5aa3af3fcb	[M68k][GlobalISel] Implement lowerCall based on M68k calling convention This patch implements CallLowering::lowerCall based on M68k calling convention and adds M68kOutgoingValueHandler and CallReturnHandler to handle argument passing and returned value.	2022-02-07 21:18:54 -05:00
Sheng	146c7820d9	[GlobalISel][Legalizer] Support reducing load/store width in big endian order	2022-02-07 20:06:17 -05:00
Sheng	0fe419faa3	M68K: Pre-commit test of D116931	2022-02-07 20:06:17 -05:00
Sanjay Patel	d1ecfaa097	[SDAG] try to fold one-demanded-bit-of-multiply This is a translation of the transform added to InstCombine with: D118539	2022-02-07 17:24:35 -05:00
Sanjay Patel	fc6bee1c11	[SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X This is translated from recent changes to the IR version of this function: D119060 D119139	2022-02-07 15:38:50 -05:00
Sanjay Patel	e4e671c54f	[AArch64] add tests for demanded bits of multiply; NFC This is adapted from existing tests for instcombine. We want to keep the backend logic synchronized with that as much as possible. See D119139 / D119060 / D118539	2022-02-07 15:38:50 -05:00
Vang Thao	570471199b	[AMDGPU] Fix debug values in scheduler not placed correctly when reverting Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule. Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D119022	2022-02-07 11:01:13 -08:00
Mark Murray	3d7662142d	[ARM] Undeprecate complex IT blocks AArch32/Armv8A introduced the performance deprecation of certain patterns of IT instructions. After some debate internal to ARM, this is now being reverted; i.e. no IT instruction patterns are performance deprecated anymore, as the perfomance degredation is not significant enough. This reverts the following: "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." The deprecation no longer applies, but the behaviour may be controlled by the -arm-restrict-it and -arm-no-restrict-it command-line options, with the latter being the default. No warnings about complex IT blocks will be generated. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118044	2022-02-07 15:47:53 +00:00
Sanjay Patel	40a50f8701	[x86] avoid false dependency stall on 'sbb' with same source reg This is effectively inverting the transform added with D116804 because the downside of the false dependency of something like "sbb %eax, %eax" is much greater than the upside of eliminating a zeroing instruction on (all?) Intel CPUs. Differential Revision: https://reviews.llvm.org/D118843	2022-02-07 10:12:12 -05:00
Matt Arsenault	31973062ec	AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos If we had a large offset which required materializing in a register, we would emit an s_add_i32, clobbering SCC. Start checking if SCC is live, and instead use a VGPR offset. For MUBUF, we switch to using offen. We would do this anyway in a normal load/store with a frame index, but not for spills. The same problem still exists in other contexts where we expand frame indices. The nasty edge case is when SGPRs are spilled to memory at a large frame offset where SCC is also clobbered. This requires a second scavenging index, and also required several patches in the scavenger to correctly handle multiple recursive scavenge indexes. An even nastier edge case we still don't support is if we don't have any free SGPRs. If SCC is live and we don't have any free SGPRs to save exec, we have no way of flipping exec back and forth without also clobbering SCC. Fixes: SWDEV-309419	2022-02-07 10:02:03 -05:00
David Truby	be826cf4f7	[AArch64][NEON][SVE] Lower FCOPYSIGN using AArch64ISD::BSP This patch modifies the FCOPYSIGN lowering to go through the BSP pseudo-instruction. This allows the same lowering code for NEON, SVE and SVE2. As part of this, lowering for BSP for SVE and SVE2 is also added. For SVE and NEON this patch is NFC. Differential Revision: https://reviews.llvm.org/D118394	2022-02-07 14:35:26 +00:00
Simon Pilgrim	2e0409a545	[X86][SSE] Add some initial PAVGB/PAVGW tests Once D119073 has landed, I'll start addressing these	2022-02-07 12:58:47 +00:00
Zi Xuan Wu	a190fcdfcc	[CSKY] Add inline asm constraints and related codegen support There are kinds of inline asm constraints and corresponding register class or register as following. 'b': mGPRRegClass 'v': sGPRRegClass 'w': sFPR32RegClass or sFPR64RegClass 'c': C register 'z': R14 register 'h': HI register 'l': LO register 'y': HI or LO register It also adds codegen test for inline-asm including constraints, clobbers and abi names.	2022-02-07 17:45:37 +08:00
Simon Pilgrim	289b8e0d2f	[X86] Add fp80 copysign test coverage Add PR41749 test coverage	2022-02-07 09:44:01 +00:00
Luo, Yuanke	24562babdf	[X86] Add test cases for fmul/fdiv with select.	2022-02-07 16:10:44 +08:00

1 2 3 4 5 ...

42106 Commits