Commit Graph

42106 Commits

Author SHA1 Message Date
Chris Bieneman 1652c4f2fe [NFC] Fixing test requirements I broke
I broke these in 7a0cbe11fb, thanks @ikudrin for catching it!
2022-02-09 09:11:34 -06:00
Simon Pilgrim 6d68ece61f [X86] Refresh funnel/rotate AVX512 VBMI tests
Assume that if VBMI2 is enabled then VBMI is as well; this correctly shows the fallback and widening code that real-world targets will actually see.
2022-02-09 15:04:40 +00:00
Sander de Smalen bcbad75a7c [AArch64][SVE] NFC: Add test file for predicate vector reductions.
This adds some tests for vector reductions which can and should
be implemented with ptest as opposed to promoted ANDV/ORV reduction.
2022-02-09 15:00:08 +00:00
Sander de Smalen 381767a274 [AArch64] NFC: Autogen check lines for sve-setcc.ll 2022-02-09 14:32:50 +00:00
Sander de Smalen ec46232517 [DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)`
This seems like an obvious fold, which leads to a few improvements.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118920
2022-02-09 14:30:01 +00:00
Tong Zhang 2fe315162e [X86] TCRETURNmi fix for 32bit platform
This fix is similar to 3cf3ffce240e ("Fix the TCRETURNmi64 bug differently.").

After allocating registers for the index and base, we will have only one register left for passing arguments.

This bug affects Linux kernel compilation for the x86 target. The error happens when compiling kmod_si476x_core.

clang complains:
    error: ran out of registers during register allocation

The full command is:

clang -Wp,-MMD,drivers/mfd/.si476x-cmd.o.d  -nostdinc -isystem /opt/toolchain/main/lib/clang/14.0.0/include -I./arch/x86/include -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -no-integrated-as --prefix=/usr/bin/ -Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mstack-alignment=4 -march=atom -mtune=atom -mtune=generic -Wa,-mtune=generic32 -ffreestanding -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=1024 -fno-stack-protector -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -ftrivial-auto-var-init=pattern -fno-stack-clash-protection -falign-functions=32 -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare     -DKBUILD_MODFILE='"drivers/mfd/si476x-core"' -DKBUILD_BASENAME='"si476x_cmd"' -DKBUILD_MODNAME='"si476x_core"' -D__KBUILD_MODNAME=kmod_si476x_core -c -o drivers/mfd/si476x-cmd.o drivers/mfd/si476x-cmd.c

-------------

LLVM cannot compile the following code for the x86 32-bit target: the tail call (TCRETURNmi) uses two registers for the index and base, so only one register remains for passing function arguments, and passing them in more than one register becomes impossible.
This fix is similar to 3cf3ffce240e ("Fix the TCRETURNmi64 bug differently.").
We now only use a tail call when it needs at most one register for passing arguments.

```
struct BIG_PARM {
	int ver;
};

static struct {
	int (*foo)  (struct BIG_PARM* a, void *b);
	int (*bar)  (struct BIG_PARM* a);
	int (*zoo0) (void);
	int (*zoo1) (void);
	int (*zoo2) (void);
	int (*zoo3) (void);
	int (*zoo4) (void);
} vtable[] = {
	[0] = {
		.foo = (int (*)(struct BIG_PARM* a, void *b))0xdeadbeef,
	},
};

int something(struct BIG_PARM *a, void* b) {
	return vtable[a->ver].foo(a,b);
}

```

```
$ clang -std=gnu89 -m32 -mregparm=3 -mtune=generic -fno-strict-overflow -O2 -c t0.c -o t0.c.o
error: ran out of registers during register allocation
1 error generated.
```

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D118312
2022-02-09 20:34:04 +08:00
Tim Northover 00e372137c AArch64: do not use xzr for ldxp -> stxp dataflow.
If the result of a cmpxchg is unused, regalloc chooses `xzr` for the defs of
CMP_SWAP_128*. However, on the failure path this gets expanded to a LDXP ->
STXP to store the original value (to ensure no tearing occurred). This
unintentionally nulls out half of the value.

So instead use GPR64common for these defs, so regalloc has to choose a real
one.
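For illustration, a minimal C sketch of the failure mode (the function and names are hypothetical, not from the patch): a 128-bit cmpxchg whose result is ignored, which previously let regalloc pick xzr for the CMP_SWAP_128 defs.

```
#include <stdatomic.h>

void swap_unused(_Atomic __int128 *p, __int128 desired) {
    __int128 expected = 0;
    /* Result deliberately ignored. On the failure path the pseudo
       expands to LDXP -> STXP to re-store the original value; with
       xzr chosen for the defs, half of that value was zeroed. */
    atomic_compare_exchange_strong(p, &expected, desired);
}
```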
2022-02-09 12:29:16 +00:00
jacquesguan 5e71bbfb6c [RISCV] Add patterns for vector widening floating-point fused multiply-add instructions
Add patterns for vector widening floating-point fused multiply-add instructions.
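For example, a loop of this shape can now select a widening FMA (vfwmacc-style); this C sketch is illustrative, not taken from the patch:

```
void fwmacc(double *restrict acc, const float *restrict a,
            const float *restrict b, int n) {
    /* float sources multiplied and accumulated into double:
       a widening floating-point fused multiply-add. */
    for (int i = 0; i < n; ++i)
        acc[i] += (double)a[i] * (double)b[i];
}
```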

Differential Revision: https://reviews.llvm.org/D117546
2022-02-09 10:34:39 +08:00
Bill Wendling deaf22bc0e [X86] Implement -fzero-call-used-regs option
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.

The two top-level categories are:

  - "used": Zero out used registers.
  - "all": Zero out all registers, whether used or not.

The individual options are:

  - "skip": Don't zero out any registers. This is the default.
  - "used": Zero out all used registers.
  - "used-arg": Zero out used registers that are used for arguments.
  - "used-gpr": Zero out used registers that are GPRs.
  - "used-gpr-arg": Zero out used GPRs that are used as arguments.
  - "all": Zero out all registers.
  - "all-arg": Zero out all registers used for arguments.
  - "all-gpr": Zero out all GPRs.
  - "all-gpr-arg": Zero out all GPRs used for arguments.

This is used to help mitigate Return-Oriented Programming exploits.
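As a usage sketch, the per-function attribute form looks like this (the function itself is hypothetical):

```
/* Zero all used general-purpose registers on return from this
   function, limiting what a ROP gadget can harvest from them. */
__attribute__((zero_call_used_regs("used-gpr")))
int handle_request(int fd) {
    return fd * 2;
}
```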

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110869
2022-02-08 17:42:54 -08:00
Matt Arsenault 5af0f097ba GlobalISel: Constant fold G_PTR_ADD
Some globals lower to literal addresses on AMDGPU.

This may be wrong for non-integral address spaces. I'm wondering if we
should just allow regular G_ADD to use pointer types, and reserve
G_PTR_ADD for non-integral address spaces.
2022-02-08 19:21:06 -05:00
Matt Arsenault 2af4a554fe GlobalISel: Constant fold FP bin ops in MIRBuilder
Might as well handle these if we're going to handle the integer ops
here.
2022-02-08 18:51:10 -05:00
Matt Arsenault 930f2498d4 GlobalISel: Constant fold integer min/max opcodes 2022-02-08 18:50:35 -05:00
Krzysztof Parzyszek 0792161c00 [Hexagon] Fix operation actions for v128f16
There were more cases of operations that should have been "Custom" for
v128f16, but ended up "Legal" (e.g. load and store).
2022-02-08 15:28:37 -08:00
Matt Arsenault 0877fbcc16 GlobalISel: Add FoldBinOpIntoSelect combine
This will do the combine in cases that should fold but currently
don't, e.g. we're relying on the CSEMIRBuilder's incomplete constant
folding. For instance it doesn't handle FP operations or vectors (and
we don't have separate constant folding combines either to catch
them).
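A minimal C illustration of the fold (not from the patch): a binop whose operand is a select between constants becomes a select between folded constants.

```
int fold_example(int c) {
    int k = c ? 16 : 32;  /* select c, 16, 32 */
    return k + 1;         /* folds to: select c, 17, 33 */
}
```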
2022-02-08 18:17:21 -05:00
Matt Arsenault 5847d5fb24 AMDGPU/GlobalISel: Add baseline test for binop fold into select combine 2022-02-08 18:17:21 -05:00
Krzysztof Parzyszek 7403c02f06 [Hexagon] Fix crash with shuffle_vector of v128f16 2022-02-08 13:05:22 -08:00
Stanislav Mekhanoshin aeaf85b9c2 [AMDGPU] Select VGPR versions of MFMA if possible
We can select the _vgprcd versions of MAI instructions, using no
AGPRs and leaving the whole budget for VGPRs, if:

1. This is a kernel;
2. It has no calls;
3. It runs on at least 2 waves, thus having no more than 256 VGPRs;
4. There is no inline asm requesting AGPRs.

Differential Revision: https://reviews.llvm.org/D117253
2022-02-08 10:19:41 -08:00
Matt Arsenault f2c99ea47d AMDGPU: Use reserved VGPR for AGPR spills to memory
Previously we would reuse the VGPR used for large frame offsets for
the copy from the AGPR. Fix this by instead reusing the register we
already reserved for handling AGPR-to-AGPR copies.
2022-02-08 11:26:59 -05:00
Matt Arsenault 8b2ca766f0 AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908
We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
scavenger up with an emergency spill slot, but it also feels
unreasonable to pay that cost for what was assumed to be a simple and
cheap copy. Pick a register that doesn't conflict with any ABI
registers.

This does not address the same issue when copying from SGPR to AGPR
for gfx90a (this coincidentally fixes it for gfx908), but that's less
interesting since the register allocator shouldn't be proactively
introducing such copies.

One edge case I'm worried about is respecting the VGPR budget implied
by amdgpu-waves-per-eu. If the theoretical upper bound of a function
is 32 VGPRs, this will force the actual count to be 33.

This is also broken if inline assembly uses/defs something in v32. The
coalescer will eliminate the intermediate vreg between the def and
use, and the introduced copy will clobber the user value.

(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)
2022-02-08 11:14:52 -05:00
Matt Arsenault a7f60bfdf6 AMDGPU: Regenerate mir test checks to include -NEXT 2022-02-08 11:14:52 -05:00
Sanjay Patel 905abc5b7d [SDAG] enable binop identity constant folds for fmul/fdiv
The test diffs are identical to D119111.

This only affects x86 currently because no other target
has an override for the TLI hook that controls this transform.
2022-02-08 10:52:28 -05:00
David Sherwood eabae1b017 [AArch64][CodeGen] Always use SVE (when enabled) to lower 64-bit vector multiplies
This patch adds custom lowering support for ISD::MUL with v1i64 and v2i64
types when SVE is enabled, regardless of the minimum SVE vector length. We
do this because NEON simply does not have 64-bit vector multiplies, so we
want to take advantage of these instructions in SVE.
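For example, a fixed 64-bit element multiply like this illustrative C sketch can now select SVE's mul instead of scalarizing:

```
#include <stdint.h>

/* v2i64 multiply: NEON has no 64-bit vector multiply, but SVE does. */
void mul_v2i64(int64_t *restrict a, const int64_t *restrict b) {
    for (int i = 0; i < 2; ++i)
        a[i] *= b[i];
}
```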

I've updated the 128-bit min SVE vector bits tests here:

  CodeGen/AArch64/sve-fixed-length-int-arith.ll
  CodeGen/AArch64/sve-fixed-length-int-mulh.ll
  CodeGen/AArch64/sve-fixed-length-int-rem.ll

Differential Revision: https://reviews.llvm.org/D118802
2022-02-08 15:37:52 +00:00
Nikita Popov 997027347d [AMDGPURewriteOutArguments] Don't use pointer element type
Instead of using the pointer element type, look at how the pointer
is actually being used in store instructions, while looking through
bitcasts. This makes the transform compatible with opaque pointers
and a bit more general.

It's worth noting that I have dropped the 3-vector to 4-vector
shufflevector special case, because this is now handled in a
different way: If the value is actually used as a 4-vector, then
we're directly going to use that type, instead of shuffling to a
3-vector in between.

Differential Revision: https://reviews.llvm.org/D119237
2022-02-08 16:10:41 +01:00
Simon Pilgrim 0b00cd19e6 [X86] selectLEAAddr - relax heuristic to only require one operand to be a MathWithFlags op (PR46809)
As suggested by @craig.topper, relaxing LEA matching to only require the ADD to be fed from a single op with EFLAGS helps avoid duplication when the EFLAGS are consumed in a later, dependent instruction.

There was some concern about whether the heuristic is too simple, not taking into account loads that can no longer be folded when a LEA is used, but some basic tests (included in select-lea.ll) don't suggest that's really a problem.

Differential Revision: https://reviews.llvm.org/D118128
2022-02-08 15:09:22 +00:00
Sheng 76c83e747f [GlobalISel] Add big endian support in CallLowering
When splitting values, CallLowering assumes the Lo part goes first. But on a big-endian ISA such as M68k, the Hi part goes first.

This patch fixes this.

Differential Revision: https://reviews.llvm.org/D116877
2022-02-08 14:43:38 +00:00
Nikita Popov 9cc83bfd6c [AMDGPU] Regenerate test checks (NFC)
Use --include-generated-funcs checks. Unfortunately this places
all the functions at the end of the file rather than interleaving
them, but at least makes it feasible to update these tests.
2022-02-08 14:30:18 +01:00
Simon Moll ae1bb44ed8 [VE] v256.32|64 setcc isel and tests
Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D119223
2022-02-08 13:20:55 +01:00
David Green fdce239ae9 [AArch64] Attempt to emitConjunction from brcond
We currently use emitConjunction to create CCMP conjunctions from the
conditions of selects, helping turn ands/ors into more optimal ccmp
sequences that don't need to go through csels. This extends that to also
be used whilst lowering brcond, giving more opportunity for better
condition generation.
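An illustrative C sketch (not from the patch) of a brcond that benefits: the conjunction can lower to cmp + ccmp + a single branch rather than going through csels.

```
void maybe_call(int a, int b, void (*f)(void)) {
    if (a > 0 && b != 7)  /* an and of two compares feeding a branch */
        f();
}
```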

Differential Revision: https://reviews.llvm.org/D118650
2022-02-08 11:27:10 +00:00
Mubashar Ahmad 95b8a3e520 [AArch64] FeaturePerfMon Added to CPUs
FeaturePerfMon has been enabled for CPUs in AArch64.

Differential Revision: https://reviews.llvm.org/D118705
2022-02-08 11:19:26 +00:00
Fraser Cormack 62c4ac764b [RISCV] Optimize splats of extracted vector elements
This patch adds an optimization to splat-like operations where the
splatted value is extracted from an identically-sized vector. On RVV we
can splat that via vrgather.vx/vrgather.vi without dropping to scalar
beforehand.

We do have a similar VECTOR_SHUFFLE-specific optimization but that only
works on fixed-length vector types and for those with a constant splat
lane. This patch extends this optimization to make it work on
scalable-vector types and on unknown extract indices.

It is performed during fixed-vector BUILD_VECTOR lowering and during a
new DAGCombine on SPLAT_VECTOR for scalable vectors.
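As a fixed-length illustration of the pattern (the scalable-vector and unknown-index cases are the new part), consider:

```
typedef int v4si __attribute__((vector_size(16)));

/* Splat lane 0 of a vector back across itself: on RVV this can use
   vrgather.vi directly instead of extracting to a scalar first. */
v4si splat_lane0(v4si v) {
    return __builtin_shufflevector(v, v, 0, 0, 0, 0);
}
```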

Reviewed By: craig.topper, khchen

Differential Revision: https://reviews.llvm.org/D118456
2022-02-08 10:35:25 +00:00
David Green f21dd70f68 [AArch64] Add some additional tests for conditions of branches. NFC 2022-02-08 10:28:33 +00:00
Rainer Orth 541171f02f [CodeGen][test] XFAIL CodeGen/Generic/ForceStackAlign.ll on SPARC
`CodeGen/Generic/ForceStackAlign.ll` `FAIL`s on SPARC like this:

  LLVM ERROR: Function "g" required stack re-alignment, but LLVM couldn't
  handle it (probably because it has a dynamic alloca).

According to the comments in `llvm/lib/Target/Sparc/SparcFrameLowering.cpp`
(`SparcFrameLowering::emitPrologue`) and `SparcRegisterInfo.cpp`
(`SparcRegisterInfo::canRealignStack`) this isn't going to change any time
soon, so this patch `XFAIL`s the test.

Tested on `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D119119
2022-02-08 08:57:59 +01:00
wangpc c53d99c37d [RISCV] Split f64 undef into two i32 undefs
So that no store instruction will be generated.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118222
2022-02-08 13:42:15 +08:00
wangpc cb0fff4397 [RISCV] Pre-commit test for D118222
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119212
2022-02-08 12:52:13 +08:00
Carl Ritson a1fb307b4b [AMDGPU] Allow hoisting of some VALU compare instructions
Conservatively allow hoisting/sinking of VALU comparisons.
If the result of a comparison is masked with exec, narrowing the
set of active lanes, then it is safe to hoist it as the masking
instruction will never be hoisted.

Heuristically this is also true for sinking, as we do not expect
the result of a sunk comparison that is masked with exec to be
used outside of the loop.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D118975
2022-02-08 11:27:23 +09:00
Sheng 5aa3af3fcb [M68k][GlobalISel] Implement lowerCall based on M68k calling convention
This patch implements CallLowering::lowerCall based on M68k calling
convention and adds M68kOutgoingValueHandler and CallReturnHandler to
handle argument passing and returned value.
2022-02-07 21:18:54 -05:00
Sheng 146c7820d9 [GlobalISel][Legalizer] Support reducing load/store width in big endian order 2022-02-07 20:06:17 -05:00
Sheng 0fe419faa3 M68K: Pre-commit test of D116931 2022-02-07 20:06:17 -05:00
Sanjay Patel d1ecfaa097 [SDAG] try to fold one-demanded-bit-of-multiply
This is a translation of the transform added to InstCombine with:
D118539
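A C sketch of the idea (illustrative only): if just the low bit of a product is demanded, bit 0 of x*y equals bit 0 of x&y, so the multiply can be dropped.

```
unsigned parity_of_product(unsigned x, unsigned y) {
    return (x * y) & 1;  /* equivalent to (x & y) & 1 */
}
```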
2022-02-07 17:24:35 -05:00
Sanjay Patel fc6bee1c11 [SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X
This is translated from recent changes to the IR version of this function:
D119060
D119139
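Illustrative C for the fold: a square is 0 or 1 mod 4, so the low two bits of x*x reduce to x&1 and the multiply disappears.

```
unsigned low2_of_square(unsigned x) {
    return (x * x) & 3;  /* equivalent to x & 1 */
}
```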
2022-02-07 15:38:50 -05:00
Sanjay Patel e4e671c54f [AArch64] add tests for demanded bits of multiply; NFC
This is adapted from existing tests for instcombine.
We want to keep the backend logic synchronized with
that as much as possible.

See D119139 / D119060 / D118539
2022-02-07 15:38:50 -05:00
Vang Thao 570471199b [AMDGPU] Fix debug values in scheduler not placed correctly when reverting
Debug position data is cleared after ScheduleDAGMILive::schedule() because it also calls placeDebugValues(). Make it so the data is not cleared after the initial call to placeDebugValues, since we will call it again after reverting a schedule.

Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions, since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction, thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and end before calling placeDebugValues(), since both vars will be used as reference points to move debug instructions back.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D119022
2022-02-07 11:01:13 -08:00
Mark Murray 3d7662142d [ARM] Undeprecate complex IT blocks
AArch32/Armv8-A introduced the performance deprecation of certain patterns
of IT instructions. After some debate internal to ARM, this is now being
reverted; i.e. no IT instruction patterns are performance deprecated
anymore, as the performance degradation is not significant enough.

This reverts the following:

"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of
IT that apply to instructions other than a single subsequent 16-bit
instruction from a restricted set are deprecated, as are explicit
references to the PC within that single 16-bit instruction. This permits
the non-deprecated forms of IT and subsequent instructions to be treated
as a single 32-bit conditional instruction."

The deprecation no longer applies, but the behaviour may be controlled
by the -arm-restrict-it and -arm-no-restrict-it command-line options,
with the latter being the default. No warnings about complex IT blocks
will be generated.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118044
2022-02-07 15:47:53 +00:00
Sanjay Patel 40a50f8701 [x86] avoid false dependency stall on 'sbb' with same source reg
This is effectively inverting the transform added with D116804
because the downside of the false dependency of something like
"sbb %eax, %eax" is much greater than the upside of eliminating
a zeroing instruction on (all?) Intel CPUs.
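A C sketch of the kind of source that produces the idiom (illustrative, not from the patch):

```
/* Historically lowered to "cmp; sbb %eax, %eax": sbb reads its
   destination before writing it, creating the false dependency. */
unsigned mask_if_less(unsigned a, unsigned b) {
    return a < b ? ~0u : 0u;
}
```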

Differential Revision: https://reviews.llvm.org/D118843
2022-02-07 10:12:12 -05:00
Matt Arsenault 31973062ec AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos
If we had a large offset which required materializing in a register,
we would emit an s_add_i32, clobbering SCC. Start checking if SCC is
live, and instead use a VGPR offset. For MUBUF, we switch to using
offen. We would do this anyway in a normal load/store with a frame
index, but not for spills.

The same problem still exists in other contexts where we expand frame
indices.

The nasty edge case is when SGPRs are spilled to memory at a large
frame offset where SCC is also clobbered. This requires a second
scavenging index, and also required several patches in the scavenger
to correctly handle multiple recursive scavenge indexes.

An even nastier edge case we still don't support is if we don't have
any free SGPRs. If SCC is live and we don't have any free SGPRs to
save exec, we have no way of flipping exec back and forth without also
clobbering SCC.

Fixes: SWDEV-309419
2022-02-07 10:02:03 -05:00
David Truby be826cf4f7 [AArch64][NEON][SVE] Lower FCOPYSIGN using AArch64ISD::BSP
This patch modifies the FCOPYSIGN lowering to go through the BSP
pseudo-instruction. This allows the same lowering code for NEON,
SVE and SVE2.

As part of this, lowering for BSP for SVE and SVE2 is also added.

For SVE and NEON this patch is NFC.
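For reference, the kind of source that exercises this lowering (illustrative C):

```
#include <math.h>

/* copysign keeps the magnitude of the first operand and the sign of
   the second; vectorized forms now funnel through the BSP pseudo. */
double keep_magnitude(double mag, double sgn) {
    return copysign(mag, sgn);
}
```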

Differential Revision: https://reviews.llvm.org/D118394
2022-02-07 14:35:26 +00:00
Simon Pilgrim 2e0409a545 [X86][SSE] Add some initial PAVGB/PAVGW tests
Once D119073 has landed, I'll start addressing these.
2022-02-07 12:58:47 +00:00
Zi Xuan Wu a190fcdfcc [CSKY] Add inline asm constraints and related codegen support
There are several kinds of inline asm constraints, each with a corresponding register class or register, as follows:

 'b': mGPRRegClass
 'v': sGPRRegClass
 'w': sFPR32RegClass or sFPR64RegClass
 'c': C register
 'z': R14 register
 'h': HI register
 'l': LO register
 'y': HI or LO register

It also adds codegen tests for inline asm, including constraints, clobbers and ABI names.
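A hypothetical sketch of one constraint in use; the mnemonic and operands below are illustrative only, not taken from the tests:

```
int pass_through(int x) {
    int y;
    /* 'v' requests sGPRRegClass for both operands. */
    asm("mov %0, %1" : "=v"(y) : "v"(x));
    return y;
}
```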
2022-02-07 17:45:37 +08:00
Simon Pilgrim 289b8e0d2f [X86] Add fp80 copysign test coverage
Add PR41749 test coverage
2022-02-07 09:44:01 +00:00
Luo, Yuanke 24562babdf [X86] Add test cases for fmul/fdiv with select. 2022-02-07 16:10:44 +08:00