Commit Graph

66144 Commits

Author SHA1 Message Date
Simon Pilgrim ada6bcc13f [X86] X86tcret_1reg - use cast<> instead of dyn_cast<> to avoid dereference of nullptr
The pointer is always dereferenced, so assert the cast is correct instead of returning nullptr
2022-02-17 11:54:12 +00:00
Simon Pilgrim f1877eb1bb AArch64_MC::isQForm - Fix MSVC 'no default capture mode' lambda warning 2022-02-17 11:41:47 +00:00
Pavel Kosov 37fa99eda0 [SchedModels][CortexA55] Add ASIMD integer instructions
Depends on D114642

Original review https://reviews.llvm.org/D112201

OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D117003
2022-02-17 13:41:57 +03:00
Pavel Kosov f3809b20f2 [AArch64][SchedModels] Handle virtual registers in FP/NEON predicates
The current implementation of the Check[HSDQ]Form predicates doesn’t handle virtual registers and therefore isn’t useful for pre-RA scheduling. This patch fixes that by implementing two function predicates: CheckQForm, which checks that an instruction writes a 128-bit NEON register, and CheckFpOrNEON, which checks that an instruction writes an FP register (any width). The latter supersedes the Check[HSD]Form predicates, which are not used individually.

OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D114642
2022-02-17 13:41:05 +03:00
Zakk Chen 093ecccdab [RISCV] Add the passthru operand for vadc/vsbc/vmerge/vfmerge IR intrinsics.
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D119686
2022-02-17 02:21:39 -08:00
David Green f3bc7fd546 [AArch64] Cleanup for performCommonVectorExtendCombine. NFC
This is some NFC (hopefully!) cleanup for performCommonVectorExtendCombine
and related methods, removing conditions that cannot occur and otherwise
cleaning up the code a little.
2022-02-17 10:03:28 +00:00
Jay Foad c08896d292 [AMDGPU] Return better Changed status from SILowerI1Copies
Differential Revision: https://reviews.llvm.org/D119946
2022-02-17 09:38:57 +00:00
Jay Foad 78ebb1dd24 [AMDGPU] Return better Changed status from SIAnnotateControlFlow
Differential Revision: https://reviews.llvm.org/D119945
2022-02-17 09:38:57 +00:00
Jay Foad 1822a5ecdd [AMDGPU] Return better Changed status from AMDGPUPerfHintAnalysis
Differential Revision: https://reviews.llvm.org/D119944
2022-02-17 09:31:42 +00:00
Jay Foad 77e793d025 [AMDGPU] Return better Changed status from AMDGPUAnnotateUniformValues
Differential Revision: https://reviews.llvm.org/D119943
2022-02-17 09:31:42 +00:00
Ben Shi 0b93e90971 Revert "[RISCV] LUI used for address computation should not isAsCheapAsAMove"
This reverts commit 23a5073600.

Although this patch achieved better codegen in most cases, it is really
important to accurately describe the cost of instructions. So I revert it.
2022-02-17 17:27:37 +08:00
Roman Lebedev 371fcb720e [SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP
That transformation is lossy, as discussed in
https://github.com/llvm/llvm-project/issues/53853
and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes https://github.com/llvm/llvm-project/issues/53853
Refs. https://github.com/rust-lang/rust/issues/85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
2022-02-17 12:13:55 +03:00
Amara Emerson c8b8c8e989 [AArch64][GlobalISel] Implement support for clang.arc.attachedcall call operand bundles.
Differential Revision: https://reviews.llvm.org/D119983
2022-02-16 17:35:22 -08:00
Yonghong Song 3671bdbcd2 [BPF] Fix a BTF type pruning bug
In BPF backend, BTF type generation may skip
some debuginfo types if they are the pointee
type of a struct member. For example,
  struct task_struct {
    ...
    struct mm_struct                *mm;
    ...
  };
BPF backend may generate a forward decl for
'struct mm_struct' instead of the full type if
there is no other usage of 'struct mm_struct'.
The reason is to avoid bringing too many unneeded types
into BTF.

Alexei found a pruning bug where we may miss
some full type generation. The following is an illustrative
example:
   struct t1 { ... }
   struct t2 { struct t1 *p; };
   struct t2 g;
   void foo(struct t1 *arg) { ... }
In the above case, we will have partial debuginfo chain like below:
   struct t2 -> member p
                        \ -> ptr -> struct t1
                        /
     foo -> argument arg
While traversing
   struct t2 -> member p -> ptr -> struct t1
the corresponding BTF types are generated except 'struct t1', which
will be handled in the FixUp stage. Later, when traversing
   foo -> argument arg -> ptr -> struct t1
the 'ptr' BTF type has already been generated, and the current implementation
ignores the 'pointer' type, hence 'struct t1' is not generated.

This patch fixes the issue not just for the above case, but for the
general case with multiple derived types, e.g.,
   struct t2 -> member p
                        \ -> const -> ptr -> volatile -> struct t1
                        /
     foo -> argument arg
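
The pruning problem described above can be reproduced in a toy model. The sketch below is written in Python and is not the BTFDebug implementation: the `TYPES` table, the `generate` function and the `follow_visited_derived` flag are all hypothetical names chosen for illustration. It assumes that a struct reached through a struct member pointer is deferred to a fixup, and that an already visited derived type is normally skipped.

```python
# Toy type graph (hypothetical): id -> (kind, payload). For a "struct" the payload
# is a list of member type ids; for a derived type it is the base type id.
TYPES = {
    "t1": ("struct", []),
    "volatile_t1": ("volatile", "t1"),
    "ptr": ("ptr", "volatile_t1"),
    "const_ptr": ("const", "ptr"),
    "t2": ("struct", ["const_ptr"]),
}


def generate(roots, follow_visited_derived):
    """Return (full, forward): struct ids emitted in full vs. as forward decls."""
    visited, full, fixups = set(), set(), set()

    def visit(tid, in_member, behind_ptr):
        kind, payload = TYPES[tid]
        if kind == "struct" and behind_ptr:
            # Pointee of a struct member: defer the decision to the fixup stage.
            fixups.add(tid)
            return
        if tid in visited:
            if follow_visited_derived and kind != "struct" and not in_member:
                # Modelled fix: walk the derived chain down to its base struct
                # and make sure that struct is generated in full.
                base = tid
                while TYPES[base][0] != "struct":
                    base = TYPES[base][1]
                visit(base, False, False)
            return
        visited.add(tid)
        if kind == "struct":
            full.add(tid)
            for member in payload:
                visit(member, True, False)
        else:
            visit(payload, in_member, behind_ptr or (in_member and kind == "ptr"))

    for root in roots:
        visit(root, False, False)
    # Anything still only a fixup at the end becomes a forward declaration.
    return full, fixups - full
```

With the roots `["t2", "const_ptr"]` (the global `g` and the argument of `foo`), the model without the extra walk leaves `t1` as a forward declaration, while the model with the walk emits it in full.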

Differential Revision: https://reviews.llvm.org/D119986
2022-02-16 17:23:34 -08:00
Matt Arsenault 3884cb9235 AMDGPU: Always reserve VGPR for AGPR copies on gfx908
Just because there aren't AGPRs in the original program doesn't mean
the register allocator can't choose to use them (unless we were to
forcibly reserve all AGPRs if there weren't any uses). This happens in
high pressure situations and introduces copies to avoid spills.

In this test, the allocator ends up introducing a copy from SGPR to
AGPR which requires an intermediate VGPR. I don't believe it would
introduce a copy from AGPR to AGPR in this situation, since it would
be trying to use an intermediate with a different class.

Theoretically this is also broken on gfx90a, but I have been unable to
come up with a testcase.
2022-02-16 18:48:18 -05:00
Florian Mayer c195addb60 [NFC] [MTE] [HWASan] Remove unnecessary member of AllocaInfo
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D119981
2022-02-16 15:19:30 -08:00
Jacob Lambert 7470244475 [AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set
independently of the VGPR registers. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.

Differential Revision: https://reviews.llvm.org/D116140
2022-02-16 15:17:23 -08:00
Jessica Paquette 67ab4c010b [MachineOutliner] NFC: Update LRU stuff for RISCV
I missed it in my grep. Fixes broken buildbot.
2022-02-16 12:01:59 -08:00
Jessica Paquette 6d58f4ab07 [MachineOutliner] NFC: Hide LRU-related stuff behind helper functions
It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`

This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.

Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.

Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`

This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
2022-02-16 11:39:07 -08:00
Jacob Lambert 0bad7cb565 Hoist getTotalNumVGPRs into AMDGPUBaseInfo for use in both codegen and MC
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D119912
2022-02-16 11:04:08 -08:00
Craig Topper cfbbcc544c [RISCV] Improve lowering of SHL_PARTS/SRL_PARTS/SRA_PARTS.
Part of the shift lowering creates a (sub XLEN-1, ShAmt). When this
value is used we know that ShAmt is [0..XLEN-1]. Since XLEN is a power
of 2 we can replace the sub with an xor. This allows us to use XORI
instead of LI+SUB.
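
The sub-to-xor replacement can be checked exhaustively. A minimal sketch in Python follows; the helper name `sub_equals_xor` is made up for illustration and is not part of the patch. When XLEN is a power of 2, XLEN-1 is an all-ones mask, so subtracting any ShAmt in [0..XLEN-1] never borrows and simply clears the bits that ShAmt sets, which is what xor does as well.

```python
def sub_equals_xor(xlen: int) -> bool:
    """Check (xlen - 1) - sh_amt == (xlen - 1) ^ sh_amt for all sh_amt in [0, xlen - 1]."""
    mask = xlen - 1  # all ones in the low bits when xlen is a power of 2
    return all((mask - sh_amt) == (mask ^ sh_amt) for sh_amt in range(xlen))
```

The identity depends on the power-of-2 assumption: for a hypothetical width of 48 the mask is 0b101111, and a shift amount of 16 already gives different results for the two operations.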

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D119411
2022-02-16 09:22:11 -08:00
Dmitry Preobrazhensky 6655c5a6bb [AMDGPU][MC][GFX10] Added an alias for HW_REG_HW_ID1
Enabled HW_REG_HW_ID as an alias for HW_REG_HW_ID1. This is required for compatibility with existing code.

Differential Revision: https://reviews.llvm.org/D119939
2022-02-16 19:45:44 +03:00
Lei Huang 5abe6c312b [PowerPC] Rename PPCInstrPrefix.td to PPCInstrP10.td 2022-02-16 10:22:41 -06:00
Sheng 4306fbff9c Revert "Revert "[M68k] Adopt VarLenCodeEmitter for control instructions""
This reverts commit 69a7d49de6.

llvm/test/MC/M68k/Relaxations/branch.s needs disassembler support.

So I disabled it temporarily.
2022-02-16 17:41:49 +08:00
Sheng 69a7d49de6 Revert "[M68k] Adopt VarLenCodeEmitter for control instructions"
This reverts commit 9ffd498fcb.

This patch introduces a regression on MC/M68k/Relaxations/branch.s
2022-02-16 17:09:46 +08:00
Zakk Chen e8973dd389 [RISCV] Add the passthru operand for some RVV nomask unary and nullary intrinsics.
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

My plan is to handle more complex operations in follow-up patches.

Reviewers: frasercrmck

Differential Revision: https://reviews.llvm.org/D118253
2022-02-15 22:34:06 -08:00
Shao-Ce SUN 2aed07e96c [NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D119846
2022-02-16 13:10:09 +08:00
Sheng 9ffd498fcb [M68k] Adopt VarLenCodeEmitter for control instructions
Refactor the instructions in M68kInstrControl.td to use the VarLenCodeEmitter.

This patch is tested by the existing test cases.

Reviewed By: myhsu, ricky26

Differential Revision: https://reviews.llvm.org/D119665
2022-02-16 12:54:20 +08:00
Min-Yih Hsu 53a2bf8ac7 [M68k][VarLenCodeEmitter] Support reloc & pc-rel immediate values
Supporting relocatable and pc-relative immediate values for the new code
emitter.

Differential Revision: https://reviews.llvm.org/D119101
2022-02-15 20:41:33 -08:00
Mubariz Afzal 1a5b881d4c Revert [SystemZ][z/OS] Fix f32 variadic argument assertion
This reverts ea0676f97d
2022-02-15 23:28:40 -05:00
Shao-Ce SUN 9cc49c1951 Revert "[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`"
This reverts commit fe25c06cc5.
2022-02-16 11:57:49 +08:00
Shao-Ce SUN fe25c06cc5 [NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
For ten years, it seems that `MCRegisterInfo` is not used by any target.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D119846
2022-02-16 11:47:17 +08:00
Zakk Chen b784719904 [RISCV] Add the passthru operand for RVV nomask binary intrinsics.
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Add passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support
i64 scalar in rv32.

The masked VSLIDE1 would only emit mask undisturbed policy regardless
of giving mask agnostic policy until InsertVSETVLI supports mask agnostic.

Reviewed by: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D117989
2022-02-15 18:36:18 -08:00
Matt Arsenault 898dc8a4b1 AMDGPU: Use subtarget in class instead of querying function 2022-02-15 21:28:12 -05:00
zhongyunde 064b2a6dc6 [DAGCombiner][AArch64] Enhance to fold CSNEG into CSINC instruction
Perform the scalar expression combine in the form of:
  CSNEG(1, c, cc) + b  =>  cc  ? b+1 : b-c => CSINC(b-c, b, !cc)
  CSNEG(c, -1, cc) + b =>  cc  ? b+c : b+1 => CSINC(b+c, b, cc)
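
Both rewrites can be sanity-checked with a small model. The Python sketch below is not the DAGCombiner code; `csneg`, `csinc` and `fold_holds` are illustrative helpers that assume the usual AArch64 semantics (CSNEG selects its first operand or the negated second one, CSINC selects its first operand or the second one plus 1) and 64-bit wraparound arithmetic.

```python
MASK64 = (1 << 64) - 1  # model 64-bit wraparound arithmetic


def csneg(a: int, b: int, cc: bool) -> int:
    """CSNEG: a if cc holds, otherwise -b."""
    return (a if cc else -b) & MASK64


def csinc(a: int, b: int, cc: bool) -> int:
    """CSINC: a if cc holds, otherwise b + 1."""
    return (a if cc else b + 1) & MASK64


def fold_holds(b: int, c: int, cc: bool) -> bool:
    """Check both folds from the commit message for one choice of b, c and cc."""
    first = (csneg(1, c, cc) + b) & MASK64 == csinc(b - c, b, not cc)
    second = (csneg(c, -1, cc) + b) & MASK64 == csinc(b + c, b, cc)
    return first and second
```

Calling `fold_holds` over a handful of boundary values (0, 1, the largest 64-bit value) for both condition outcomes is a quick way to see that the condition has to be inverted in the first form but not in the second.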

Fix https://github.com/llvm/llvm-project/issues/53071

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D119105
2022-02-16 09:39:38 +08:00
Mubariz Afzal ea0676f97d [SystemZ][z/OS] Fix f32 variadic argument assertion
The tablegen lines that specify the XPLINK64 calling convention for promoting an f32 vararg to an f64 are effectively overwritten by the following tablegen line, which bitcasts an f64 vararg to an i64 (so that it can be used in the GPRs). It becomes a bitcast from f32 to i64.

Since we don't handle a bitcast for f32s this caused an assertion.
2022-02-15 18:11:57 -05:00
Florian Mayer a650bb58c0 [NFC] [MTE] only do one pass over allocas for stack tagging.
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D119801
2022-02-15 13:09:24 -08:00
Stanislav Mekhanoshin 29a0e0a9e5 [AMDGPU] Do not define GET_INSTRINFO_SCHED_ENUM
Autogenerated names are too long and break compilation on Windows,
while we do not need this enum at all.

Differential Revision: https://reviews.llvm.org/D119869
2022-02-15 13:00:54 -08:00
Simon Moll de42307e44 [VE] Fix breakage after D118981
VE backend code expected all VP SDNode to have a mask parameter.  This
is not the case with vp.select|merge after D118981.
2022-02-15 18:56:20 +01:00
Craig Topper ab6e02dded [RISCV] Match vwmulsu_vx with scalar splat input.
This is a more generic version of D119110 that uses MaskedValueIsZero
to do the matching and SimplifyDemandedBits to remove any unneeded
AND instructions.

Tests were taken from D119110.

Reviewed By: Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D119622
2022-02-15 08:45:21 -08:00
Craig Topper d132b47bb9 [RISCV] Replace llvm_unreachable with report_fatal_error.
Parsing errors aren't handled earlier in all cases. A simple
example is llc -mtriple=riscv64 -mattr=+zve32f. If F or Finx is
not also specified, this will hit a parse error.

Use a fatal_error so that the error is conveyed to the user.
2022-02-15 08:40:37 -08:00
Matt Devereau 7dce12de68 [AArch64] Suggest b.nfrst if the user tries b.nfirst.
Differential Revision: https://reviews.llvm.org/D119453

Co-authored-by: George Steed <george.steed@arm.com>
2022-02-15 15:06:04 +00:00
Amy Kwan ac5a5a9cfe [PowerPC] Add default handling for single element vectors, and split/promote vNi1 vectors.
This patch updates the handling of vectors in getPreferredVectorAction():

For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.

The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.

```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```

Differential Revision: https://reviews.llvm.org/D119521
2022-02-15 08:44:08 -06:00
Simon Pilgrim 2808743cbd [X86] LowerVSETCC - always split 512-bit vectors before lowering to PCMPEQ/GT (PR53842)
Extend the existing split where we already do this for v32i16/v64i8

We can end up trying to use PCMPEQ/GT if the result needs to be sign-extended (typically due to the DAGCombiner::foldSextSetcc fold).

Fixes #53842
2022-02-15 14:21:12 +00:00
Simon Moll 53efbc15cb [VE] v256i1 broadcast isel and tests
Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D119241
2022-02-15 12:40:51 +01:00
Markus Böck 78c27a3cee [X86][Win64] Avoid statepoints in trailing call position
The "avoid trailing call pass" makes sure that no function ends with a call instruction for the purpose of the unwinder.
It starts off by skipping over any non-real instruction, which is approximated via the Pseudo and Meta property. This sadly leads to issues when the last machine instruction is a STATEPOINT, as it is skipped despite it lowering to a call.

This patch fixes the use of a statepoint in the trailing call position by making sure call instructions are not skipped.

Differential Revision: https://reviews.llvm.org/D119644
2022-02-15 12:17:19 +01:00
Jay Foad a65b9dd049 [AMDGPU] Divergence-driven instruction selection for bfm patterns
Differential Revision: https://reviews.llvm.org/D119706
2022-02-15 10:49:18 +00:00
Jay Foad f72d8897ac [AMDGPU] Honor !invariant.load metadata on load-like intrinsics
Differential Revision: https://reviews.llvm.org/D119739
2022-02-15 09:16:57 +00:00
Min-Yih Hsu b99365a7f4 [TableGen] Add a new `encoder` directive into VarLenCodeEmitterGen
The new encoder directive can be used to specify custom encoder for a
single operand or slice. This is different from the EncoderMethod field
within an Operand, which affects every operand in the target.

In addition, this patch also changes the function signature of the
encoder method -- a new argument, InsertPost, is added to both the
default one (i.e. getMachineValue) and the custom one. This argument
provides the bit position where the operand will eventually be inserted.

Differential Revision: https://reviews.llvm.org/D119100
2022-02-14 20:41:15 -08:00
Yonghong Song f419029fcd [BPF] Fix a bug in BTF_KIND_TYPE_TAG generation
Kumar Kartikeya Dwivedi reported a bug ([1]) where BTF_KIND_TYPE_TAG types
are not generated.

Currently, BPF backend only generates BTF types which are used by
the program, e.g., global variables, functions and some builtin functions.
For example, suppose we have
  struct task_struct {
    ...
    struct task_group               *sched_task_group;
    struct mm_struct                *mm;
    ...
    pid_t                           pid;
    pid_t                           tgid;
    ...
  }
If BPF program intends to access task_struct->pid and task_struct->tgid,
there is really no need to generate BTF types for struct task_group
and mm_struct.

In BPF backend, during BTF generation, when generating BTF for struct
task_struct, if types for task_group and mm_struct have not been generated
yet, a Fixup structure will be created, which will be reexamined later
to instantiate into either a full type or a forward type.

In current implementation, if we have something like
  struct foo {
     struct bar  __tag1    *f;
  };
and when generating types for struct foo, struct bar type
has not been generated, the __tag1 will be lost during later
Fixup instantiation. This patch fixed this issue by properly
handling btf_type_tag's during Fixup instantiation stage.

  [1] https://lore.kernel.org/bpf/20220210232411.pmhzj7v5uptqby7r@apollo.legion/

Differential Revision: https://reviews.llvm.org/D119799
2022-02-14 19:43:57 -08:00