The current implementation of the Check[HSDQ]Form predicates doesn't handle virtual registers and therefore isn't useful for pre-RA scheduling. This patch fixes that by implementing two function predicates: CheckQForm, which checks that an instruction writes a 128-bit NEON register, and CheckFpOrNEON, which checks that an instruction writes an FP register (of any width). The latter supersedes the Check[HSD]Form predicates, which were not used individually.
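A minimal sketch of how such a predicate can handle both cases, assuming the usual MachineRegisterInfo lookup for virtual registers (the function name and structure are illustrative, not the actual patch):
```
// Illustrative only: returns true if MI writes a 128-bit NEON (Q)
// register, whether the def is still virtual (pre-RA) or physical.
static bool checkQForm(const llvm::MachineInstr &MI) {
  const llvm::MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
  for (const llvm::MachineOperand &MO : MI.defs()) {
    if (!MO.isReg())
      continue;
    llvm::Register Reg = MO.getReg();
    if (Reg.isVirtual()) {
      // Pre-RA: consult the register class of the virtual register.
      if (llvm::AArch64::FPR128RegClass.hasSubClassEq(MRI.getRegClass(Reg)))
        return true;
    } else if (llvm::AArch64::FPR128RegClass.contains(Reg)) {
      // Post-RA: a physical Q register.
      return true;
    }
  }
  return false;
}
```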
OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D114642
The goal is to support tail and mask policies in RVV builtins.
We focus on the IR part first.
If the passthru operand is undef, we use tail agnostic; otherwise we
use tail undisturbed.
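As a standalone illustration of the rule (the enum mirrors the RVV vta bit, where 1 means tail agnostic; the names are ours, not LLVM's):
```
#include <cstdint>

// Mirrors the vta bit of vsetvli: 1 = tail agnostic, 0 = tail undisturbed.
enum TailPolicy : uint64_t { TailUndisturbed = 0, TailAgnostic = 1 };

// If the passthru operand is undef, nothing in the tail has to be
// preserved, so tail agnostic is safe; otherwise the tail elements
// must keep the passthru's values.
TailPolicy selectTailPolicy(bool PassthruIsUndef) {
  return PassthruIsUndef ? TailAgnostic : TailUndisturbed;
}
```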
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D119686
This is some NFC (hopefully!) cleanup for performCommonVectorExtendCombine
and related methods, removing conditions that cannot occur and otherwise
cleaning up the code a little.
This reverts commit 23a5073600.
Although this patch achieved better codegen in most cases, it is really
important to accurately describe the cost of instructions, so I am
reverting it.
In the BPF backend, BTF type generation may skip
some debuginfo types if they are the pointee
type of a struct member. For example,
  struct task_struct {
    ...
    struct mm_struct *mm;
    ...
  };
The BPF backend may generate a forward decl for
'struct mm_struct' instead of the full type if
there is no other usage of 'struct mm_struct'.
The reason is to avoid bringing too many unneeded types
into BTF.
Alexei found a pruning bug where we may miss
some full type generation. The following is an illustrating
example:
  struct t1 { ... };
  struct t2 { struct t1 *p; };
  struct t2 g;
  void foo(struct t1 *arg) { ... }
In the above case, we will have a partial debuginfo chain like below:
  struct t2 -> member p
                     \ -> ptr -> struct t1
                     /
  foo -> argument arg
During the traversal of
  struct t2 -> member p -> ptr -> struct t1
the corresponding BTF types are generated, except for 'struct t1', which
is deferred to the FixUp stage. Later, when traversing
  foo -> argument arg -> ptr -> struct t1
the 'ptr' BTF type has already been generated, and the current
implementation ignores 'pointer' types, hence 'struct t1' is never
generated.
This patch fixes the issue, not just for the above case but for the
general case with multiple derived types, e.g.,
  struct t2 -> member p
                     \ -> const -> ptr -> volatile -> struct t1
                     /
  foo -> argument arg
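The shape of the fix, as a hedged sketch (the helpers stand in for the real BTFDebug machinery and are hypothetical): keep walking through a derived type even when its BTF type already exists.
```
#include "llvm/IR/DebugInfoMetadata.h"

// Hypothetical helpers standing in for the real BTFDebug logic.
bool alreadyGenerated(const llvm::DIType *Ty);
void convertToBTF(const llvm::DIType *Ty);
void visitType(const llvm::DIType *Ty);

void visitDerivedType(const llvm::DIDerivedType *DTy) {
  if (!alreadyGenerated(DTy))
    convertToBTF(DTy); // first visit: emit the BTF type for DTy
  // Previously the walk returned here when DTy was already generated,
  // so a base type still sitting in the FixUp stage (like 'struct t1')
  // never received its full BTF type. Always recurse into the base.
  visitType(DTy->getBaseType());
}
```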
Differential Revision: https://reviews.llvm.org/D119986
Just because there aren't AGPRs in the original program doesn't mean
the register allocator can't choose to use them (unless we were to
forcibly reserve all AGPRs if there weren't any uses). This happens in
high-pressure situations and introduces copies to avoid spills.
In this test, the allocator ends up introducing a copy from SGPR to
AGPR which requires an intermediate VGPR. I don't believe it would
introduce a copy from AGPR to AGPR in this situation, since it would
be trying to use an intermediate with a different class.
Theoretically this is also broken on gfx90a, but I have been unable to
come up with a testcase.
gfx90a allows the number of ACC registers (AGPRs) to be set
independently of the number of VGPRs. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.
Differential Revision: https://reviews.llvm.org/D116140
It's not particularly user-friendly to have to call `initLRU` everywhere. Nor
was it great that the LRU for registers used in a sequence was also
initialized by `initLRU`.
This patch hides this stuff behind some helper functions:
* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`
This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.
Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.
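For illustration, a before/after sketch of a typical call site (signatures approximate; `C` is an outliner::Candidate and `TRI` the TargetRegisterInfo):
```
// Before: every caller had to remember to initialize the LRU first.
C.initLRU(TRI);
if (C.LRU.available(AArch64::LR)) {
  // ... safe to clobber LR across the sequence ...
}

// After: a single query hides the LRU initialization.
if (C.isAvailableAcrossAndOutOfSeq(AArch64::LR, TRI)) {
  // ... safe to clobber LR across the sequence ...
}
```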
Some other quality-of-code improvements:
* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`
This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
Part of the shift lowering creates a (sub XLEN-1, ShAmt). Where this
value is used, we know that ShAmt is in [0..XLEN-1]. Since XLEN is a
power of 2, we can replace the sub with an xor. This allows us to use
XORI instead of LI+SUB.
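The identity is easy to sanity-check: when XLEN is a power of 2, XLEN-1 is an all-ones pattern in the low bits, and subtracting an in-range ShAmt from it never borrows, so it behaves like a bitwise xor:
```
#include <cassert>
#include <cstdint>

int main() {
  const uint32_t XLEN = 64; // RV64; the same holds for XLEN = 32
  for (uint32_t ShAmt = 0; ShAmt < XLEN; ++ShAmt)
    // No borrow occurs, so the sub flips exactly the low bits,
    // which is the same as xor with the all-ones value XLEN-1.
    assert((XLEN - 1) - ShAmt == ((XLEN - 1) ^ ShAmt));
  return 0;
}
```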
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D119411
Enabled HW_REG_HW_ID as an alias for HW_REG_HW_ID1. This is required for compatibility with existing code.
Differential Revision: https://reviews.llvm.org/D119939
The goal is to support tail and mask policies in RVV builtins.
We focus on the IR part first.
If the passthru operand is undef, we use tail agnostic; otherwise we
use tail undisturbed.
My plan is to handle more complex operations in follow-up patches.
Reviewers: frasercrmck
Differential Revision: https://reviews.llvm.org/D118253
Refactor the instructions in M68kInstrControl.td to use the VarLenCodeEmitter.
This patch is tested by the existing test cases.
Reviewed By: myhsu, ricky26
Differential Revision: https://reviews.llvm.org/D119665
The goal is to support tail and mask policies in RVV builtins.
We focus on the IR part first.
If the passthru operand is undef, we use tail agnostic; otherwise we
use tail undisturbed.
Add a passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support
i64 scalars on rv32.
The masked VSLIDE1 only emits the mask-undisturbed policy, regardless
of being given the mask-agnostic policy, until InsertVSETVLI supports
mask agnostic.
Reviewed by: craig.topper, rogfer01
Differential Revision: https://reviews.llvm.org/D117989
The tablegen lines that specify the XPLINK64 calling convention for promoting an f32 vararg to an f64 are effectively overwritten by the following tablegen line, which bitcasts an f64 vararg to an i64 (so that it can be used in the GPRs). The net effect is a bitcast from f32 to i64.
Since we don't handle a bitcast for f32s, this caused an assertion.
The autogenerated names are too long and break compilation on Windows,
and we do not need this enum at all.
Differential Revision: https://reviews.llvm.org/D119869
This is a more generic version of D119110 that uses MaskedValueIsZero
to do the matching and SimplifyDemandedBits to remove any unneeded
AND instructions.
Tests were taken from D119110.
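A hedged sketch of the general idea (the mask and node names are illustrative, not the exact fold from the patch):
```
// Illustrative only: instead of pattern-matching an explicit AND node,
// ask known-bits analysis whether the bits that must be clear already
// are; SimplifyDemandedBits can then delete ANDs that became unneeded.
llvm::APInt HighBits = llvm::APInt::getHighBitsSet(64, 32);
if (DAG.MaskedValueIsZero(N0, HighBits)) {
  // ... perform the fold without requiring a literal AND ...
}
```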
Reviewed By: Chenbing.Zheng
Differential Revision: https://reviews.llvm.org/D119622
Parsing errors aren't handled earlier in all cases. A simple
example is llc -mtriple=riscv64 -mattr=+zve32f. If F or Zfinx is
not also specified, this will hit a parse error.
Use a fatal_error so that the error is conveyed to the user.
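A sketch of the change's shape, assuming the failure arrives as an llvm::Error from feature parsing (the exact call site differs):
```
#include "llvm/Support/Error.h"
#include "llvm/Support/ErrorHandling.h"

// Illustrative only: surface the parse error to the user instead of
// letting it escape unhandled.
void handleParseFailure(llvm::Error E) {
  std::string Msg = llvm::toString(std::move(E));
  llvm::report_fatal_error(llvm::StringRef(Msg));
}
```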
This patch updates the handling of vectors in getPreferredVectorAction():
For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.
The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.
```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```
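A hedged sketch of the described handling (the split threshold and exact conditions are assumptions, not the patch itself):
```
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/Support/MachineValueType.h"

using Action = llvm::TargetLoweringBase::LegalizeTypeAction;

// Illustrative only: decide how to legalize vectors so that the wide
// v256i1/v512i1 (MMA-only) types are never produced by legalization.
Action preferredActionSketch(llvm::MVT VT, Action DefaultAction) {
  // Single-element and scalable vectors: default handling.
  if (VT.isScalableVector() || VT.getVectorNumElements() == 1)
    return DefaultAction;
  if (VT.getVectorElementType() == llvm::MVT::i1)
    // Assumed split point: split wide mask vectors, promote narrow ones.
    return VT.getVectorNumElements() > 64
               ? llvm::TargetLoweringBase::TypeSplitVector
               : llvm::TargetLoweringBase::TypePromoteInteger;
  return DefaultAction;
}
```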
Differential Revision: https://reviews.llvm.org/D119521
Extend the existing split, where we already do this for v32i16/v64i8.
We can end up trying to use PCMPEQ/GT if the result needs to be sign-extended (typically due to the DAGCombiner::foldSextSetcc fold).
Fixes #53842
The "avoid trailing call pass" makes sure that no function ends with a call instruction for the purpose of the unwinder.
It starts of by skipping over any non real instruction, which is approximated via the Pseudo and Meta property. This sadly leads to issues when the last machine instruction is a STATEPOINT, as it is skipped despite it lowering to a call.
This patch fixes the use of a statepoint in the trailing call position by making sure call instructions are not skipped.
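A sketch of the scan with the fix applied (the loop structure is illustrative; the real pass differs in details):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

// Illustrative only: determine whether the block's last "real"
// instruction is a call. The key point: an instruction that lowers to
// a call (such as STATEPOINT) must never be skipped as pseudo/meta.
static bool endsWithCall(const llvm::MachineBasicBlock &MBB) {
  for (const llvm::MachineInstr &MI : llvm::reverse(MBB)) {
    if ((MI.isPseudo() || MI.isMetaInstruction()) && !MI.isCall())
      continue; // skip non-real instructions, but never skip calls
    return MI.isCall();
  }
  return false;
}
```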
Differential Revision: https://reviews.llvm.org/D119644
The new encoder directive can be used to specify a custom encoder for a
single operand or slice. This is different from the EncoderMethod field
within an Operand, which affects every operand in the target.
In addition, this patch also changes the function signature of the
encoder method -- a new argument, InsertPos, is added to both the
default one (i.e. getMachineOpValue) and the custom one. This argument
provides the bit position where the operand will eventually be inserted.
Differential Revision: https://reviews.llvm.org/D119100
Kumar Kartikeya Dwivedi reported a bug ([1]) where BTF_KIND_TYPE_TAG types
are not generated.
Currently, the BPF backend only generates BTF types which are used by
the program, e.g., global variables, functions and some builtin functions.
For example, suppose we have
  struct task_struct {
    ...
    struct task_group *sched_task_group;
    struct mm_struct *mm;
    ...
    pid_t pid;
    pid_t tgid;
    ...
  };
If a BPF program intends to access task_struct->pid and task_struct->tgid,
there is really no need to generate BTF types for struct task_group
and struct mm_struct.
In the BPF backend, during BTF generation, when generating BTF for struct
task_struct, if the types for task_group and mm_struct have not been
generated yet, a Fixup structure will be created, which will be reexamined
later and instantiated into either a full type or a forward type.
In the current implementation, if we have something like
  struct foo {
    struct bar __tag1 *f;
  };
and the type for struct bar has not been generated when generating types
for struct foo, the __tag1 will be lost during later Fixup instantiation.
This patch fixes the issue by properly handling btf_type_tag's during
the Fixup instantiation stage.
[1] https://lore.kernel.org/bpf/20220210232411.pmhzj7v5uptqby7r@apollo.legion/
Differential Revision: https://reviews.llvm.org/D119799