Eliminates the need for working around optional and token operands being
mistakenly parsed as expressions.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D138492
Marking rs1 (memory offset base) as memory operand provides additional
semantic value to this operand that can be used by different tools
(e.g. llvm-exegesis).
This change does not affect neigther Isel nor assembler. However it
required some tweaks in tablegen compressed inst emmiter.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D136847
When calculating the size of concatenated subregisters, and at least
one of the subregisters involved has an unknown size (-1), then the
concatenated size should be set to -1 as well.
This bug was found for an out-of-tree target.
Looking at lib/Target the only in-tree target that has a subregister
with unknown size is X86:
X86RegisterInfo.td: def sub_mask_0 : SubRegIndex<-1>;
But it looks like sub_mask_0 don't result in any concatenated subreg
index with faulty size if looking at X86SubRegIdxRanges[].
Differential Revision: https://reviews.llvm.org/D138341
This patch adds dumping of the Offset and Size info for each
SubRegIndex printed when using
llvm-tblgen -gen-register-info -register-info-debug
It also updates the ConcatenatedSubregs.td to check those printouts,
including some new subreg definitions that show short-comings in
how the size is calculated when concatenating subregisters and at
least one has an incomplete size (-1). Today TableGen will just add
sizes together, resulting in MCRegisterInfo::getSubRegIdxSize()
returning a value that isn't -1 even if the combined subregister size
is unknown.
Differential Revision: https://reviews.llvm.org/D138340
The TableGen implementation was using a homegrown implementation of
FunctionModRefInfo. This switches it to use MemoryEffects instead.
This makes the code simpler, and will allow exposing the full
representational power of MemoryEffects in the future. Among other
things, this will allow us to map IntrHasSideEffects to an
inaccessiblemem readwrite, rather than just ignoring it entirely
in most cases.
To avoid layering issues, this moves the ModRef.h header from IR
to Support, so that it can be included in the TableGen layer.
Differential Revision: https://reviews.llvm.org/D137641
Currently, the asm parser stops matching instruction operands as soon as
the first optional operand is encountered. This leads to the need for
custom checks on missing mandatory operands that come after optional
operands.
The patch changes the parser to always match all optional and mandatory
instruction operands, thus making the custom checks unnecessary. This is
particularly useful for the AMDGPU backend where we have numerous
optional instruction modifiers.
Differential Revision: https://reviews.llvm.org/D137549
Currently, the asm parser stops matching instruction operands as soon as the first optional operand is encountered. This leads to the need for custom checks on missing mandatory operands that come after optional operands.
The patch changes the parser to always match all optional and mandatory instruction operands, thus making the custom checks unnecessary. This is particularly useful for the AMDGPU backend where we have numerous optional instruction modifiers.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D137549
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.
Differential Revision: https://reviews.llvm.org/D135780
A new register class as well as a number of related subregisters are being added
to Future CPU. These registers are Dense Math Registers (DMR) and are 1024 bits
long. These regsiters can also be used in consecutive pairs which leads to a
register that is 2048 bits.
This patch also adds 7 new instructions that use these registers. More
instructions will be added in future patches.
Reviewed By: amyk, saghir
Differential Revision: https://reviews.llvm.org/D136366
This change modifies the implementation of the format() function
so that vendor forks committed to building with compilers that
support __attribute__((format)) on non-variadic functions can
check the format() function with it.
rdar://84571523
Currently the DecoderEmitter constructor takes a bunch of string
parameters containing bits of code to interpolate.
However, there's only two ways it can be called. The one used for most
targets which doesn't handle the SoftFail DecoderStatus (not a
problem, because they don't use SoftFail). The other mode, which is
used for ARM/AArch64, does handle SoftFail, but requires an externally
defined helper function in those targets.
This is unnecessary complication; remove the parameters, and unify
onto a single version which does support SoftFail, defining the helper
itself.
This change modifies the implementation of the format() function
so that vendor forks committed to building with compilers that
support __attribute__((format)) on non-variadic functions can
check the format() function with it.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D132413
rdar://84571523
Small QoL change to allow Predicates to be used in GICombineRule.
Currently only one combine in the AMDGPU backend makes use of it.
The implementation is pretty simple to get started but of course we can expand this later on and optimize predicate checking better if needed.
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D136681
This patch adds the assembly/disassembly for the following instructions:
ADD (to vector): Add replicated single vector to multi-vector with multi-vector result.
SQDMULH (multiple and single vector): Multi-vector signed saturating doubling multiply high by vector.
for 2 and 4 ZA SVE registers.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
It also adds more size for the multiple register tuple:
ZZ_b_mul_r, ZZ_h_mul_r,
ZZZZ_b_mul_r, ZZZZ_h_mul_r,
for 8 bits and 16 bits with 2 and 4 ZA registers.
Depends on: D135468
With a fix for Mips for this test:
llvm/test/MC/Mips/mips64r6/valid.s
Differential Revision: https://reviews.llvm.org/D135563
This patch adds the assembly/disassembly for the following instructions:
ADD (to vector): Add replicated single vector to multi-vector with multi-vector result.
SQDMULH (multiple and single vector): Multi-vector signed saturating doubling multiply high by vector.
for 2 and 4 ZA SVE registers.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
It also adds more size for the multiple register tuple:
ZZ_b_mul_r, ZZ_h_mul_r,
ZZZZ_b_mul_r, ZZZZ_h_mul_r,
for 8 bits and 16 bits with 2 and 4 ZA registers.
Depends on: D135468
Differential Revision: https://reviews.llvm.org/D135563
Currently attributes for intrinsics are emitted using the
ArrayRef<AttrKind> based constructor for AttributeLists. This works
out fine for simple enum attributes, but doesn't really generalize
to attributes that accept values. We're already doing something
awkward for alignment attributes, and I'd like to have a cleaner
solution to this with
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579 in mind.
The new generation approach is to instead directly construct
Attributes, giving us access to the full generality of that
interface. To avoid significantly increasing the size of the
generated code, we now also deduplicate the attribute sets. The
code generated per unique AttributeList looks like this:
case 204: {
AS[0] = {1, getIntrinsicArgAttributeSet(C, 5)};
AS[1] = {AttributeList::FunctionIndex, getIntrinsicFnAttributeSet(C, 10)};
NumAttrs = 2;
break;
}
and then the helper functions contain something like
case 5:
return AttributeSet::get(C, {
Attribute::get(C, Attribute::NoCapture),
});
and
case 10:
return AttributeSet::get(C, {
Attribute::get(C, Attribute::NoUnwind),
Attribute::get(C, Attribute::ArgMemOnly),
});
A casualty of this change is the intrin-properties.td test, as I
don't think that FileCheck allows matching this kind of output.
Differential Revision: https://reviews.llvm.org/D135679
Instead of a flat list that includes the argument index, use a
nested vector, where each inner vector is the attribute set for
a single argument. This is more obvious and makes followup changes
simpler.
The opcode field in most places uses unsigned type.
InstrInfoEmitter still uses signed int for the
custom opcodes like CFSetupOpcode.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135140
Without it, the count of renderer functions is inaccurate and, in some
edge cases (like the patterns added in D134354), we can actually
go out of bounds (run out of pre-allocated renderer function spaces
in the GISel state)
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134861
Summary:
The existing undefined-bitfield-to-operand matching behavior is very
hard to understand, due to the combination of positional and named
matching. This can make it difficult to track down a bug in a target's
instruction definitions.
Over the last decade, folks have tried to work-around this in various
ways, but it's time to finally ditch the positional matching. With
https://reviews.llvm.org/D131003, there are no longer cases that
_require_ positional matching, and it's time to start removing usage
and support for it.
Therefore: add a (default-false) option, and set it to true only in
those targets that require positional matching today. Subsequent
changes will start cleaning up additional in-tree targets.
NOTE TO OUT OF TREE TARGET MAINTAINERS:
If this change breaks your build, you may restore the previous
behavior simply by adding:
let useDeprecatedPositionallyEncodedOperands = 1;
to your target's InstrInfo tablegen definition. However, this is
temporary -- the option will be removed in the future.
If your target does not set 'decodePositionallyEncodedOperands', you
may thus start migrating to named operands. However, if you _do_
currently set that option, I recommend waiting until a subsequent
change lands, which adds decoder support for named sub-operands.
Differential Revision: https://reviews.llvm.org/D134073
These names can then be matched by name against 'bits' fields in a
record, to populate an instruction's encoding.
This does _not_ yet change DecoderEmitter to allow by-name matching of
sub-operands. Unlike the encoder, the decoder already defaulted to not
supporting positional matching, and backends had workarounds in place
for the missing decoding support.
Additionally, use this new capability to allow the ARM and AArch64
backends not to require any positional operand matching.
Differential Revision: https://reviews.llvm.org/D131003
We have namespaces `DXIL` and `dxil`, which is just confusing. This
renames `DXIL` -> `dxil` making everything consistent.
While the LLVM coding standards don't have a clear direction here, I
chose lower case because by my current unscientific count there are
more places where we had the lowercase namespace than the uppercase.
This allows MachineCopyPropagation to eliminate copies of constant registers
such as zero registers. They were previously not being eliminated as the
check for MO.clobbersPhysReg(AvailSrc) would return true for constant
registers such as MIPS $zero.
To avoid having to manually add the zero registers to all CalleeSavedRegs
instantiations in tablegen, I instead added a new isConstant bit to the
Register and set this for MIPS, RISC-V, and AArch64 zero registers.
RegisterInfoEmitter.cpp looks at this flag and adds all constant registers
to the preserved register mask.
This may also benefit other passes but so far I have only seen differences
in MachineCopyPropagation. In the future it might make sense to generate
`isConstantPhysReg()` from this information.
Original source: 8588d8b814
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D131958
In `GIMatchTreeOpcodePartitioner::applyForPartition()`, the loop over
the possible leaves skip a leaf if the instruction does not care
about the instruction.
When processing the referenced operands in the next loop the same
leaves need to be skipped.
Later, when these leaves are added to all partitions, the bit vector
must be resized first before the bit representing the leaf is set.
This fixes a crash in llvm-tblgen.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134192
CodeGenSchedModels::hasReadOfWrite tries to predicate whether the WriteDef is contained in the list of ValidWrites of someone ProcReadAdvance,
so that WriteID of WriteDef can be compressed and reusable.
It tries to iterate all ProcReadAdvance entry, but not all ProcReadAdvance defs also inherit from SchedRead.
Some ProcReadAdvances are defined by ReadAdvance.So it's not complete to enumerate all ProcReadAdvances if just iterate all SchedReads.
Differential Revision: https://reviews.llvm.org/D132205
The following changes are necessasy to get the generated tree
matcher to compile:
- In CodeExpansions::declare(), the assert() prevents connecting
two instructions. E.g. the match code
(match (MUL $t, $s1, $s2),
(SUB $d, $t, $s3)),
results in two declarations of $t, one for the def and one for
the use. Removing the assertion allows this construct.
If $t is later used, it is one of the operands, which should be
perfectly fine.
- The code emitted in GIMatchTreeVRegDefPartitioner::generatePartitionSelectorCode()
is not compilable:
- The value of NewInstrID should be emitted, not the name
- Both calls involving getOperand() end with one parenthesis too many
- Swaps generated condition for the partition code in the latter function
It also changes the rules i2p_to_p2i, fabs_fabs_fold, and fneg_fneg_fold
to use the tree matcher for a linear match. These rules are tested by:
CodeGen/AArch64/GlobalISel/combine-fabs.mir
CodeGen/AArch64/GlobalISel/combine-fneg.mir
CodeGen/AArch64/GlobalISel/combine-ptrtoint.mir
CodeGen/AMDGPU/GlobalISel/combine-add-nullptr.mir
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D133257
The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.
AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principal we would have wanted this, but in the
cases I've looked at, it had the counter intuitive effect and
de-prioritized the large register tuple.
Avoid using weird bit hack PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing this to 6-bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.
`llvm` and downstream internal callers no longer use `array_lengthof`, so drop
the include everywhere.
Differential Revision: https://reviews.llvm.org/D133600
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D130884
Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be
simplified into a single ADD3 instruction instead of two adds.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D131254