This patch continues a series of patches that decrease time spent by
GlobalISel in its InstructionSelect pass by roughly 60% for -O0 builds
for large inputs as measured on sqlite3-amalgamation
(http://sqlite.org/download.html) targeting AArch64.
This commit specifically removes number of operands checks that are
redundant if the instruction's opcode already guarantees that number
of operands (or more), and also avoids any kind of checks on a def
operand of a nested instruction as everything about it was already
checked at its use.
The expected performance implication is about 3% off InstructionSelect
comparing to the baseline (before the series of patches)
This patch also contains a bit of NFC changes required for further
patches in the series.
Every commit planned shares the same Phabricator Review.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 332945
Apparently the compile time problem was caused by the fact that not
all compilers / STL implementations can automatically convert
std::unique_ptr<Derived> to std::unique_ptr<Base>. Fixed (hopefully)
by making sure it's std::unique_ptr<Derived>&& (rvalue ref) to
std::unique_ptr<Base> conversion instead.
llvm-svn: 332917
This patch starts a series of patches that decrease time spent by
GlobalISel in its InstructionSelect pass by roughly 60% for -O0 builds
for large inputs as measured on sqlite3-amalgamation
(http://sqlite.org/download.html) targeting AArch64.
The performance improvements are achieved solely by reducing the
number of matching GIM_* opcodes executed by the MatchTable's
interpreter during the selection by approx. a factor of 30, which also
brings contribution of this particular part of the selection process
to the overall runtime of InstructionSelect pass down from approx.
60-70% to 5-7%, thus making further improvements in this particular
direction not very profitable.
The improvements described above are expected for any target that
doesn't have many complex patterns. The targets that do should
strictly benefit from the changes, but by how much exactly is hard to
estimate beforehand. It's also likely that such target WILL benefit
from further improvements to MatchTable, most likely the ones that
bring it closer to a perfect decision tree.
This commit specifically is rather large mostly NFC commit that does
necessary preparation work and refactoring, there will be a following
series of small patches introducing a specific optimization each
shortly after.
This commit specifically is expected to cause a small compile time
regression (around 2.5% of InstructionSelect pass time), which should
be fixed by the next commit of the series.
Every commit planned shares the same Phabricator Review.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 332907
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
This implements a new table-gen emitter to create tables for
a wasm disassembler, and a dissassembler to use them.
Comes with 2 tests, that tests a few instructions manually. Is also able to
disassemble large .wasm files with objdump reasonably.
Not working so well, to be addressed in followups:
- objdump appears to be passing an incorrect starting point.
- since the disassembler works an instruction at a time, and it is
disassembling stack instruction, it has no idea of pseudo register assignments.
These registers are required for the instruction printing code that follows.
For now, all such registers appear in the output as $0.
Patch by Wouter van Oortmerssen
Differential Revision: https://reviews.llvm.org/D45848
llvm-svn: 332052
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
improve optimization of extends and truncates, this legality requirement
would spread without considerable care w.r.t when certain combines were
permitted.
* The SelectionDAG importer required some ugly and fragile pattern
rewriting to translate patterns into this style.
This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.
Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.
The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.
Depends on D45540
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar
Reviewed By: rtereshin
Subscribers: javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45541
llvm-svn: 331601
to make sure that Testgen always has access to coverage info even if
the match table used by the selector itself is stripped off that
information for performance reasons.
Reviewers: dsanders, aemerson
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, llvm-commits, dsanders
Differential Revision: https://reviews.llvm.org/D46098
llvm-svn: 331398
to share it between the Instruction Selector in optimized and
non-optimized modes both and the Testgen.
Reviewers: dsanders, aemerson
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, llvm-commits, dsanders
Differential Revision: https://reviews.llvm.org/D46097
llvm-svn: 331396
The main goal is to share getMatchTable between the Instruction
Selector and the Testgen.
The commit also contains some NFC only loosely related to refactoring
out the getMatchTable, but strongly related to the initial Testgen
patch (see https://reviews.llvm.org/D43962)
Reviewers: dsanders, aemerson
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, llvm-commits, dsanders
Differential Revision: https://reviews.llvm.org/D46096
llvm-svn: 331395
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
Previously for instructions like fxsave we would print "opaque ptr" as part of the memory operand. Now we print nothing.
We also no longer accept "opaque ptr" in the parser. We still accept any size to be specified for these instructions, but we may want to consider only parsing when no explicit size is specified. This what gas does.
llvm-svn: 331243
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
An optional, light-weight and backward-compatible mechanism to allow
specifying that a diagnostic _only_ applies to a partial mismatch (NearMiss),
rather than a full mismatch.
Patch [1/2] in a series to improve assembler diagnostics for SVE.
- Patch [1/2]: https://reviews.llvm.org/D45879
- Patch [2/2]: https://reviews.llvm.org/D45880
Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar
Reviewed By: olista01
Differential Revision: https://reviews.llvm.org/D45879
llvm-svn: 330930
This patch ensures that the pfm issue counter tables are the correct size, accounting for the invalid resource entry at the beginning of the resource tables.
It also fixes an issue with pfm failing to match event counters due to a trailing comma added to all the event names.
I've also added a counter comment to each entry as it helps locate problems with the tables.
Note: I don't have access to a SandyBridge test machine, which is the only model to make use of multiple event counters being mapped to a single resource. I don't know if pfm accepts a comma-seperated list or not, but that is what it was doing.
Differential Revision: https://reviews.llvm.org/D45787
llvm-svn: 330317
Summary:
Subtargets can define the libpfm counter names that can be used to
measure cycles and uops issued on ProcResUnits.
This allows making llvm-exegesis available on more targets.
Fixes PR36984.
Reviewers: gchatelet, RKSimon, andreadb, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45360
llvm-svn: 329675
Summary:
This patch implements a tablegen-driven Instruction Compression
mechanism for generating RISCV compressed instructions
(C Extension) from the expanded instruction form.
This tablegen backend processes CompressPat declarations in a
td file and generates all the compile-time and runtime checks
required to validate the declarations, validate the input
operands and generate correct instructions.
The checks include validating register operands, immediate
operands, fixed register operands and fixed immediate operands.
Example:
class CompressPat<dag input, dag output> {
dag Input = input;
dag Output = output;
list<Predicate> Predicates = [];
}
let Predicates = [HasStdExtC] in {
def : CompressPat<(ADD GPRNoX0:$rs1, GPRNoX0:$rs1, GPRNoX0:$rs2),
(C_ADD GPRNoX0:$rs1, GPRNoX0:$rs2)>;
}
The result is an auto-generated header file
'RISCVGenCompressEmitter.inc' which exports two functions for
compressing/uncompressing MCInst instructions, plus
some helper functions:
bool compressInst(MCInst& OutInst, const MCInst &MI,
const MCSubtargetInfo &STI,
MCContext &Context);
bool uncompressInst(MCInst& OutInst, const MCInst &MI,
const MCRegisterInfo &MRI,
const MCSubtargetInfo &STI);
The clients that include this auto-generated header file and
invoke these functions can compress an instruction before emitting
it, in the target-specific ASM or ELF streamer, or can uncompress
an instruction before printing it, when the expanded instruction
format aliases is favored.
The following clients were added to implement compression\uncompression
for RISCV:
1) RISCVAsmParser::MatchAndEmitInstruction:
Inserted a call to compressInst() to compresses instructions
parsed by llvm-mc coming from an ASM input.
2) RISCVAsmPrinter::EmitInstruction:
Inserted a call to compressInst() to compress instructions that
were lowered from Machine Instructions (MachineInstr).
3) RVInstPrinter::printInst:
Inserted a call to uncompressInst() to print the expanded
version of the instruction instead of the compressed one (e.g,
add s0, s0, a5 instead of c.add s0, a5) when -riscv-no-aliases
is not passed.
This patch squashes D45119, D42780 and D41932. It was reviewed in smaller patches by
asb, efriedma, apazos and mgrang.
Reviewers: asb, efriedma, apazos, llvm-commits, sabuasal
Reviewed By: sabuasal
Subscribers: mgorny, eraman, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng
Differential Revision: https://reviews.llvm.org/D45385
llvm-svn: 329455
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: stoklund, kparzysz, dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45144
llvm-svn: 329451
This patch adds the ability to describe properties of the hardware retire
control unit.
Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).
A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.
A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize. A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".
Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.
Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).
This patch fixes PR36661.
Differential Revision: https://reviews.llvm.org/D45259
llvm-svn: 329304
For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp
This patch causes problems for a number of models that had been incorrectly flagged as complete.
Differential Revision: https://reviews.llvm.org/D43235
llvm-svn: 329280
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost. By default, register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
Summary:
We will use this in the AMDGPU backend in a subsequent patch
in the stack to lookup target-specific per-intrinsic information.
The generic CodeGenIntrinsic machinery is used to ensure that,
even though we don't calculate actual enum values here, we do
get the intrinsics in the right order for the binary search
index.
Change-Id: If61cd5587963a4c5a1cc53df1e59c5e4dec1f9dc
Reviewers: arsenm, rampitec, b-sumner
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D44935
llvm-svn: 328937
This patch throws a fatal error if an instregex entry doesn't actually match any instructions. This is part of the work to reduce the compile time impact of increased instregex usage (PR35955), although the x86 models seem to be relatively clean.
All the cases I encountered have now been fixed in trunk and this will ensure they don't get reintroduced.
Differential Revision: https://reviews.llvm.org/D44687
llvm-svn: 328459
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)
llvm-svn: 328395
This is used from llvm tblgen and the X86Disassembler - the only common
library (apart from TableGen, which probably doesn't make sense to have
as a dependency from a release tool (rather than a use-while-building-llvm
tool) of LLVM)
llvm-svn: 328393
This makes the Y position consistent with other instructions.
This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.
llvm-svn: 328254
We already know all the of instructions we're processing in the instruction loop belong to no class or all to the same class. So we only have to worry about remapping one class. So hoist it all out and remove the SmallPtrSet that tracked which class we'd already remapped.
I had to introduce new instruction loop inside this code to print an error message, but that only occurs on the error path.
llvm-svn: 328142
We already have an OldSCIdx variable in the outer loop here. And we already did the map lookup in the loop that populated ClassInstrs. And the outer OldSCIdx got it from ClassInstrs.
llvm-svn: 328139
Summary:
This code previously had a SmallVector of std::pairs containing an unsigned and another SmallVector. The outer vector was using the unsigned effectively as a key to decide which SmallVector to add into. So each time something new needed to be added the out vector needed to be scanned. If it wasn't found a new entry needed to be added to be added. This sounds very much like a map, but the next loop iterates over the outer vector to get a deterministic order.
We can simplify this code greatly if use SmallMapVector instead. This uses more stack space since we now have a vector and a map, but the searching and creating new entries all happens behind the scenes. It should also make the search more efficient though usually there are only a few entries so that doesn't matter much.
We could probably get determinism by just using std::map which would iterate over the unsigned key, but that would generate different output from what we get with the current implementation.
Reviewers: RKSimon, dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44711
llvm-svn: 328070
Both vectors contain unsigned so we can just use append to do the copying. Not only is this shorter, but it should be able to predict the final size and only grow the vector once if needed.
llvm-svn: 328033
Registers E[A-D]X, E[SD]I, E[BS]P, and EIP have 16-bit subregisters
that cover the low halves of these registers. This change adds artificial
subregisters for the high halves in order to differentiate (in terms of
register units) between the 32- and the low 16-bit registers.
This patch contains parts that aim to preserve the calculated register
pressure. This is in order to preserve the current codegen (minimize the
impact of this patch). The approach of having artificial subregisters
could be used to fix PR23423, but the pressure calculation would need
to be changed.
Differential Revision: https://reviews.llvm.org/D43353
llvm-svn: 328016
This is similar to the check later when we remap some of the instructions from one class to a new one. But if we reuse the class we don't get to do that check.
So many CPUs have violations of this check that I had to add a flag to the SchedMachineModel to allow it to be disabled. Hopefully we can get those cleaned up quickly and remove this flag.
A lot of the violations are due to overlapping regular expressions, but that's not the only kind of issue it found.
llvm-svn: 327808
X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET).
IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp.
The `nocf_check` attribute has two roles in the context of X86 IBT technology:
1. Appertains to a function - do not add ENDBR instruction at the beginning of the function.
2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction.
This patch implements `nocf_check` context for Indirect Branch Tracking.
It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks.
Differential Revision: https://reviews.llvm.org/D41879
llvm-svn: 327767
Remove the special casing for MRM_F8 by using HANDLE_OPTIONAL.
This should be NFC as the forms that were missing aren't used by any instructions today. They exist in the enum so that we didn't have to put them in one at a time when instructions are added. But looks like we failed here.
llvm-svn: 327298
With this patch, the tablegen 'SubtargetEmitter' always generates processor
resource names.
The impact of this patch on the code size of other llvm tools is small. I have
observed an average increase of 0.03% in code size when doing a release build of
LLVM (on windows, using MSVC) with all the default backends.
This change is done in preparation for the upcoming llvm-mca patch.
llvm-svn: 326993
The former simply makes more sense: we want to access the data here in
the backend, not information about the type.
More importantly, removing users of RecordRecTy::getRecord() allows us
more freedom to refactor the frontend.
Change-Id: Iee8905fd22cdb9b11c42ca03246c03d8fe4dd77f
llvm-svn: 326699
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers. This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.
Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).
Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.
Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.
Clear the IsRenamable bit when changing an operand's register value.
Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.
Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.
Reviewers: qcolombet, MatzeB
Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D43042
llvm-svn: 325931
This patch changes GlobalISelEmitter to rank patterns similar to how the
DAG does it (ie it computes a score for a pattern and adds the added
complexity to it).
This is so that the decision tree for GISelSelector remains compatible
with that of SelectionDAG.
https://reviews.llvm.org/D43270
llvm-svn: 325401
Summary:
This patch makes the decoder understand old AMD 3DNow!
instructions that have never been properly supported in the X86
disassembler, despite being supported in other subsystems. Hopefully
this should make the X86 decoder more complete with respect to binaries
containing legacy code.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits, maksfb, bruno
Differential Revision: https://reviews.llvm.org/D43311
llvm-svn: 325295
Summary:
Right now using a ProcResource automatically counts as usage of all
super ProcResGroups. All this is done during codegen, so there is no
way for schedulers to get this information at runtime.
This adds the information of which individual ProcRes units are
contained in a ProcResGroup in MCProcResourceDesc.
Reviewers: gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43023
llvm-svn: 324582
Summary:
Right now only the ProcResourceUnits that are directly referenced by
instructions are emitted. This change emits all of them, so that
analysis passes can use the information.
This has no functional impact. It typically adds a few entries (e.g. 4
for X86/haswell) to the generated ProcRes table.
Reviewers: gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42903
llvm-svn: 324228
Summary:
This is a bit of a reimplementation the work done in
https://reviews.llvm.org/D41446, since that patch only really works for
tied operands of instructions, not aliases.
Instead of checking the constraints based on the matched instruction's opcode,
this patch uses the match-info's convert function to check the operand
constraints for that specific instruction/alias.
This is based on the matched operands for the instruction, not the
resulting opcode of the MCInst.
This patch adds the following enum/table to the *GenAsmMatcher.inc file:
enum {
Tie0_1_1,
Tie0_1_2,
Tie0_1_5,
...
};
const char TiedAsmOperandTable[][3] = {
/* Tie0_1_1 */ { 0, 1, 1 },
/* Tie0_1_2 */ { 0, 1, 2 },
/* Tie0_1_5 */ { 0, 1, 5 },
...
};
And it is referenced directly in the ConversionTable, like this:
static const uint8_t ConversionTable[CVT_NUM_SIGNATURES][13] = {
...
{ CVT_95_addRegOperands, 1,
CVT_95_addRegOperands, 2,
CVT_Tied, Tie0_1_5,
CVT_95_addRegOperands, 6, CVT_Done },
...
The Tie0_1_5 (and corresponding table) encodes that:
* Result operand 0 is the operand to copy (which is e.g. done when
building up the operands to the MCInst in convertToMCInst())
* Asm operands 1 and 5 should be the same operands (which is checked
in checkAsmTiedOperandConstraints()).
Reviewers: olista01, rengolin, fhahn, craig.topper, echristo, apazos, dsanders
Reviewed By: olista01
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42293
llvm-svn: 324196
Summary:
Apparently, we missed on constraining register classes of VReg-operands of all the instructions
built from a destination pattern but the root (top-level) one. The issue exposed itself
while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped
into COPY_TO_REGCLASS, so top-level COPY_TO_REGCLASS gets properly constrained,
while nested VTOSIZS (or rather its destination virtual register to be exact) does not.
Fixing this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI.
https://bugs.llvm.org/show_bug.cgi?id=35965
rdar://problem/36886530
Patch by Roman Tereshin
Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan
Reviewed By: dsanders, qcolombet, rovka
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42565
llvm-svn: 323692
Collected statistics for the number of patterns emitted can be
incorrect because rules can be grouped if OptimizeMatchTable
is enabled. Increase the counter in RuleMatcher::emit(...)
to avoid that.
llvm-svn: 323391
This is a bit of a hack, but removes a cycle that broke modular builds
of LLVM. Of course the cycle is still there in form of a dependency
on the .def file.
llvm-svn: 323383
llvm::Regex is still the slowest regex engine on earth, running it over
all instructions on X86 takes a while. Extract a prefix and use a binary
search to reduce the search space before we resort to regex matching.
There are a couple of caveats here:
- The generic opcodes are outside of the sorted enum. They're handled in an extra loop.
- If there's a top-level bar we can't use the prefix trick.
- We bail on top-level ?. This could be handled, but it's rare.
This brings the time to generate X86GenInstrInfo.inc from 21s to 4.7s on
my machine.
llvm-svn: 323277
It appears that we haven't been prioritizing rules that contain nested
instructions properly. InstructionOperandMatcher didn't override
isHigherPriorityThan so it never compared the instructions/operands/predicates
inside nested instructions.
Fixes PR35926. Thanks to Diana Picus for the bug report.
llvm-svn: 322754
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.
Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.
def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
GISDNodeXFormEquiv<imm8_xform>;
Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);
Reviewers: dsanders, ab, rovka
Reviewed By: dsanders
Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet
Differential Revision: https://reviews.llvm.org/D42012
llvm-svn: 322582
Prior to this we had a separate instruction and register class that excluded eax to prevent matching the instruction that would encode with 0x90.
This patch changes this to just use an InstAlias to force xchgl %eax, %eax to use XCHG32rr instruction in 64-bit mode. This gets rid of the separate instruction and register class.
llvm-svn: 322532
Summary:
This extends TableGen's AsmMatcherEmitter with code that generates
a table with tied-operand constraints. The constraints are checked
when parsing the instruction. If an operand is not equal to its tied operand,
the assembler will give an error.
Patch [2/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB.
Reviewers: olista01, rengolin, mcrosier, fhahn, craig.topper, evandro, echristo
Reviewed By: fhahn
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D41446
llvm-svn: 322166
This patch improves diagnostic for case when mapped instruction
does not contain a field listed under RowFields.
Differential Revision: https://reviews.llvm.org/D41778
llvm-svn: 322004
This change deals with intrinsics with multiple outputs, for example load
instrinsic with address updated.
DAG selection for Instrinsics could be done either through source code or
tablegen. Handling all intrinsics in source code would introduce a huge chunk
of repetitive code if we have a large number of intrinsic that return multiple
values (see NVPTX as an example). While intrinsic class in tablegen supports
multiple outputs, tablegen only supports Intrinsics with zero or one output on
TreePattern. This appears to be a simple bug in tablegen that is fixed by this
change.
For Intrinsics defined as:
def int_xxx_load_addr_updated: Intrinsic<[llvm_i32_ty, llvm_ptr_ty], [llvm_ptr_ty, llvm_i32_ty], []>;
Instruction will be defined as:
def L32_X: Inst<(outs reg:$d1, reg:$d2), (ins reg:$s1, reg:$s2), "ld32_x $d1, $d2, $s2", [(set i32:$d1, i32:$d2, (int_xxx_load_addr_updated i32:$s1, i32:$s2))]>;
Patch by Wenbo Sun, thanks!
Differential Revision: https://reviews.llvm.org/D32888
llvm-svn: 321704
Allows preserving MachineMemOperands on intrinsics
through selection. For reasons I don't understand, this
is a static property of the pattern and the selector
deliberately goes out of its way to drop if not present.
Intrinsics already inherit from SDPatternOperator allowing
them to be used directly in instruction patterns. SDPatternOperator
has a list of SDNodeProperty, but you currently can't set them on
the intrinsic. Without SDNPMemOperand, when the node is selected
any memory operands are always dropped. Allowing setting this
on the intrinsics avoids needing to introduce another equivalent
target node just to have SDNPMemOperand set.
llvm-svn: 321212
NFC for currently supported targets. This resolves a problem encountered by
targets such as RISCV that reference `Subtarget` in ImmLeaf predicates.
llvm-svn: 321176
This patch resubmits the SVE ZIP1/ZIP2 patch series consisting of
of r320992, r320986, r320973, and r320970 by reverting
https://reviews.llvm.org/rL321024.
The issue that caused r321024 has been addressed in https://reviews.llvm.org/rL321158,
so this patch-series should be safe to resubmit.
llvm-svn: 321163
Between the creation of the last InstructionMatcher and the first
emission of the related Rule, we need to clear the internal map of IDs.
We used to do that right after the creation of the main
InstructionMatcher when building the rule and although that worked, this
is fragile because if for some reason some later code decides to create
more InstructionMatcher before the final call to emit, then the IDs
would be completely messed up.
Move that to the beginning of "emit" so that the IDs are guarantee to be
consistent.
NFC.
llvm-svn: 321053
Move InsnVarID and OpIdx at the beginning of the list of arguments
for all the constructors of the OperandMatcher subclasses.
This matches what we do for the InstructionMatcher.
NFC.
llvm-svn: 321031
In theory, reapplying optimizeRules on each group matchers should give
us a second nesting level on the matching table. In practice, we need
more work to make that happen because all the predicates are actually
not directly available through the predicate matchers list.
NFC.
llvm-svn: 321025
This reverts changes r320992, r320986, r320973, and r320970.
r320970 by itself breaks the test case, and the rest depend on it.
Test case will land soon.
llvm-svn: 321024
*** Context ***
Prior to this patchw, the table generated for matching instruction was
straight forward but highly inefficient.
Basically, each pattern generates its own set of self contained checks
and actions.
E.g., TableGen generated:
// First pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDrr
// Second pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDri
// Third pattern
CheckNumOperand 3
CheckOpcode G_SUB
...
Build SUBrr
*** Problem ***
Because of that generation, a *lot* of check were redundant between each
pattern and were checked every single time until we reach the pattern
that matches.
E.g., Taking the previous table, let say we are matching a G_SUB, that
means we were going to check all the rules for G_ADD before looking at
the G_SUB rule. In particular we are going to do:
check 3 operands; PASS
check G_ADD; FAIL
; Next rule
check 3 operands; PASS (but we already knew that!)
check G_ADD; FAIL (well it is still not true)
; Next rule
check 3 operands; PASS (really!!)
check G_SUB; PASS (at last :P)
*** Proposed Solution ***
This patch introduces a concept of group of rules (GroupMatcher) that
share some predicates and only get checked once for the whole group.
This patch only creates groups with one nesting level. Conceptually
there is nothing preventing us for having deeper nest level. However,
the current implementation is not smart enough to share the recording
(aka capturing) of values. That limits its ability to do more sharing.
For the given example the current patch will generate:
// First group
CheckOpcode G_ADD
// First pattern
CheckNumOperand 3
...
Build ADDrr
// Second pattern
CheckNumOperand 3
...
Build ADDri
// Second group
CheckOpcode G_SUB
// Third pattern
CheckNumOperand 3
...
Build SUBrr
But if we allowed several nesting level, it could create a sub group
for the checknumoperand 3.
(We would need to call optimizeRules on the rules within a group.)
*** Result ***
With only one level of nesting, the instruction selection pass is up
to 4x faster. For instance, one instruction now takes 500 checks,
instead of 24k! With more nesting we could get in the tens I believe.
Differential Revision: https://reviews.llvm.org/D39034
rdar://problem/34670699
llvm-svn: 321017
Summary: Patch [4/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. This patch further improves diagnostic messages for when the SVE feature is not specified.
Reviewers: rengolin, fhahn, olista01, echristo, efriedma
Reviewed By: fhahn
Subscribers: sdardis, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40363
llvm-svn: 320992
Summary:
When emitting a diagnostic for an invalid operand, a specific diagnostic
should only be reported when the instruction being matched is actually
enabled by the feature flags.
Patch [3/4] in a series to add parsing of predicates and properly parse SVE
ZIP1/ZIP2 instructions. This patch fixes bogus diagnostic messages for when
the SVE feature is not specified.
Reviewers: rengolin, craig.topper, olista01, sdardis, stoklund
Reviewed By: olista01, sdardis
Subscribers: fhahn, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D40362
llvm-svn: 320986
Prior to this patch, a predicate wouldn't make sense outside of its
rule. Indeed, it was only during emitting a rule that a predicate would
be made aware of the IDs of the data it is checking. Because of that,
predicates could not be moved around or compared between each other.
NFC.
llvm-svn: 320887
Summary:
The generated diagnostic by the AsmMatcher isn't always applicable to the AsmOperand.
This is because the code will only update the diagnostic if it is more
specific than the previous diagnostic. However, when having validated
operands and 'moved on' to a next operand (for some instruction/alias for
which all previous operands are valid), if the diagnostic is InvalidOperand,
than that should be set as the diagnostic, not the more specific message
about a previous operand for some other instruction/alias candidate.
(Re-committed with an extra whitespace in SVEInstrFormats.td to trigger rebuild
of AArch64GenAsmMatcher.inc, since the llvm-clang-x86_64-expensive-checks-win
builder does not seem to rebuild AArch64GenAsmMatcher.inc with the
newly built TableGen due to a missing dependency somewhere (see:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119555.html))
Reviewers: craig.topper, olista01, rengolin, stoklund
Reviewed By: olista01
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D40011
llvm-svn: 320711
Most of the targets don't need the scheduler class enum.
I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86.
llvm-svn: 320552
A number of architectures re-use the same register names (e.g. for both 32-bit
FPRs and 64-bit FPRs). They are currently unable to use the tablegen'erated
MatchRegisterName and MatchRegisterAltName, as tablegen (when built with
asserts enabled) will fail.
When the AllowDuplicateRegisterNames in AsmParser is set, duplicated register
names will be tolerated. A backend can then coerce registers to the desired
register class by (for instance) implementing validateTargetOperandClass.
At least the in-tree Sparc backend could benefit from this, as does RISC-V
(single and double precision floating point registers).
Differential Revision: https://reviews.llvm.org/D39845
llvm-svn: 320018
This patch splits atomics out of the generic G_LOAD/G_STORE and into their own
G_ATOMIC_LOAD/G_ATOMIC_STORE. This is a pragmatic decision rather than a
necessary one. Atomic load/store has little in implementation in common with
non-atomic load/store. They tend to be handled very differently throughout the
backend. It also has the nice side-effect of slightly improving the common-case
performance at ISel since there's no longer a need for an atomicity check in the
matcher table.
All targets have been updated to remove the atomic load/store check from the
G_LOAD/G_STORE path. AArch64 has also been updated to mark
G_ATOMIC_LOAD/G_ATOMIC_STORE legal.
There is one issue with this patch though which also affects the extending loads
and truncating stores. The rules only match when an appropriate G_ANYEXT is
present in the MIR. For example,
(G_ATOMIC_STORE (G_TRUNC:s16 (G_ANYEXT:s32 (G_ATOMIC_LOAD:s16 X))))
will match but:
(G_ATOMIC_STORE (G_ATOMIC_LOAD:s16 X))
will not. This shouldn't be a problem at the moment, but as we get better at
eliminating extends/truncates we'll likely start failing to match in some
cases. The current plan is to fix this in a patch that changes the
representation of extending-load/truncating-store to allow the MMO to describe
a different type to the operation.
llvm-svn: 319691
This is causing a failure in the llvm-clang-x86_64-expensive-checks-win
buildbot, and I can't reproduce it locally, so reverting until I can work out
what is wrong.
llvm-svn: 319654
This adds a "invalid operands for instruction" diagnostic for
instructions where there is an instruction encoding with the correct
mnemonic and which is available for this target, but where multiple
operands do not match those which were provided. This makes it clear
that there is some combination of operands that is valid for the current
target, which the default diagnostic of "invalid instruction" does not.
Since this is a very general error, we only emit it if we don't have a
more specific error.
Differential revision: https://reviews.llvm.org/D36747
llvm-svn: 319649
GIM_CheckNonAtomic has been replaced by GIM_CheckAtomicOrdering to allow it to support a wider
range of orderings. This has then been used to import patterns using nodes such
as atomic_cmp_swap, atomic_swap, and atomic_load_*.
llvm-svn: 319232
RecordKeeper::getDef() is a hot place, it shows up in profiling
and it creates std::string instance for each search in RecordMap
though RecordKeeper::RecordMap can use StringRef as a key
instead to avoid that. Patch do that change.
Differential revision: https://reviews.llvm.org/D40170
llvm-svn: 318822
When searching for a resource unit, use the reference location instead of
the definition location in case of an error.
Differential revision: https://reviews.llvm.org/D40263
llvm-svn: 318803
- We can still emit this error if the actual instruction has two or more
operands missing compared to the expected one.
- We should only emit this error once per instruction.
Differential revision: https://reviews.llvm.org/D36746
llvm-svn: 318770