Fixes an issue with revision 5c6f748c and ad40cb88.
Adds an mcpu argument to the test command, preventing an invalid default
CPU from being used on some platforms.
Fixes an issue with revision 5c6f748c.
Move the test added in the above commit into the X86 folder, ensuring
that it is only run on targets where its triple is valid.
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983
The AsmLexer currently has an issue with lexing line comments in files
with CRLF line endings, in which it reads the carriage return as being
part of the line comment. This causes an error for certain valid comment
layouts; this patch fixes this by excluding the carriage return from the
line comment.
Differential Revision: https://reviews.llvm.org/D90234
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)
This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).
Differential Revision: https://reviews.llvm.org/D107367
Load/Store unit is used to enforce order of loads and stores if they
alias (controlled by --noalias=false option).
Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.
Differential Revision: https://reviews.llvm.org/D103955
Noticed while trying to clean up the shift costs model for SSE4 targets using the script in D10369 - SLM double-pumps all the 128-bit vector conversion ops and only use FP0 pipe - numbers taken from Intel AOM + Agner.
Specifying the latencies of specific LDP variants appears to improve
performance almost universally.
Differential Revision: https://reviews.llvm.org/D105882
This sets the latency of stores to 1 in the Cortex-A55 scheduling model,
to better match the values given in the software optimization guide.
The latency of a store in normal llvm scheduling does not appear to have
a lot of uses. If the store has no outputs then the latency is somewhat
meaningless (and pre/post increment update operands use the WriteAdr
write for those operands instead). The one place it does alter things is
the latency between a store and the end of the scheduling region, which
can in turn have an effect on the critical path length. As a result a
latency of 1 is more correct and offers ever-so-slightly better
scheduling of instructions near the end of the block.
They are marked as RetireOOO to keep the llvm-mca from introducing
stalls where non would exist.
Differential Revision: https://reviews.llvm.org/D105541
This patch addresses the last remaining problems reported in PR51008.
Previous fixes for PR51008 worked under the wrong assumption that code regions
are always named (except maybe for the default region, which was automatically
named "main").
In reality, it is quite common for users to declare multiple anonymous regions.
So we cannot really use the region name as the key string of a JSON object. In
practice, code region names are completely optional.
Using "main" for the default region was also problematic because there can be
another region with that same name.
This patch fixes these issues by introducing a json::array of regions. Each
region has a "Name" field, which would default to the empty string for anonymous
regions.
Added a few more tests to verify that the JSON file format is still valid, and
that multiple anonymous regions all appear in the final output.
This patch renames object "Resources" to "TargetInfo".
Moved the getJSONTargetInfo method from class InstructionView to the
PipelinePrinter.
Removed uses of std::stringstream.
Removed unused method View::printViewJSON().
Instead of printing each region individually when using JSON format,
this patch creates a JSON object which is updated with the values of
each region, printing them at the end. New test is added for JSON output
with multiple regions.
Bug: https://bugs.llvm.org/show_bug.cgi?id=51008
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D105618
This commit also makes some slight changes to the scheduling model for AMDGPU to set the RetireOOO flag for all scheduling classes.
This flag is only used by llvm-mca and allows instructions to retire out of order.
See the differential link below for a deeper explanation of everything.
Differential Revision: https://reviews.llvm.org/D104730
Match whats documented in the Intel AOM - almost all the conversion instructions requires BOTH ports (apart from the MMX cvtpi2ps/cvtpi2ps instructions which we already override) - this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Based on the discussion in PR50922, minor changes have been done to properly
output a valid JSON. Removed "not implemented" keys.
Differential Revision: https://reviews.llvm.org/D105064
Previously this instruction could be used only in assembler. This change
makes it available for compiler also. Scheduling information was copied
from FTST instruction, hopefully this can be a satisfactory approximation.
Differential Revision: https://reviews.llvm.org/D104853
Change --max-timeline-cycles=0 to mean no limit on the number of cycles.
Use this in AMDGPU tests to show all instructions in the timeline view
instead of having it arbitrarily truncated.
Differential Revision: https://reviews.llvm.org/D104846
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.
Fixes PR50725
This patch fixes the logic that checks for variadic register definitions,
Before llvm-svn 348114 (commit 4cf35b4ab0), it was not possible to explicitly
mark variadic operands as definitions. By default, variadic operands of an
MCInst were always assumed to be uses. A number of had-hoc checks were
introduced in the InstrBuilder to fix the processing of variadic register
operands of ARM ldm/stm variants.
This patch simply replaces those old (and buggy) checks with a much simpler (and
correct) check for MCID::Flag::VariadicOpsAreDefs.
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write cycle computation was incorrect for store
instructions that didn't declare any register writes.
Match whats documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.
Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.
Match whats documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions requires BOTH ports - this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Match whats documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Match whats documented in the Intel AOM - these are all fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port.
Discovered while investigating the correct fptoui costs to fix the regressions in D101555.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.
These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be setup and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).
This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t setup an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.
In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:
.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.
My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).
As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.
Differential Revision: https://reviews.llvm.org/D102709
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.
Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
While btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, it does not appear to be the case on Znver3,
here it measures as not being recognized as dep-breaking zero-idiom,
let alone a zero-cycle one.