Commit Graph

597 Commits

Author SHA1 Message Date
Andrea Di Biagio 2d53e54f0e [X86][NFC] Pre-commit tests for PR51494 2021-08-18 19:55:21 +01:00
Tozer 6d5e31baaa Fix 2: [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes an issue with revision 5c6f748c and ad40cb88.

Adds an mcpu argument to the test command, preventing an invalid default
CPU from being used on some platforms.
2021-08-17 17:13:21 +01:00
Tozer ad40cb8821 Fix: [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes an issue with revision 5c6f748c.

Move the test added in the above commit into the X86 folder, ensuring
that it is only run on targets where its triple is valid.
2021-08-17 16:16:19 +01:00
Tozer 5c6f748cbc [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983

The AsmLexer currently has an issue with lexing line comments in files
with CRLF line endings, in which it reads the carriage return as being
part of the line comment. This causes an error for certain valid comment
layouts; this patch fixes this by excluding the carriage return from the
line comment.

Differential Revision: https://reviews.llvm.org/D90234
2021-08-17 15:52:51 +01:00
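
Note on 5c6f748cbc: the CRLF handling described above can be illustrated with a minimal Python sketch. It is not the AsmLexer code; the function name `lex_line_comment` and its return shape are assumptions made for illustration. The point is only that the comment text ends before the carriage return of a CRLF pair.

```python
from typing import Tuple


def lex_line_comment(text: str, start: int) -> Tuple[str, int]:
    """Return (comment_text, index_of_line_terminator) for a comment at `start`.

    Hypothetical helper: the comment stops at '\n' or at the '\r' of a
    '\r\n' pair, so the carriage return is never part of the comment.
    """
    end = start
    while end < len(text) and text[end] != "\n":
        # Treat '\r' as the start of the terminator only when '\n' follows.
        if text[end] == "\r" and text[end + 1:end + 2] == "\n":
            break
        end += 1
    return text[start:end], end


# The same comment lexes identically with LF and CRLF line endings.
lf_comment, _ = lex_line_comment("# note\nnop", 0)
crlf_comment, _ = lex_line_comment("# note\r\nnop", 0)
```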
Andrea Di Biagio 7a1a35a1d1 [X86][SchedModel] Add missing ReadAdvance for some arithmetic ops (PR51318 and PR51322).
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)

This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).

Differential Revision: https://reviews.llvm.org/D107367
2021-08-04 17:50:22 +01:00
Andrea Di Biagio f0658c7a42 [MCA][NFC] Add tests for PR51318 and PR51322.
Also, regenerate existing X86 tests using update_mca_test.py.
2021-08-03 17:06:34 +01:00
Andrew Savonichev bcc83a2e83 [MCA] Use LSU for the in-order pipeline
The load/store unit (LSU) is used to enforce the order of loads and stores if they
alias (controlled by the --noalias=false option).

Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.

Differential Revision: https://reviews.llvm.org/D103955
2021-07-29 14:40:23 +03:00
Simon Pilgrim d073b19dbf [X86] Fix SLM FP<->INT throughputs.
Noticed while trying to clean up the shift costs model for SSE4 targets using the script in D10369 - SLM double-pumps all the 128-bit vector conversion ops and only uses the FP0 pipe - numbers taken from Intel AOM + Agner.
2021-07-22 19:39:04 +01:00
Nicholas Guy 9769535efd [AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling
Specifying the latencies of specific LDP variants appears to improve
performance almost universally.

Differential Revision: https://reviews.llvm.org/D105882
2021-07-16 12:00:57 +01:00
Marcos Horro 77f2f0f9b7 [llvm-mca][JSON] Store extra information about driver flags used for the simulation
Added information stored in PipelineOptions and the MCSubtargetInfo.

Bug: https://bugs.llvm.org/show_bug.cgi?id=51041

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D106077
2021-07-16 09:18:40 +02:00
David Green f73334c46d [AArch64] Set the latency of Cortex-A55 stores to 1
This sets the latency of stores to 1 in the Cortex-A55 scheduling model,
to better match the values given in the software optimization guide.

The latency of a store in normal llvm scheduling does not appear to have
a lot of uses. If the store has no outputs then the latency is somewhat
meaningless (and pre/post increment update operands use the WriteAdr
write for those operands instead). The one place it does alter things is
the latency between a store and the end of the scheduling region, which
can in turn have an effect on the critical path length. As a result a
latency of 1 is more correct and offers ever-so-slightly better
scheduling of instructions near the end of the block.

They are marked as RetireOOO to keep llvm-mca from introducing
stalls where none would exist.

Differential Revision: https://reviews.llvm.org/D105541
2021-07-12 13:39:35 +01:00
Andrea Di Biagio 4fe0fcd1c0 [llvm-mca][JSON] Teach the PipelinePrinter how to deal with anonymous code regions (PR51008)
This patch addresses the last remaining problems reported in PR51008.

Previous fixes for PR51008 worked under the wrong assumption that code regions
are always named (except maybe for the default region, which was automatically
named "main").

In reality, it is quite common for users to declare multiple anonymous regions.
So we cannot really use the region name as the key string of a JSON object.  In
practice, code region names are completely optional.

Using "main" for the default region was also problematic because there can be
another region with that same name.

This patch fixes these issues by introducing a json::array of regions.  Each
region has a "Name" field, which would default to the empty string for anonymous
regions.

Added a few more tests to verify that the JSON file format is still valid, and
that multiple anonymous regions all appear in the final output.
2021-07-10 13:57:52 +01:00
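
Note on 4fe0fcd1c0: a minimal Python sketch of why an array of regions is safer than an object keyed by region name. This is not the PipelinePrinter code; the function names and the `Instructions` payload key are hypothetical. Only the "Name" field and its empty-string default come from the commit message.

```python
import json


def regions_keyed_by_name(regions):
    # Old shape: one JSON object keyed by region name.
    # Regions with equal (or missing) names overwrite each other.
    return {name or "": payload for name, payload in regions}


def regions_as_array(regions):
    # New shape: one array entry per region, "Name" defaulting to "".
    return [dict(payload, Name=name or "") for name, payload in regions]


# Two anonymous regions plus one that happens to be called "main".
regions = [
    (None, {"Instructions": 2}),
    (None, {"Instructions": 3}),
    ("main", {"Instructions": 1}),
]

keyed = regions_keyed_by_name(regions)   # the anonymous regions collapse
as_array = regions_as_array(regions)     # all three regions survive
serialized = json.dumps(as_array)
```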
Andrea Di Biagio d919bca875 [llvm-mca][JSON] Further refactoring of the JSON printing logic.
This patch renames object "Resources" to "TargetInfo".

Moved the getJSONTargetInfo method from class InstructionView to the
PipelinePrinter.

Removed uses of std::stringstream.
Removed unused method View::printViewJSON().
2021-07-10 12:38:19 +01:00
Andrea Di Biagio 10cb036223 [llvm-mca] Refactor the logic that prints JSON files.
Moved most of the printing logic into the PipelinePrinter.

This patch also fixes the JSON output when flag -instruction-tables is
specified.
2021-07-09 22:56:39 +01:00
Marcos Horro b11d31eb73 [llvm-mca] Fix JSON format for multiple regions
Instead of printing each region individually when using JSON format,
this patch creates a JSON object which is updated with the values of
each region, printing them at the end. New test is added for JSON output
with multiple regions.

Bug: https://bugs.llvm.org/show_bug.cgi?id=51008

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D105618
2021-07-09 18:04:16 +02:00
Patrick Holland d38b9f1f31 Revert "[MCA] [AMDGPU] Adding an implementation to AMDGPUCustomBehaviour for handling s_waitcnt instructions."
Build failures when building with shared libraries. Reverting until I can fix.

Differential Revision: https://reviews.llvm.org/D104730
2021-07-07 20:48:42 -07:00
Patrick Holland af3baf1761 [MCA] [AMDGPU] Adding an implementation to AMDGPUCustomBehaviour for handling s_waitcnt instructions.
This commit also makes some slight changes to the scheduling model for AMDGPU to set the RetireOOO flag for all scheduling classes.

This flag is only used by llvm-mca and allows instructions to retire out of order.

See the differential link below for a deeper explanation of everything.

Differential Revision: https://reviews.llvm.org/D104730
2021-07-07 14:17:54 -07:00
Simon Pilgrim ded8866f4a [X86][Atom] Fix vector fp<->int resource/throughputs
Match what's documented in the Intel AOM - almost all the conversion instructions require BOTH ports (apart from the MMX cvtpi2ps/cvtpi2ps instructions which we already override) - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-07-07 16:52:34 +01:00
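
Note on ded8866f4a: the difference between modelling an instruction as needing EITHER port or BOTH ports can be shown with a toy Python model. It assumes a two-port machine, independent single-cycle operations, and nothing else competing for the ports; the function name is hypothetical and this is not how the scheduling model is expressed in LLVM.

```python
def cycles_to_issue(num_ops: int, needs_both_ports: bool, num_ports: int = 2) -> int:
    """Cycles needed to issue `num_ops` independent single-cycle operations."""
    # An op that needs BOTH ports monopolises the machine for its cycle;
    # an op that can use EITHER port shares the cycle with other ops.
    ops_per_cycle = 1 if needs_both_ports else num_ports
    return -(-num_ops // ops_per_cycle)  # ceiling division


either = cycles_to_issue(8, needs_both_ports=False)  # 4 cycles
both = cycles_to_issue(8, needs_both_ports=True)     # 8 cycles
```

Under these assumptions the EITHER-port model reports twice the throughput of the BOTH-port one, which is the kind of overestimate the commit corrects.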
Marcos Horro aa13e4fe7e [llvm-mca] Fix JSON output (PR50922)
Based on the discussion in PR50922, minor changes have been made to
output valid JSON.  Removed "not implemented" keys.

Differential Revision: https://reviews.llvm.org/D105064
2021-07-01 12:53:20 +01:00
Serge Pavlov b36d214bed [X86] Add description of FXAM instruction
Previously this instruction could be used only in the assembler. This change
makes it available to the compiler as well. Scheduling information was copied
from the FTST instruction; hopefully this is a satisfactory approximation.

Differential Revision: https://reviews.llvm.org/D104853
2021-06-25 12:26:51 +07:00
Jay Foad beebe5a056 [MCA] Allow unlimited cycles in the timeline view
Change --max-timeline-cycles=0 to mean no limit on the number of cycles.
Use this in AMDGPU tests to show all instructions in the timeline view
instead of having it arbitrarily truncated.

Differential Revision: https://reviews.llvm.org/D104846
2021-06-24 12:54:57 +01:00
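
Note on beebe5a056: the new meaning of the option can be summarised in a few lines of Python. The function name `visible_timeline_cycles` is hypothetical and the sketch says nothing about how llvm-mca renders the view; it only encodes "0 means no limit".

```python
def visible_timeline_cycles(total_cycles: int, max_timeline_cycles: int) -> int:
    """Number of cycles shown in the timeline for a given limit."""
    if max_timeline_cycles == 0:
        # 0 means no limit: show every simulated cycle.
        return total_cycles
    return min(total_cycles, max_timeline_cycles)


unlimited = visible_timeline_cycles(500, 0)   # 500
truncated = visible_timeline_cycles(500, 80)  # 80
```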
Andrea Di Biagio 70b37f4c03 [MCA][InstrBuilder] Always check for implicit uses of resource units (PR50725).
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.

Fixes PR50725
2021-06-16 14:51:12 +01:00
Andrea Di Biagio beb5213a2e [MCA][InstrBuilder] Check for the presence of flag VariadicOpsAreDefs.
This patch fixes the logic that checks for variadic register definitions.

Before llvm-svn 348114 (commit 4cf35b4ab0), it was not possible to explicitly
mark variadic operands as definitions. By default, variadic operands of an
MCInst were always assumed to be uses. A number of ad hoc checks were
introduced in the InstrBuilder to fix the processing of variadic register
operands of ARM ldm/stm variants.

This patch simply replaces those old (and buggy) checks with a much simpler (and
correct) check for MCID::Flag::VariadicOpsAreDefs.
2021-06-15 09:52:38 +01:00
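
Note on beb5213a2e: a minimal Python sketch of the single-flag check that replaces the ad hoc logic. The function name and the plain list representation are assumptions; the real code works on MCInst operands and MCID::Flag::VariadicOpsAreDefs.

```python
from typing import List, Sequence, Tuple


def classify_variadic_operands(
    variadic_regs: Sequence[str], variadic_ops_are_defs: bool
) -> Tuple[List[str], List[str]]:
    """Return (defs, uses) for the variadic register operands of one instruction."""
    regs = list(variadic_regs)
    # One flag decides the role of every variadic operand;
    # without the flag they are all treated as uses.
    return (regs, []) if variadic_ops_are_defs else ([], regs)


# A load-multiple writes its register list; a store-multiple only reads it.
load_defs, load_uses = classify_variadic_operands(["r0", "r1"], True)
store_defs, store_uses = classify_variadic_operands(["r0", "r1"], False)
```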
Simon Pilgrim 630820bafc [X86][SLM] Adjust XMM non-PMULLD throughput costs to half rate.
Match what's reported in the costs table, Agner's tables and the Intel AOM
2021-06-09 13:51:40 +01:00
Andrea Di Biagio 5f500d73cd [MCA] Add a test for PR50483. NFC 2021-05-26 15:52:11 +01:00
Andrea Di Biagio 63cc9fd579 [MCA][InOrderIssueStage] Fix LastWriteBackCycle computation.
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write cycle computation was incorrect for store
instructions that didn't declare any register writes.
2021-05-26 14:17:43 +01:00
Simon Pilgrim 21aec4fdc5 [X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs
Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.

Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.
2021-05-26 11:14:21 +01:00
Simon Pilgrim 66978466ba [X86][Atom] Fix vector variable shift resource/throughputs
Match what's documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-26 10:30:59 +01:00
Simon Pilgrim 57250f2f3c [X86][Atom] Fix vector PSHUFB resource/throughputs
Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-25 17:31:45 +01:00
Simon Pilgrim a26288e803 [X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs
Match what's documented in the Intel AOM - fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port.

Discovered while investigating the correct fptoui costs to fix the regressions in D101555.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-20 18:56:58 +01:00
Andrea Di Biagio 9acabe8b6f [MCA] Unbreak the buildbots by passing flag -mcpu=generic to the new test added by commit e5d59db469.
This should unbreak buildbot clang-ppc64le-linux-lnt.
2021-05-19 19:12:33 +01:00
Patrick Holland e5d59db469 [MCA] llvm-mca MCTargetStreamer segfault fix
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.

These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be set up and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).

This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t set up an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.

In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:

.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc

I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.

My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)

My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).

As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.

Differential Revision: https://reviews.llvm.org/D102709
2021-05-19 18:36:10 +01:00
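
Note on e5d59db469: the failure and the fix can be mimicked with a Python analogy. All class and function names below are hypothetical stand-ins for MCStreamerWrapper, MCTargetStreamer and the parser; a Python AttributeError plays the role of the segfault, and a sink that discards writes plays the role of the nulls() stream.

```python
class NullSink:
    """Output stream that throws everything away."""

    def write(self, text: str) -> None:
        pass


class TargetStreamer:
    """Stand-in for a target streamer that emits target-specific directives."""

    def __init__(self, out) -> None:
        self.out = out

    def emit_directive(self, text: str) -> None:
        self.out.write(text + "\n")


class StreamerWrapper:
    """Stand-in for the wrapper: records instructions, emits nothing."""

    def __init__(self, regions: list) -> None:
        self.regions = regions
        self.target_streamer = None  # the bug: never set up

    def emit_instruction(self, inst: str) -> None:
        self.regions.append(inst)


def parse(lines, streamer: StreamerWrapper) -> None:
    for line in lines:
        if line.startswith(".cv_fpo_"):
            # Target directives go through the target streamer;
            # this fails when none was installed.
            streamer.target_streamer.emit_directive(line)
        else:
            streamer.emit_instruction(line)


# The fix: install a target streamer that writes to a null sink.
regions = []
wrapper = StreamerWrapper(regions)
wrapper.target_streamer = TargetStreamer(NullSink())
parse([".cv_fpo_proc foo", "nop"], wrapper)
```

With the target streamer installed, the directive is swallowed and only the instruction reaches the code region.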
Simon Pilgrim b14f9a1ebd [X86][Atom] Fix vector integer shift by immediate resource/throughputs
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-19 14:39:40 +01:00
Simon Pilgrim f9b1208681 [X86][Atom] Fix vector integer multiplication resource/throughputs
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.

Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
2021-05-15 14:25:48 +01:00
Roman Lebedev 990e806b36 [NFC][X86][MCA] Add sudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps
While the btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, it does not appear to be the case on Znver3,
where it measures as not being recognized as a dep-breaking zero-idiom,
let alone a zero-cycle one.
2021-05-14 20:23:05 +03:00
Roman Lebedev 1fc1c88704 [X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:05 +03:00
Roman Lebedev 2f8572d8e2 [X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev f8f7c765a0 [X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev d2fb4bfba8 [NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests 2021-05-14 20:23:04 +03:00
Roman Lebedev 094b493a3a [NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests 2021-05-14 20:23:04 +03:00
Roman Lebedev 1c0ac0b0f2 [NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests 2021-05-14 20:23:03 +03:00
Roman Lebedev 26eeb6e650 [X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev 41a5dcdf87 [X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev 6733fe5c0d [X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:03 +03:00
Roman Lebedev 9e9c80c250 [NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests 2021-05-14 20:23:03 +03:00
Roman Lebedev b6a0449b34 [NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests 2021-05-14 20:23:02 +03:00
Roman Lebedev 128d9c6bbd [NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests 2021-05-14 20:23:02 +03:00
Roman Lebedev 555e1d2987 [X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev 012417c980 [X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev 29c4f892fe [X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:02 +03:00
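
Note on the Zen 3 same-register commits above: the measured results can be collected into a small lookup, sketched here in Python. The table name, the key layout and the function are hypothetical; the cycle counts are copied from the commit subjects above and nothing else is implied about the znver3 scheduling model.

```python
from typing import Optional

# (mnemonic family, encoding/width) -> cycles for the same-register form,
# as stated in the commit subjects above.
ZEN3_SAME_REG_IDIOMS = {
    ("PCMPGT", "SSE XMM"): 1,
    ("VPCMPGT", "AVX XMM"): 0,
    ("VPCMPGT", "AVX YMM"): 0,
    ("PSUBUS", "SSE XMM"): 1,
    ("VPSUBUS", "AVX XMM"): 1,
    ("VPSUBUS", "AVX YMM"): 1,
    ("PSUBS", "SSE XMM"): 1,
    ("VPSUBS", "AVX XMM"): 1,
    ("VPSUBS", "AVX YMM"): 1,
}


def same_reg_idiom_cycles(family: str, form: str, src1: str, src2: str) -> Optional[int]:
    """Cycles for a dep-breaking zero-idiom, or None if this is not one."""
    if src1 != src2:
        # Only the same-register form breaks the dependency.
        return None
    return ZEN3_SAME_REG_IDIOMS.get((family, form))


zero_cycle = same_reg_idiom_cycles("VPCMPGT", "AVX YMM", "ymm0", "ymm0")  # 0
one_cycle = same_reg_idiom_cycles("PSUBS", "SSE XMM", "xmm1", "xmm1")     # 1
```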