Commit Graph

627 Commits

Author SHA1 Message Date
Jay Foad beebe5a056 [MCA] Allow unlimited cycles in the timeline view
Change --max-timeline-cycles=0 to mean no limit on the number of cycles.
Use this in AMDGPU tests to show all instructions in the timeline view
instead of having it arbitrarily truncated.

Differential Revision: https://reviews.llvm.org/D104846
2021-06-24 12:54:57 +01:00
Andrea Di Biagio 70b37f4c03 [MCA][InstrBuilder] Always check for implicit uses of resource units (PR50725).
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.

Fixes PR50725
2021-06-16 14:51:12 +01:00
Andrea Di Biagio beb5213a2e [MCA][InstrBuilder] Check for the presence of flag VariadicOpsAreDefs.
This patch fixes the logic that checks for variadic register definitions,

Before llvm-svn 348114 (commit 4cf35b4ab0), it was not possible to explicitly
mark variadic operands as definitions. By default, variadic operands of an
MCInst were always assumed to be uses. A number of had-hoc checks were
introduced in the InstrBuilder to fix the processing of variadic register
operands of ARM ldm/stm variants.

This patch simply replaces those old (and buggy) checks with a much simpler (and
correct) check for MCID::Flag::VariadicOpsAreDefs.
2021-06-15 09:52:38 +01:00
Simon Pilgrim 630820bafc [X86][SLM] Adjust XMM non-PMULLD throughput costs to half rate.
Match what's reported in the costs table, Agner's tables and the Intel AOM
2021-06-09 13:51:40 +01:00
Andrea Di Biagio 5f500d73cd [MCA] Add a test for PR50483. NFC 2021-05-26 15:52:11 +01:00
Andrea Di Biagio 63cc9fd579 [MCA][InOrderIssueStage] Fix LastWriteBackCycle computation.
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write cycle computation was incorrect for store
instructions that didn't declare any register writes.
2021-05-26 14:17:43 +01:00
Simon Pilgrim 21aec4fdc5 [X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs
Match whats documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.

Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.
2021-05-26 11:14:21 +01:00
Simon Pilgrim 66978466ba [X86][Atom] Fix vector variable shift resource/throughputs
Match whats documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions requires BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-26 10:30:59 +01:00
Simon Pilgrim 57250f2f3c [X86][Atom] Fix vector PSHUFB resource/throughputs
Match whats documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-25 17:31:45 +01:00
Simon Pilgrim a26288e803 [X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs
Match whats documented in the Intel AOM - these are all fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port.

Discovered while investigating the correct fptoui costs to fix the regressions in D101555.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-20 18:56:58 +01:00
Andrea Di Biagio 9acabe8b6f [MCA] Unbreak the buildbots by passing flag -mcpu=generic to the new test added by commit e5d59db469.
This should unbreak buildbot clang-ppc64le-linux-lnt.
2021-05-19 19:12:33 +01:00
Patrick Holland e5d59db469 [MCA] llvm-mca MCTargetStreamer segfault fix
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.

These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be setup and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).

This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t setup an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.

In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:

.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.

My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)

My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).

As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.

Differential Revision: https://reviews.llvm.org/D102709
2021-05-19 18:36:10 +01:00
Simon Pilgrim b14f9a1ebd [X86][Atom] Fix vector integer shift by immediate resource/throughputs
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-19 14:39:40 +01:00
Simon Pilgrim f9b1208681 [X86][Atom] Fix vector integer multiplication resource/throughputs
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.

Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
2021-05-15 14:25:48 +01:00
Roman Lebedev 990e806b36
[NFC][X86][MCA] Add sudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps
While btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, it does not appear to be the case on Znver3,
here it measures as not being recognized as dep-breaking zero-idiom,
let alone a zero-cycle one.
2021-05-14 20:23:05 +03:00
Roman Lebedev 1fc1c88704
[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:05 +03:00
Roman Lebedev 2f8572d8e2
[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev f8f7c765a0
[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev d2fb4bfba8
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests 2021-05-14 20:23:04 +03:00
Roman Lebedev 094b493a3a
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests 2021-05-14 20:23:04 +03:00
Roman Lebedev 1c0ac0b0f2
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests 2021-05-14 20:23:03 +03:00
Roman Lebedev 26eeb6e650
[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev 41a5dcdf87
[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev 6733fe5c0d
[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:03 +03:00
Roman Lebedev 9e9c80c250
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests 2021-05-14 20:23:03 +03:00
Roman Lebedev b6a0449b34
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests 2021-05-14 20:23:02 +03:00
Roman Lebedev 128d9c6bbd
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests 2021-05-14 20:23:02 +03:00
Roman Lebedev 555e1d2987
[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev 012417c980
[X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev 29c4f892fe
[X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:02 +03:00
Roman Lebedev 0e20d1f0ef
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests 2021-05-14 20:23:01 +03:00
Roman Lebedev 14e48cf8ee
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests 2021-05-14 20:23:01 +03:00
Roman Lebedev 4673af527e
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests 2021-05-14 20:23:01 +03:00
Roman Lebedev 93f2642871
[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:01 +03:00
Roman Lebedev 7a45b96e04
[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:01 +03:00
Roman Lebedev 1ea8be214f
[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev bbd2117c34
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests 2021-05-14 20:23:00 +03:00
Roman Lebedev d08909d1cb
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests 2021-05-14 20:23:00 +03:00
Roman Lebedev a6f5351443
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests 2021-05-14 20:23:00 +03:00
Roman Lebedev ce22f53916
[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev 44c2b4fe91
[X86] AMD Zen 3: same-reg AVX XMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev a72cacb53f
[X86] AMD Zen 3: same-reg SSE XMM PANDN is a 1-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:22:59 +03:00
Roman Lebedev 9acc589e5a
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPANDN tests 2021-05-14 20:22:59 +03:00
Roman Lebedev a3617138c2
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPANDN tests 2021-05-14 20:22:59 +03:00
Roman Lebedev 3f235a0b84
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PANDN tests 2021-05-14 20:22:59 +03:00
Roman Lebedev 1d73c2b8cf
[X86] AMD Zen 3: same-reg AVX YMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:22:59 +03:00
Roman Lebedev 31669b5073
[X86] AMD Zen 3: same-reg AVX XMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:22:58 +03:00
Roman Lebedev 498bf365f4
[X86] AMD Zen 3: same-reg SSE XMM PXOR is a 1-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:22:58 +03:00
Roman Lebedev 3009f8a383
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPXOR tests 2021-05-14 20:22:58 +03:00
Roman Lebedev d58d020b6c
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPXOR tests 2021-05-14 20:22:58 +03:00
Roman Lebedev 0f7a595095
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests 2021-05-14 20:22:58 +03:00
Roman Lebedev 4af4afe014
[X86] AMD Zen 3: same-reg AVX YMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 14:06:24 +03:00
Roman Lebedev 17f99a8a41
[X86] AMD Zen 3: same-reg AVX XMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 14:06:24 +03:00
Roman Lebedev 38ceb46fb0
[X86] AMD Zen 3: same-reg SSE XMM ANDNPD is a 1-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 14:06:24 +03:00
Roman Lebedev 3221e06e9b
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPD tests 2021-05-14 14:06:24 +03:00
Roman Lebedev 0b7e52e725
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPD tests 2021-05-14 14:06:24 +03:00
Roman Lebedev 055fa84cd8
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPD tests 2021-05-14 14:06:24 +03:00
Roman Lebedev d8a595b81c
[X86] AMD Zen 3: same-reg AVX YMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 14:06:24 +03:00
Roman Lebedev fd4cbc822b
[X86] AMD Zen 3: same-reg AVX XMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 14:06:23 +03:00
Roman Lebedev f38dcbecb6
[X86] AMD Zen 3: same-reg SSE XMM ANDNPS is a 1-cycle(!) dep-breaking zero-idiom
Same as SSE XMM XORPS/XORPD, it is not zero-cycle, even though it breaks the deps.
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 14:06:23 +03:00
Roman Lebedev c79c7bb980
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPS tests 2021-05-14 14:06:23 +03:00
Roman Lebedev a57006d627
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPS tests 2021-05-14 14:06:23 +03:00
Roman Lebedev a657808948
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPS tests 2021-05-14 14:06:23 +03:00
Roman Lebedev 43a7f130a7
[X86] AMD Zen 3: same-reg AVX YMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 11:56:07 +03:00
Roman Lebedev 336b9dbe88
[X86] AMD Zen 3: same-reg AVX XMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 11:56:07 +03:00
Roman Lebedev 9c596bc541
[X86] AMD Zen 3: same-reg SSE XMM XORPD is a 1-cycle(!) dep-breaking zero-idiom
Same as with it's float friend, unlike their AVX versions.
As confirmed by exegesis, and ref docs.
2021-05-14 11:56:07 +03:00
Roman Lebedev 3567c7eda1
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPD tests 2021-05-14 11:56:07 +03:00
Roman Lebedev 57eee56d0a
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPD tests 2021-05-14 11:56:06 +03:00
Roman Lebedev fdc65e46b6
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPD tests 2021-05-14 11:56:06 +03:00
Roman Lebedev 59554c01ab
[X86] AMD Zen 3: same-reg AVX YMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis, and ref docs.
2021-05-14 11:56:06 +03:00
Roman Lebedev 2a7c52ff7f
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPS tests 2021-05-14 11:56:06 +03:00
Roman Lebedev 26c1bffe67
[X86] AMD Zen 3: same-reg AVX XMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom
Unlike it's legacy SSE XMM XORPS version, which measures as being 1-cycle,
this one is certainly a zero-cycle instruction, in addition to both of them
being dependency breaking.

As confirmed by exegesis measurements, and ref docs.
2021-05-14 11:56:06 +03:00
Roman Lebedev a9fb321a67
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPS tests 2021-05-14 11:56:06 +03:00
Roman Lebedev aa0dcb3ba4
[X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom
While both the SOG and Agner insist that it is zero-cycle,
i can not confirm that claim. While it clearly breaks the dependency,
i can not come up with a snippet, or measurement approach,
to end up with IPC bigger than 4, which, to me, means that it actually
consumes execution resource of an FP unit for a cycle.
2021-05-14 00:03:36 +03:00
Roman Lebedev 6c4596793d
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test 2021-05-14 00:03:36 +03:00
Roman Lebedev 6a64c462eb
[X86] AMD Zen 3: same-reg AVX YMM VPCMP is dep breaking one-idiom
As measured by exegesis, and confirmed by ref docs.
Still not zero-cycle :)
2021-05-10 23:49:27 +03:00
Roman Lebedev 5864e7b86b
[NFC][X86][MCA] AMD Zen 3: add tests for same-re AVX YMM VPCMP 2021-05-10 23:49:27 +03:00
Roman Lebedev 2953245337
[X86] AMD Zen 3: same-reg AVX XMM VPCMP is dep breaking one-idiom
As measured by exegesis, and confirmed by ref docs.
Again, it's not zero-cycle.
2021-05-10 23:49:26 +03:00
Roman Lebedev f59db6c4f8
[NFC][X86][MCA] AMD Zen 3: add tests for same-re AVX XMM VPCMP 2021-05-10 23:49:26 +03:00
Roman Lebedev 0f3bcb97ef
[X86] AMD Zen 3: same-reg SSE XMM PCMP is dep breaking one-idiom
As measured by exegesis, and confirmed by ref docs.
Much like with MMX PCMP, it does actually have to execute, though.
2021-05-10 23:49:26 +03:00
Roman Lebedev 0e538f937a
[NFC][X86][MCA] AMD Zen 3: add tests for same-reg XMM SSE PCMP 2021-05-10 23:49:26 +03:00
Roman Lebedev b24edfff4f
[X86] AMD Zen 3: same-reg PCMPEQ is an MMX all-ones dep breaking idiom
They are, however, not zero-cycle, and do actually execute.

As measured by exegesis, and confirmed by ref docs.
2021-05-10 23:49:26 +03:00
Roman Lebedev ba225ce961
[NFC][X86][MCA] AMD Zen 3: add tests for same-reg MMX PCMPEQ 2021-05-10 23:49:25 +03:00
Roman Lebedev 08cf2776ac
[X86] AMD Zen 3: sub-32-bit CMP also break dependencies
They measure as having the same effect as 32-bit CMP.
2021-05-10 20:57:38 +03:00
Roman Lebedev ecff974b66
[NFC][X86][MCA] AMD Zen 3: add tests for sub-32-bit CMP dep breaking 2021-05-10 20:57:37 +03:00
Roman Lebedev be23d5e814
[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction
As measured by exegesis, and confirmed by ref docs.
2021-05-10 00:03:20 +03:00
Roman Lebedev 9a31efa2f5
[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking 2021-05-10 00:03:20 +03:00
Roman Lebedev 11b0568dce
[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction
As confirmed by exegesis measurements, and ref docs.
It does actually execute.

While there, bump latency for MULX32rr, that seems to match measurements.
2021-05-10 00:03:20 +03:00
Roman Lebedev 8d0e2d2b0f
[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking 2021-05-10 00:03:20 +03:00
Roman Lebedev eed8552787
[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms
As measured by exegesis and confirmed in reference docs.
2021-05-10 00:03:20 +03:00
Roman Lebedev ab794852ed
[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests 2021-05-10 00:03:20 +03:00
Roman Lebedev a21df76db6
[X86] AMD Zen 3: XCHG is a zero-cycle instruction
As measured by exegesis and confirmed by reference docs.
2021-05-09 20:37:57 +03:00
Roman Lebedev 2819009b5a
[X86] AMD Zen 3: _REV variants of zero-cycles moves are also zero-cycles (PR50261)
Sometimes disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue
that the _REV variants aren't being modelled as optimizable moves.
2021-05-07 18:27:40 +03:00
Roman Lebedev a8e30e63ac
[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move 2021-05-07 18:27:40 +03:00
Roman Lebedev 34de155f7e
[NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests
Drop it just enough so it still produces the right IPC.
2021-05-07 17:06:45 +03:00
Roman Lebedev 758c173309
[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6
They are resolved at the register rename stage without
using any execution units.
2021-05-07 17:06:45 +03:00
Roman Lebedev 715c0d0bd4
[X86] AMD Zen 3: AVX YMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:45 +03:00
Roman Lebedev ee020b930d
[X86] AMD Zen 3: AVX XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:44 +03:00
Roman Lebedev 9db4203883
[X86] AMD Zen 3: SSE XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.

Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.

Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
2021-05-07 17:06:44 +03:00
Roman Lebedev 0d961fbd52
[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves 2021-05-07 17:06:44 +03:00