Commit Graph

489 Commits

Author SHA1 Message Date
Simon Pilgrim ded8866f4a [X86][Atom] Fix vector fp<->int resource/throughputs
Match whats documented in the Intel AOM - almost all the conversion instructions requires BOTH ports (apart from the MMX cvtpi2ps/cvtpi2ps instructions which we already override) - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-07-07 16:52:34 +01:00
Serge Pavlov b36d214bed [X86] Add description of FXAM instruction
Previously this instruction could be used only in assembler. This change
makes it available for compiler also. Scheduling information was copied
from FTST instruction, hopefully this can be a satisfactory approximation.

Differential Revision: https://reviews.llvm.org/D104853
2021-06-25 12:26:51 +07:00
Andrea Di Biagio 70b37f4c03 [MCA][InstrBuilder] Always check for implicit uses of resource units (PR50725).
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.

Fixes PR50725
2021-06-16 14:51:12 +01:00
Simon Pilgrim 630820bafc [X86][SLM] Adjust XMM non-PMULLD throughput costs to half rate.
Match what's reported in the costs table, Agner's tables and the Intel AOM
2021-06-09 13:51:40 +01:00
Simon Pilgrim 21aec4fdc5 [X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs
Match whats documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.

Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.
2021-05-26 11:14:21 +01:00
Simon Pilgrim 66978466ba [X86][Atom] Fix vector variable shift resource/throughputs
Match whats documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions requires BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-26 10:30:59 +01:00
Simon Pilgrim 57250f2f3c [X86][Atom] Fix vector PSHUFB resource/throughputs
Match whats documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-25 17:31:45 +01:00
Simon Pilgrim a26288e803 [X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs
Match whats documented in the Intel AOM - these are all fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port.

Discovered while investigating the correct fptoui costs to fix the regressions in D101555.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-20 18:56:58 +01:00
Andrea Di Biagio 9acabe8b6f [MCA] Unbreak the buildbots by passing flag -mcpu=generic to the new test added by commit e5d59db469.
This should unbreak buildbot clang-ppc64le-linux-lnt.
2021-05-19 19:12:33 +01:00
Patrick Holland e5d59db469 [MCA] llvm-mca MCTargetStreamer segfault fix
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.

These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be setup and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).

This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t setup an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.

In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:

.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.

My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)

My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).

As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.

Differential Revision: https://reviews.llvm.org/D102709
2021-05-19 18:36:10 +01:00
Simon Pilgrim b14f9a1ebd [X86][Atom] Fix vector integer shift by immediate resource/throughputs
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-19 14:39:40 +01:00
Simon Pilgrim f9b1208681 [X86][Atom] Fix vector integer multiplication resource/throughputs
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.

Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
2021-05-15 14:25:48 +01:00
Roman Lebedev 990e806b36
[NFC][X86][MCA] Add sudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps
While btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, it does not appear to be the case on Znver3,
here it measures as not being recognized as dep-breaking zero-idiom,
let alone a zero-cycle one.
2021-05-14 20:23:05 +03:00
Roman Lebedev 1fc1c88704
[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:05 +03:00
Roman Lebedev 2f8572d8e2
[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev f8f7c765a0
[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev d2fb4bfba8
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests 2021-05-14 20:23:04 +03:00
Roman Lebedev 094b493a3a
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests 2021-05-14 20:23:04 +03:00
Roman Lebedev 1c0ac0b0f2
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests 2021-05-14 20:23:03 +03:00
Roman Lebedev 26eeb6e650
[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev 41a5dcdf87
[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev 6733fe5c0d
[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:03 +03:00
Roman Lebedev 9e9c80c250
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests 2021-05-14 20:23:03 +03:00
Roman Lebedev b6a0449b34
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests 2021-05-14 20:23:02 +03:00
Roman Lebedev 128d9c6bbd
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests 2021-05-14 20:23:02 +03:00
Roman Lebedev 555e1d2987
[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev 012417c980
[X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev 29c4f892fe
[X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:02 +03:00
Roman Lebedev 0e20d1f0ef
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests 2021-05-14 20:23:01 +03:00
Roman Lebedev 14e48cf8ee
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests 2021-05-14 20:23:01 +03:00
Roman Lebedev 4673af527e
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests 2021-05-14 20:23:01 +03:00
Roman Lebedev 93f2642871
[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:01 +03:00
Roman Lebedev 7a45b96e04
[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:01 +03:00
Roman Lebedev 1ea8be214f
[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev bbd2117c34
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests 2021-05-14 20:23:00 +03:00
Roman Lebedev d08909d1cb
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests 2021-05-14 20:23:00 +03:00
Roman Lebedev a6f5351443
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests 2021-05-14 20:23:00 +03:00
Roman Lebedev ce22f53916
[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev 44c2b4fe91
[X86] AMD Zen 3: same-reg AVX XMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev a72cacb53f
[X86] AMD Zen 3: same-reg SSE XMM PANDN is a 1-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:22:59 +03:00
Roman Lebedev 9acc589e5a
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPANDN tests 2021-05-14 20:22:59 +03:00
Roman Lebedev a3617138c2
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPANDN tests 2021-05-14 20:22:59 +03:00
Roman Lebedev 3f235a0b84
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PANDN tests 2021-05-14 20:22:59 +03:00
Roman Lebedev 1d73c2b8cf
[X86] AMD Zen 3: same-reg AVX YMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:22:59 +03:00
Roman Lebedev 31669b5073
[X86] AMD Zen 3: same-reg AVX XMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:22:58 +03:00
Roman Lebedev 498bf365f4
[X86] AMD Zen 3: same-reg SSE XMM PXOR is a 1-cycle(!) dep-breaking zero-idiom
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:22:58 +03:00
Roman Lebedev 3009f8a383
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPXOR tests 2021-05-14 20:22:58 +03:00
Roman Lebedev d58d020b6c
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPXOR tests 2021-05-14 20:22:58 +03:00
Roman Lebedev 0f7a595095
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests 2021-05-14 20:22:58 +03:00
Roman Lebedev 4af4afe014
[X86] AMD Zen 3: same-reg AVX YMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom
As confirmed by exegesis measurements, and ref docs.
2021-05-14 14:06:24 +03:00