Introduce basic schedule model for AMD Zen 3 CPU's, a.k.a `znver3`.
This is fully built from scratch, from llvm-mca measurements
and documented reference materials.
Nothing was copied from `znver2`/`znver1`.
I believe this is in a reasonable state of completion for inclusion,
probably better than D52779 `bdver2` was :)
Namely:
* uops are pretty spot-on (at least what llvm-mca can measure)
{F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure)
{F16422601}
* throughput is within reason
{F16422607}
I haven't run much benchmarks with this,
however RawSpeed benchmarks says this is beneficial:
{F16603978}
{F16604029}
I'll call out the obvious problems there:
* i didn't really bother with X87 instructions
* i didn't really bother with obviously-microcoded/system instructions
* There are large discrepancy in throughput for `mr` and `rm` instructions.
I'm not really sure if it's a modelling defect that needs to be fixed,
or it's a defect of measurments.
* Pipe distributions are probably bad :)
I can't do much here until AMD allows that to be fixed
by documenting the appropriate counters and updating libpfm
That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>
so how much worse this could possibly be?!
Things that aren't there:
* Various tunings: zero idioms, etc. That is follow-ups.
Differential Revision: https://reviews.llvm.org/D94395