Commit Graph

336 Commits

Author SHA1 Message Date
Simon Pilgrim 0b73b29388 [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)

This also adds missing vcvttss2si tests 

llvm-svn: 328505
2018-03-26 15:30:47 +00:00
Simon Pilgrim 3aa9344605 [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs
These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.

llvm-svn: 328501
2018-03-26 14:44:24 +00:00
Andrea Di Biagio 5ffd2c3cfc [llvm-mca] Fix how views are added to the InstructionTables.
This should fix the stack-use-after-scope reported by the asan buildbots after
revision 328493.

llvm-svn: 328499
2018-03-26 14:25:52 +00:00
Simon Pilgrim 67df1cf597 [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs
The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps

llvm-svn: 328497
2018-03-26 14:03:40 +00:00
Andrea Di Biagio ff9c1092b7 [llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view.
llvm-svn: 328493
2018-03-26 13:44:54 +00:00
Simon Pilgrim caa203aed5 [X86][Btver2] Double the AGU and schedule pipe resources for YMM
Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model.

llvm-svn: 328491
2018-03-26 13:15:20 +00:00
Andrea Di Biagio d1569290ef [llvm-mca] Add flag -instruction-tables to print the theoretical resource pressure distribution for instructions (PR36874)
The goal of this patch is to address most of PR36874.  To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
easy to check the latency and rthroughput as well.

The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".

We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.

At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view.  That means, we don't use the Backend interface.
Instead, it is simpler to just have a different code-path for when flag
-instruction-tables is specified.

Once Clement addresses bug 36663, then we can port the "instruction tables"
logic into a stage of our configurable pipeline.

Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.

Differential Revision: https://reviews.llvm.org/D44839

llvm-svn: 328487
2018-03-26 12:04:53 +00:00
Simon Pilgrim 6c63e6c222 [X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function unit
llvm-svn: 328343
2018-03-23 17:59:22 +00:00
Simon Pilgrim e5c0a041ff [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit
Add missing non-VEX and (V)PMOVMSKB instructions to the pattern

llvm-svn: 328338
2018-03-23 17:38:59 +00:00
Simon Pilgrim 256f149bf0 [X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU function unit
llvm-svn: 328331
2018-03-23 16:17:56 +00:00
Simon Pilgrim ee282b3160 [X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and JSAGU/JSTC function units
llvm-svn: 328328
2018-03-23 15:35:13 +00:00
Simon Pilgrim 1335b9c0ca [X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function units
llvm-svn: 328324
2018-03-23 15:17:50 +00:00
Simon Pilgrim 5792e10ffb [X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructions
This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops.

llvm-svn: 328320
2018-03-23 14:45:03 +00:00
Simon Pilgrim 8619962c73 [X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units
Fixes throughput to match Agner/Fam16h-SoG as well.

llvm-svn: 328318
2018-03-23 14:27:26 +00:00
Simon Pilgrim a1e3ea01ef [X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler pipe and JFPX/JVALU function unit as well as the AGUs
llvm-svn: 328304
2018-03-23 11:27:31 +00:00
Craig Topper 659c66dfc1 [X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. Change pblendvb/blendvps/blendvpd to use WriteFVarBlend
llvm-svn: 328294
2018-03-23 06:41:41 +00:00
Craig Topper 7580a7997d [X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version.
llvm-svn: 328293
2018-03-23 06:41:40 +00:00
Simon Pilgrim bcb86bb927 [X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are scheduled through the JFPU1 pipe
llvm-svn: 328226
2018-03-22 18:29:16 +00:00
Simon Pilgrim 0e031afa95 [X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipe
The ymm instructions are double pumped as well.

llvm-svn: 328222
2018-03-22 17:43:12 +00:00
Simon Pilgrim e5b51f6786 [X86][Btver2] FMUL ymm instructions are double pumped on the JFPM functional pipe
llvm-svn: 328217
2018-03-22 17:25:38 +00:00
Andrea Di Biagio 12ef5260ea [llvm-mca] Move the logic that computes the register file usage to the BackendStatistics view.
With this patch, the "instruction dispatched" event now provides information
related to the number of microarchitectural registers used in each register
file. Similarly, the "instruction retired" event is now able to tell how may
registers are freed in each register file.

Currently, the BackendStatistics view is the only consumer of register
usage/pressure information. BackendStatistics uses that info to print out a few
general statistics (i.e. max number of mappings used; total mapping created).
Before this patch, the BackendStatistics was forced to query the Backend to
obtain the register pressure information.

This helps removes that dependency. Now views are completely independent from
the Backend.  As a consequence, it should be easier to address PR36663 and
further modularize the pipeline.

Added a couple of test cases in the BtVer2 specific directory.

llvm-svn: 328129
2018-03-21 18:11:05 +00:00
Simon Pilgrim 203876f104 [X86][Btver2] Fix crc32 schedule costs
The default is currently FAdd for some reason

llvm-svn: 327807
2018-03-18 19:54:42 +00:00
Simon Pilgrim 13cd3b0961 [X86][Btver2] Add crc32 resource tests
llvm-svn: 327805
2018-03-18 18:55:34 +00:00
Simon Pilgrim c3db8c7cda [X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA functional pipe
llvm-svn: 327804
2018-03-18 18:45:57 +00:00
Simon Pilgrim 036cc82622 [X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX (JFPA/JFPM) functional pipes
llvm-svn: 327803
2018-03-18 17:10:12 +00:00
Simon Pilgrim 87d2f7463f [X86][Btver2] F16C instructions are performed on the JSTC functional pipe
llvm-svn: 327801
2018-03-18 15:59:51 +00:00
Simon Pilgrim 40f6d6ad0b [X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the JVALU0/JVALU1 functional pipes
llvm-svn: 327794
2018-03-18 13:05:09 +00:00
Simon Pilgrim e16790b133 [X86][Btver2] Modelled float bitwise instructions as being performed on the float cluster (FPA/FPM) not the integer.
llvm-svn: 327793
2018-03-18 12:37:35 +00:00
Simon Pilgrim e409f84e7e [X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs
Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit.

This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later.

llvm-svn: 327791
2018-03-18 12:09:17 +00:00
Simon Pilgrim 0ba4a0f3a6 [X86][Btver2] Add llvm-mca tests to show pipe resource usage of most vector instructions
Hopefully these tests can be easily reused should any other subtarget get in depth llvm-mca coverage (we can either copy the tests or move them into a common dir and run it with multiple prefixes).

llvm-svn: 327788
2018-03-18 09:32:38 +00:00
Simon Pilgrim 9c4157bb70 [X86][Btver2] Tweak pipes test to remove register dependencies
It gives us a better view of pipe usage in the timeline which is what the test is trying to show.

llvm-svn: 327685
2018-03-15 23:15:11 +00:00
Simon Pilgrim 3894809997 [X86][Btver2] Fix ymm div/sqrt to use fmul unit
YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent.

This matches the pipes for WriteFSqrt/WriteFDiv definitions.

llvm-svn: 327682
2018-03-15 23:00:47 +00:00
Simon Pilgrim 49a56faee2 [X86][Btver2] Add test to show timeline of fpu instructions on different pipes/units
Try to demonstrate the scheduling from fpu0/fpu1 pipes to the valu0/vimul/fpa or valu1/stc/fpm functional units

llvm-svn: 327676
2018-03-15 22:34:24 +00:00
Andrea Di Biagio 7948738673 [llvm-mca] BackendStatistics: early exit from method printSchedulerUsage if the
no scheduler resources were consumed.

llvm-svn: 327215
2018-03-10 17:40:25 +00:00
Andrea Di Biagio 7bbac07f22 [llvm-mca] Emit the 'Instruction Info' table before the resource pressure view.
In future, both the summary information and the 'instruction info' table should
be moved into a separate "Summary" view.

llvm-svn: 327010
2018-03-08 15:34:38 +00:00
Andrea Di Biagio 3a6b092017 [llvm-mca] LLVM Machine Code Analyzer.
llvm-mca is an LLVM based performance analysis tool that can be used to
statically measure the performance of code, and to help triage potential
problems with target scheduling models.

llvm-mca uses information which is already available in LLVM (e.g. scheduling
models) to statically measure the performance of machine code in a specific cpu.
Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also help with diagnosing potential performance
issues.

Given an assembly code sequence, llvm-mca estimates the IPC (instructions per
cycle), as well as hardware resources pressure. The analysis and reporting style
were mostly inspired by the IACA tool from Intel.

This patch is related to the RFC on llvm-dev visible at this link:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html

Differential Revision: https://reviews.llvm.org/D43951

llvm-svn: 326998
2018-03-08 13:05:02 +00:00