llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrea Di Biagio	a88281d8ae	[llvm-mca] Use an ordered map to collect hardware statistics. NFC. Histogram entries are now ordered by key. This should improves their readability when statistics are printed. llvm-svn: 334961	2018-06-18 17:04:56 +00:00
Andrea Di Biagio	487da729a2	[llvm-mca] Add tests for XOP and AVX512 instructions that implicitly clear the upper portion of a super-register. When the destination register of a XOP instruction is an XMM register, bits [255:128] of the corresponding YMM register are cleared. When the destination register of a EVEX encoded instruction is an XMM/YMM register, the upper bits of the corresponding ZMM are cleared. On processors that feature AVX512, a write to an XMM registers always clears the upper portion of the corresponding ZMM register if the instruction is VEX or EVEX encoded. These new tests show some interesting cases which aren't correctly analyzed by llvm-mca. The lack of knowledge related to the implicit update on the super-registers is addressed by D48225. llvm-svn: 334945	2018-06-18 14:00:30 +00:00
Clement Courbet	0d9da88d18	[X86] Fix NOOP sched overrides on BDW/HSW/SKL. Summary: Noop certainly does not use resources. Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits, gchatelet Differential Revision: https://reviews.llvm.org/D48028 llvm-svn: 334927	2018-06-18 06:48:22 +00:00
Simon Pilgrim	e930f569f7	[llvm-mca][X86] Add some avx512f/avx512vl resource test placeholders There are a lot of instructions to add under these ISAs (and the other AVX512 variants) but this should demonstrate how to test for the EVEX instructions with different maskings llvm-svn: 334907	2018-06-17 16:25:48 +00:00
Simon Pilgrim	f5ecd8d50d	[llvm-mca][x86] Add Generic cpu resource tests Added a Generic x86 cpu set of resource tests to allow us to check all ISAs. We currently use SandyBridge as our generic CPU model, but it's better if we actually duplicate these tests for if/when we change the model, it also means we don't end up polluting the SandyBridge folder with tests for ISAs it doesn't support. llvm-svn: 334853	2018-06-15 18:35:25 +00:00
Roman Lebedev	9ddf128f79	[MCA] Add -summary-view option Summary: While that is indeed a quite interesting summary stat, there are cases where it does not really add anything other than consuming extra lines. Declutters the output of D48190. Reviewers: RKSimon, andreadb, courbet, craig.topper Reviewed By: andreadb Subscribers: javed.absar, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D48209 llvm-svn: 334833	2018-06-15 14:01:43 +00:00
Roman Lebedev	7c423001e4	[MCA][x86][NFC] Add tests for -register-file-stats, -scheduler-stats Summary: There does not seem to be any other tests for this. Split off from D47676. Reviewers: RKSimon, craig.topper, courbet, andreadb Reviewed By: andreadb Subscribers: javed.absar, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D48190 llvm-svn: 334832	2018-06-15 14:01:35 +00:00
Andrea Di Biagio	4cafb297d5	[llvm-mca] Add tests for instructions that implicitly clear the upper portion of a super-register. On x86-64, a write to register EAX implicitly clears the upper half or RAX. 128-bit AVX instructions clear the upper 128-bit of the YMM register that aliases the XMM definition register. llvm-mca doesn't know about register writes that implicitly clear the upper portion of an aliasing super-register. This issue will be fixed in a future patch. llvm-svn: 334742	2018-06-14 17:48:42 +00:00
Andrea Di Biagio	4729d1ff27	[llvm-mca] Add another test for partial register stalls. This test checks that a physical register is correctly allocated for the partial write to register BX. The ADD instruction has to wait for the write to RBX (and BX) before being executed. llvm-svn: 334730	2018-06-14 15:54:34 +00:00
Andrea Di Biagio	0ffb2271a1	[llvm-mca] Fixed a bug in the logic that checks if a memory operation is ready to execute. Fixes PR37790. In some (very rare) cases, the LSUnit (Load/Store unit) was wrongly marking a load (or store) as "ready to execute" effectively bypassing older memory barrier instructions. To reproduce this bug, the memory barrier must be the first instruction in the input assembly sequence, and it doesn't have to perform any register writes. llvm-svn: 334633	2018-06-13 18:30:14 +00:00
Clement Courbet	7db69cc08a	[X86] Fix skylake server scheduling info. Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407	2018-06-11 14:37:53 +00:00
Simon Pilgrim	89deac6694	[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits). llvm-svn: 334303	2018-06-08 17:00:45 +00:00
Simon Pilgrim	efb4806bb9	[X86][BtVer2] Remove SBB tests that were accidentally added in rL334296 These aren't true zero-idiom instructions (just dependency breaking). llvm-svn: 334297	2018-06-08 15:43:00 +00:00
Simon Pilgrim	53766a986d	[X86][BtVer2] Add tests for scalar SUB/XOR instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions). llvm-svn: 334296	2018-06-08 15:28:43 +00:00
Simon Pilgrim	aafcf9e4a1	[X86][BtVer2] Limit zero idiom tests to a single iteration. Reduces output size and we're only wanting to check that the instructions are fast-path'd (just Dispatch+Retire) anyhow llvm-svn: 334292	2018-06-08 15:01:40 +00:00
Simon Pilgrim	aef5bdbea1	[X86][BtVer2] Add support for all vector instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register. llvm-svn: 334119	2018-06-06 19:06:09 +00:00
Simon Pilgrim	7a48bb6e44	[llvm-mca][x86] Fix all resources-x86_64.s tests to use different registers in reg-reg cases I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2. Fixed in all targets to keep them in sync. llvm-svn: 334110	2018-06-06 18:20:25 +00:00
Simon Pilgrim	64541ff297	[X86][BtVer2] Add tests for all vector instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register. TODO: Scalar instructions still need to be tested (need to check EFLAGS handling). llvm-svn: 334104	2018-06-06 16:14:37 +00:00
Sanjay Patel	59313be8d3	[CodeGen] assume max/default throughput for unspecified instructions This is a fix for the problem arising in D47374 (PR37678): https://bugs.llvm.org/show_bug.cgi?id=37678 We may not have throughput info because it's not specified in the model or it's not available with variant scheduling, so assume that those instructions can execute/complete at max-issue-width. Differential Revision: https://reviews.llvm.org/D47723 llvm-svn: 334055	2018-06-05 23:34:45 +00:00
Andrea Di Biagio	757600bccb	[llvm-mca] Correctly update the CyclesLeft of a register read in the presence of partial register updates. This patch fixe the logic in ReadState::cycleEvent(). That method was not correctly updating field `TotalCycles`. Added extra code comments in class ReadState to better describe each field. llvm-svn: 334028	2018-06-05 17:12:02 +00:00
Andrea Di Biagio	39e5a5695f	[RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca. This patch is the last of a sequence of three patches related to LLVM-dev RFC "MC support for variant scheduling classes". http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html This fixes PR36672. The main goal of this patch is to teach llvm-mca how to solve variant scheduling classes. This patch does that, plus it adds new variant scheduling classes to the BtVer2 scheduling model to identify so-called zero-idioms (i.e. so-called dependency breaking instructions that are known to generate zero, and that are optimized out in hardware at register renaming stage). Without the BtVer2 change, this patch would not have had any meaningful tests. This patch is effectively the union of two changes: 1) a change that teaches llvm-mca how to resolve variant scheduling classes. 2) a change to the BtVer2 scheduling model that allows us to special-case packed XOR zero-idioms (this partially fixes PR36671). Differential Revision: https://reviews.llvm.org/D47374 llvm-svn: 333909	2018-06-04 15:43:09 +00:00
Greg Bedwell	bbe64af0a0	[llvm-mca] Regenerate a test to remove a double newline Command used: py update_mca_test_checks.py ..\test\tools\llvm-mca\\.s ..\test\tools\llvm-mca\\\*.s llvm-svn: 333893	2018-06-04 12:30:03 +00:00
Roman Lebedev	7b53d1454f	[llvm-mca] Make sure not to end the test files with an empty line. Summary: It's super irritating. [properly configured] git client then complains about that double-newline, and you have to use `--force` to ignore the warning, since even if you fix it manually, it will be reintroduced the very next runtime :/ Reviewers: RKSimon, andreadb, courbet, craig.topper, javed.absar, gbedwell Reviewed By: gbedwell Subscribers: javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47697 llvm-svn: 333887	2018-06-04 11:48:46 +00:00
Clement Courbet	2e41c5a79c	[X86] Introduce WriteFLDC for x87 constant loads. Summary: {FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded. - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form InstRWs. - For SLM and BtVer2, I've guessed some values :( Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47585 llvm-svn: 333656	2018-05-31 14:22:01 +00:00
Clement Courbet	b78ab5097d	[X86] Extract latency of fldz/fld1 in separate classes. Summary: - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form `InstRW`s. - For SLM and BtVer2, values are from Agner. This is split off from https://reviews.llvm.org/D47377 Reviewers: RKSimon, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47523 llvm-svn: 333642	2018-05-31 11:41:27 +00:00
Clement Courbet	07c9ec6f2e	[X86][Sched] Add InstRW for CLC on Intel after SNB. Summary: After SNB, Intel CPUs can rename CF independently of other EFLAGS, so the renamer can zero it for free. Note that STC still consumes resources. To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC` On SNB: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: sandybridge llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.0014, debug_string: SBPort0 } - { key: '4', value: 0.0013, debug_string: SBPort1 } - { key: '5', value: 0.0003, debug_string: SBPort4 } - { key: '6', value: 0.0029, debug_string: SBPort5 } - { key: '10', value: 0.0003, debug_string: SBPort23 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` On HSW: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.001, debug_string: HWPort0 } - { key: '4', value: 0.0009, debug_string: HWPort1 } - { key: '5', value: 0.0004, debug_string: HWPort2 } - { key: '6', value: 0.0006, debug_string: HWPort3 } - { key: '7', value: 0.0002, debug_string: HWPort4 } - { key: '8', value: 0.0012, debug_string: HWPort5 } - { key: '9', value: 0.0022, debug_string: HWPort6 } - { key: '10', value: 0.0001, debug_string: HWPort7 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` Reviewers: craig.topper, RKSimon Subscribers: gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D47362 llvm-svn: 333392	2018-05-29 06:19:39 +00:00
Simon Pilgrim	0155bf0da9	[X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286) As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models...... llvm-svn: 333271	2018-05-25 12:18:11 +00:00
Greg Bedwell	e790f6fb06	[UpdateTestChecks] Improved update_mca_test_checks block analysis Previously update_mca_test_checks worked entirely at "block" level where a block is some sequence of lines delimited by at least one empty line. This generally worked well, but could sometimes lead to excessive repetition of check lines for various prefixes if some block was almost identical between prefixes, but not quite (for example, due to a different dispatch width in the otherwise identical summary views). This new analyis attempts to split blocks further in the case where the following conditions are met: a) There is some prefix common to every RUN line (typically 'ALL'). b) The first line of the block is common to the output with every prefix. c) The block has the same number of lines for the output with every prefix. Also, regenerated all llvm-mca test files with the following command: update_mca_test_checks.py "../test/tools/llvm-mca//.s" "../test/tools/llvm-mca///*.s" The new analysis showed a "multiple lines not disambiguated by prefixes" warning for test "AArch64/Exynos/scheduler-queue-usage.s" so I've also added some explicit prefixes to each of the RUN lines in that test. Differential Revision: https://reviews.llvm.org/D47321 llvm-svn: 333204	2018-05-24 16:36:44 +00:00
Andrea Di Biagio	3fc20c9c7f	[llvm-mca] Print the "Block RThroughput" in the SummaryView. This patch implements the "block reciprocal throughput" computation in the SummaryView. The block reciprocal throughput is computed as the MAX of: - NumMicroOps / DispatchWidth - Resource Cycles / #Units (for every resource consumed). The block throughput is bounded from above by the hardware dispatch throughput. That is because the DispatchWidth is an upper bound on how many opcodes can be part of a single dispatch group. The block throughput is also limited by the amount of hardware parallelism. The number of available resource units affects how the resource pressure is distributed, and also how many blocks can be delivered every cycle. llvm-svn: 333095	2018-05-23 15:59:27 +00:00
Andrea Di Biagio	cb1ed400a4	[llvm-mca] Removed an empty line generated by the timeline view. NFC. Also, regenerate all tests. llvm-svn: 332853	2018-05-21 17:11:56 +00:00
Andrea Di Biagio	b5757abefb	[X86][BtVer2] Add a 'J' prefix to the PRF/RCU defs. NFC This is to keep the Jaguar model's naming convention. Processor resources all have a 'J' prefix in the BtVer2 scheduling model. llvm-svn: 332851	2018-05-21 16:30:26 +00:00
Simon Pilgrim	1273f4ad93	[X86] Add GPR<->XMM Schedule Tags BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737) The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1: SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM) Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner) llvm-svn: 332745	2018-05-18 17:58:36 +00:00
Simon Pilgrim	007b50fd35	[X86][BtVer2] Improve simulation of (V)PINSR values Include the 6cy delay transferring from the GPR to FPU. llvm-svn: 332737	2018-05-18 17:09:41 +00:00
Simon Pilgrim	3ecb0b80f6	[X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latency llvm-svn: 332722	2018-05-18 14:22:22 +00:00
Simon Pilgrim	c4b8d367a8	[X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332718	2018-05-18 14:08:01 +00:00
Simon Pilgrim	d749b321b2	[X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332714	2018-05-18 13:13:59 +00:00
Simon Pilgrim	e389ea0e3e	[llvm-mca][X86] Add CMOV test files llvm-svn: 332622	2018-05-17 16:29:12 +00:00
Simon Pilgrim	b5741f5c3d	[X86][BtVer2] ADC/SBB take 2cy on an ALU pipe, not 1cy like ADD/SUB llvm-svn: 332616	2018-05-17 15:43:23 +00:00
Andrea Di Biagio	650b5fc6cb	[llvm-mca] add flag -all-views and flag -all-stats. Flag -all-views enables all the views. Flag -all-stats enables all the views that print hardware statistics. llvm-svn: 332602	2018-05-17 12:27:03 +00:00
Simon Pilgrim	b4fd145fc3	[llvm-mca][X86] Add ADX test files llvm-svn: 332595	2018-05-17 11:32:38 +00:00
Simon Pilgrim	d5d77dcb46	[X86] Fix typo in instregex for CVTSI642SDrr llvm-svn: 332510	2018-05-16 18:31:17 +00:00
Andrea Di Biagio	45ccdd1785	[llvm-mca] Regenerate tests after r332381 and r332361. NFC llvm-svn: 332447	2018-05-16 10:12:06 +00:00
Simon Pilgrim	be9a206883	[X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes BtVer2 - Fixes schedules for (V)CVTPS2PD instructions A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332376	2018-05-15 17:36:49 +00:00
Simon Pilgrim	891ebcdbaa	[X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes Btver2 - VCVTPH2PSYrm needs to double pump the AGU Broadwell - missing VCVTPS2PH*mr stores extra latency Allows us to remove the WriteCvtF2FSt conversion store class llvm-svn: 332357	2018-05-15 14:12:32 +00:00
Simon Pilgrim	2aa395abcf	[llvm-mca][x86] Add F16C instruction tests llvm-svn: 332347	2018-05-15 12:50:06 +00:00
Simon Pilgrim	5bd5e2fd3e	[llvm-mca][X86] Add missing SSE4A test file llvm-svn: 332270	2018-05-14 18:20:40 +00:00
Simon Pilgrim	228d24a2d6	[X86][BtVer2] Fix MMX/YMM integer vector nt store schedules MMX was missing and YMM was tagged as a fp nt store llvm-svn: 332269	2018-05-14 18:07:28 +00:00
Simon Pilgrim	4135de2e93	[llvm-mca][x86] Add scalar nt-store instruction tests llvm-svn: 332262	2018-05-14 17:10:33 +00:00
Simon Pilgrim	7340d88740	[llvm-mca][x86] Add and/not/or/xor instruction tests llvm-svn: 332257	2018-05-14 16:26:24 +00:00
Simon Pilgrim	661ae7778d	[X86][BtVer2] Model ymm move as double pumped instructions We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions llvm-svn: 332109	2018-05-11 17:38:36 +00:00
Simon Pilgrim	706403bab8	[X86][MMX] Tag MMX Move/Load/Store as WriteVec schedule classes Fixes an issue on SLM/Btver2 where we had instructions were being treated as scalar loads/stores llvm-svn: 332104	2018-05-11 16:38:59 +00:00
Simon Pilgrim	032a01f74a	[X86][SLM] Vector stores only use the MEC port. Confirmed by both Agner and Intel's AOM - the IEC/FPC are not required for pure load/stores (even if its a partial update). Can't fix WriteStore until all RMW instructions are cleaned up though.... llvm-svn: 332096	2018-05-11 15:16:15 +00:00
Simon Pilgrim	22dd72b995	[X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector width Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions llvm-svn: 332094	2018-05-11 14:30:54 +00:00
Simon Pilgrim	37fbb7f173	[X86][SNB] Fix typo in PEXTRDmr instregex, was missing VPEXTRDmr. llvm-svn: 332002	2018-05-10 17:30:49 +00:00
Simon Pilgrim	ab34aa8294	[X86] Cleanup WriteFStore/WriteVecStore schedules MOVNTPD/MOVNTPS should be WriteFStore Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction.... llvm-svn: 331864	2018-05-09 11:01:16 +00:00
Simon Pilgrim	2864b46469	[X86] Split off WriteIMul64 from WriteIMul schedule class (PR36931) This fixes a couple of BtVer2 missing instructions that weren't been handled in the override. NOTE: There are still a lot of overrides that still need cleaning up! llvm-svn: 331770	2018-05-08 14:55:16 +00:00
Simon Pilgrim	739d1a68aa	[llvm][x86] SandyBridge/IvyBridge don't support BMI1/BMI2 llvm-svn: 331769	2018-05-08 14:20:25 +00:00
Simon Pilgrim	2580554333	[X86] Split WriteIDiv into div/idiv 8/16/32/64 implementations (PR36930) I've created the necessary classes but there are still a lot of overrides that need cleaning up. NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides. llvm-svn: 331767	2018-05-08 13:51:45 +00:00
Simon Pilgrim	4283924e08	[llvm-mca][x86] Add div/idiv, mul/imul and inc/dec/neg/nop instruction tests llvm-svn: 331765	2018-05-08 13:30:58 +00:00
Simon Pilgrim	061096d2c2	[llvm-mca][x86] Remove addsubpd from SSE2 tests llvm-svn: 331678	2018-05-07 21:10:48 +00:00
Simon Pilgrim	1233e1234a	[X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions. Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour. llvm-svn: 331672	2018-05-07 20:52:53 +00:00
Simon Pilgrim	e480ed0b9f	[X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256 These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents. Differential Revision: https://reviews.llvm.org/D46229 llvm-svn: 331659	2018-05-07 18:25:19 +00:00
Simon Pilgrim	763bf12085	[X86][Znver1] Remove WriteFMul/WriteFRcp InstRW overrides/aliases. Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel. llvm-svn: 331645	2018-05-07 16:34:26 +00:00
Simon Pilgrim	ac5d0a31ef	[X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions. This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values. llvm-svn: 331643	2018-05-07 16:15:46 +00:00
Simon Pilgrim	f3ae50fca2	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions. NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner. NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80. llvm-svn: 331629	2018-05-07 11:50:44 +00:00
Simon Pilgrim	0e51a125ea	[X86] Add WriteEMMS scheduler class Filled in the missing values from Btver2 SoG or Agner llvm-svn: 331546	2018-05-04 18:16:13 +00:00
Simon Pilgrim	be51b20127	[X86] Add SchedWriteFRnd fp rounding scheduler classes Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515	2018-05-04 12:59:24 +00:00
Simon Pilgrim	0aed731516	[X86][Znver1] Use SchedAlias to tag microcoded scheduler classes Avoids extra entries in the class tables. Found a typo that missed the MMX_PHSUBSW instruction. llvm-svn: 331488	2018-05-03 22:12:23 +00:00
Simon Pilgrim	350c22c587	[X86][SNB] Fix scheduling of MMX integer multiply instructions. The entries were being bound to the wrong class. llvm-svn: 331388	2018-05-02 19:26:14 +00:00
Clement Courbet	a1a3095d88	[X86] Fix scheduling info for (V?)SQRTPDm on silvermont. https://reviews.llvm.org/D46356 llvm-svn: 331356	2018-05-02 13:46:14 +00:00
Andrea Di Biagio	e047d3529b	[llvm-mca] Correctly handle zero-latency stores that consume pipeline resources. This fixes PR37293. We can have scheduling classes with no write latency entries, that still consume processor resources. We don't want to treat those instructions as zero-latency instructions; they still have to be issued to the underlying pipelines, so they still consume resource cycles. This is likely to be a regression which I have accidentally introduced at revision 330807. Now, if an instruction has a non-empty set of write processor resources, we conservatively treat it as a normal (i.e. non zero-latency) instruction. llvm-svn: 331193	2018-04-30 15:55:04 +00:00
Andrea Di Biagio	77bd1c748a	[llvm-mca] Regenerate test Atom/resources-sse3.s. NFC Before this change, it wrongly specified -mcpu=slm instead of -mcpu=atom. llvm-svn: 331170	2018-04-30 12:13:04 +00:00
Andrea Di Biagio	e9384eb13b	[llvm-mca] Support for in-order CPU for -instruction-tables testing. Added Intel Atom tests to verify that the tool correctly generates instruction tables even if the CPU is in-order. Fixes PR37282. llvm-svn: 331169	2018-04-30 12:05:34 +00:00
Simon Pilgrim	8962c344f9	[llvm-mca][X86] Add BT resource tests to all models llvm-svn: 331144	2018-04-29 15:45:31 +00:00
Simon Pilgrim	2d569361fc	[llvm-mca][X86] Add add/adc + sub/sbb resource tests to all models llvm-svn: 331140	2018-04-29 11:03:25 +00:00
Simon Pilgrim	318e9d39ab	[llvm-mca][X86] Add double shift resource tests to all relevant models llvm-svn: 331109	2018-04-28 15:18:49 +00:00
Simon Pilgrim	4d0187c893	[llvm-mca][X86] Add shift/rotate resource tests to all relevant models I intend to add further instruction tests to the resources-x86_64.s test file as required, but this initial commit is to help remove a load of unnecessary InstRW overrides in a future patch llvm-svn: 331108	2018-04-28 14:56:18 +00:00
Simon Pilgrim	7574ffd7bc	[llvm-mca][X86] Updated fma3 tests after rL330820 llvm-svn: 330822	2018-04-25 13:19:04 +00:00
Andrea Di Biagio	93c49d5e58	[llvm-mca] Default to the native host cpu if flag -mcpu is not specified. llvm-svn: 330809	2018-04-25 10:18:25 +00:00
Simon Pilgrim	27bc83e228	[X86] Split off PHMINPOSUW to their own schedule class This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. llvm-svn: 330756	2018-04-24 18:49:25 +00:00
Simon Pilgrim	f0945aa0e0	[X86][F16C] Add WriteCvtF2FSt scheduling class Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887) llvm-svn: 330737	2018-04-24 16:43:07 +00:00
Simon Pilgrim	828ef9e013	[X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies These are stores, not loads, so don't need to account for load latency. llvm-svn: 330735	2018-04-24 16:26:51 +00:00
Simon Pilgrim	f35b8ac196	[X86][IVB] Add F16C resource tests. Note this is IvyBridge (which shares the model) NOT SandyBridge. llvm-svn: 330734	2018-04-24 16:22:59 +00:00
Andrea Di Biagio	0626864fa4	[llvm-mca] Default the output asm dialect used by the instruction printer to the input asm dialect. The instruction printer used by llvm-mca to generate the performance report now defaults the output assembly format to the format used for the input assembly file. On x86, the asm format can be either AT&T or Intel, depending on the presence/absence of directive `.intel_syntax`. Users can still specify a different assembly dialect with the command line flag -output-asm-variant=<uint>. llvm-svn: 330733	2018-04-24 16:19:08 +00:00
Simon Pilgrim	16299273d0	[X86] Remove unnecessary FMA reg-mem InstRW scheduler overrides. llvm-svn: 330720	2018-04-24 14:47:11 +00:00
Simon Pilgrim	f7d2a93d5f	[X86] Add vector element insertion/extraction scheduler classes Split off pinsr/pextr and extractps instructions. (Mostly) fixes PR36887. Note: It might be worth adding a WriteFInsertLd class as well in the future. Differential Revision: https://reviews.llvm.org/D45929 llvm-svn: 330714	2018-04-24 13:21:41 +00:00
Simon Pilgrim	87ba905fe9	[llvm-mca][X86] Add BMI/LZCNT/POPCNT resource tests to all relevant models The SandyBridge BMI tests are actually run on IvyBridge as that's the first lowest CPU that actually support the ISAs (but still use the SandyBridge model). llvm-svn: 330556	2018-04-22 20:42:24 +00:00
Simon Pilgrim	96855ec39e	[X86] Remove unnecessary WriteFVarBlend/WriteVarBlend InstRW overrides. This also fixes some of the ReadAfterLd issues due to InstRW. llvm-svn: 330544	2018-04-22 14:43:12 +00:00
Simon Pilgrim	5e9f1da0cd	[llvm-mca][X86] Add POPCNT resource test llvm-svn: 330540	2018-04-22 09:58:00 +00:00
Simon Pilgrim	e25aa02bc4	[llvm-mca][X86] Add AVX2 resource tests llvm-svn: 330512	2018-04-21 16:12:42 +00:00
Simon Pilgrim	d73bd154d9	[llvm-mca][X86] Add SSE resource tests to all models llvm-svn: 330506	2018-04-21 14:16:57 +00:00
Simon Pilgrim	26178d4336	[llvm-mca][X86] Add MMX resource tests llvm-svn: 330502	2018-04-21 11:28:59 +00:00
Simon Pilgrim	1264066cd7	[llvm-mca][X86] Add X87 resource tests llvm-svn: 330499	2018-04-21 10:36:19 +00:00
Simon Pilgrim	1803bfb75f	[llvm-mca][X86] Add MMX/SSE/AES/CLMUL resource SandyBridge tests llvm-svn: 330486	2018-04-20 22:04:11 +00:00
Simon Pilgrim	0a6bfb1843	[llvm-mca][X86] Add prefetch instruction resource tests llvm-svn: 330371	2018-04-19 22:11:58 +00:00
Simon Pilgrim	7209117868	[llvm-mca][FMA] Add FMA resource tests llvm-svn: 330366	2018-04-19 21:32:22 +00:00
Simon Pilgrim	4a486c13fa	[llvm-mca][X86] Add resource test for every out-of-order scheduler model I've copied and regenerated a resource file from btver2 to every x86 scheduler model supported by llvm-mca so we have at least some basic coverage. For most this has been the avx1 tests, but for silvermont I've used sse42 as thats the latest it supports. More will be added later. llvm-svn: 330352	2018-04-19 18:08:10 +00:00
Simon Pilgrim	f209321d61	[llvm-mca][X86] Add mmx instruction to btver2 resource tests Useful to see scheduler class deltas against xmm equivalents llvm-svn: 330335	2018-04-19 15:09:46 +00:00
Simon Pilgrim	c310bfa193	[llvm-mca][X86] Add mmx versions of SSSE3 instructions Move PABS instructions incorrectly tested under SSE2 llvm-svn: 330295	2018-04-18 20:47:48 +00:00
Greg Bedwell	90d141a295	[UpdateTestChecks] Add update_mca_test_checks.py script This script can be used to regenerate tests in the test/tools/llvm-mca directory (PR36904). Regenerated a number of tests using the pattern: test/tools/llvm-mca///*.s Differential Revision: https://reviews.llvm.org/D45369 llvm-svn: 330246	2018-04-18 10:27:45 +00:00
Craig Topper	e56a2fc5e7	[X86] Add separate scheduling class for PSADBW instruction. llvm-svn: 330204	2018-04-17 19:35:19 +00:00
Andrea Di Biagio	c752616f30	[llvm-mca] Ensure that instructions with a schedule read-advance are always issued in the right order. Normally, the Scheduler prioritizes older instructions over younger instructions during the instruction issue stage. In one particular case where a dependent instruction had a schedule read-advance associated to one of the input operands, this rule was not correctly applied. This patch fixes the issue and adds a test to verify that we don't regress that particular case. llvm-svn: 330032	2018-04-13 15:19:07 +00:00
Andrea Di Biagio	f41ad5c59e	[llvm-mca] Renamed BackendStatistics to RetireControlUnitStatistics. Also, removed flag -verbose in favor of flag -retire-stats. llvm-svn: 329794	2018-04-11 12:12:53 +00:00
Andrea Di Biagio	1cc29c045e	[llvm-mca] Move the logic that prints scheduler statistics from BackendStatistics to its own view. Added flag -scheduler-stats to print scheduler related statistics. llvm-svn: 329792	2018-04-11 11:37:46 +00:00
Andrea Di Biagio	821f650bba	[llvm-mca] Move the logic that prints dispatch unit statistics from BackendStatistics to its own view. This patch moves the logic that collects and analyzes dispatch events to the DispatchStatistics view. Added flag -dispatch-stats to print statistics related to the dispatch logic. llvm-svn: 329708	2018-04-10 14:55:14 +00:00
Andrea Di Biagio	074cef3dfb	[llvm-mca] Increase the default number of iterations to 100. llvm-svn: 329694	2018-04-10 12:50:03 +00:00
Andrea Di Biagio	c9f409eb6f	Reapply "[llvm-mca] Do not separate iterations with a newline in the timeline view." This reapplies r329403 with a fix for the floating point rounding issue. llvm-svn: 329680	2018-04-10 09:55:33 +00:00
Andrea Di Biagio	c65901282b	[llvm-mca] Add the ability to mark regions of code for analysis (PR36875) This patch teaches llvm-mca how to parse code comments in search for special "markers" used to select regions of code. Example: # LLVM-MCA-BEGIN My Code Region .... # LLVM-MCA-END The MCAsmLexer now delegates to an object of class MCACommentParser (i.e. an AsmCommentConsumer) the parsing of code comments to search for begin/end code region markers. A comment starting with substring "LLVM-MCA-BEGIN" marks the beginning of a new region of code. A comment starting with substring "LLVM-MCA-END" marks the end of the last region. This implementation doesn't allow regions to overlap. Each region can have a optional description; internally, each region is identified by a range of source code locations (SMLoc). MCInst objects are added to a region R only if the source location for the MCInst is in the range of locations specified by R. By default, the tool allocates an implicit "Default" code region which contains every source location. See new tests llvm-mca-marker-*.s for a few examples. A new Backend object is created for every region. So, the analysis is conducted on every parsed code region. The final report is the union of the reports generated for every code region. Note that empty regions are skipped. Special "[#] Code Region - ..." strings are used in the report to mark the portion which is specific to a code region only. For example, see llvm-mca-markers-5.s. Differential Revision: https://reviews.llvm.org/D45433 llvm-svn: 329590	2018-04-09 16:39:52 +00:00
Hans Wennborg	6400c03e6a	Revert r329403 "[llvm-mca] Do not separate iterations with a newline in the timeline view." This made AArch64/CortexA57/direct-branch.s fail on Windows, e.g. http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/11251 > Also, update a few tests to minimize the diff in D45369. > No functional change intended. llvm-svn: 329569	2018-04-09 13:53:41 +00:00
Simon Pilgrim	86588fc809	[X86][Btver2] Add vector extract costs llvm-svn: 329524	2018-04-08 11:26:26 +00:00
Andrea Di Biagio	85b8138bc6	[llvm-mca] Do not separate iterations with a newline in the timeline view. Also, update a few tests to minimize the diff in D45369. No functional change intended. llvm-svn: 329403	2018-04-06 15:30:02 +00:00
Andrea Di Biagio	c74ad502ce	[MC][Tablegen] Allow models to describe the retire control unit for llvm-mca. This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td). A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle". Models can optionally specify an instance of RetireControlUnit. There can only be up-to one RetireControlUnit definition per scheduling model. Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp). This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304	2018-04-05 15:41:41 +00:00
Simon Pilgrim	8139a88cb6	[X86][Btver2] Strip unnecessary check prefixes from resources tests llvm-svn: 329192	2018-04-04 13:25:45 +00:00
Andrea Di Biagio	8dabf4f145	[llvm-mca] Move the logic that prints register file statistics to its own view. NFCI Before this patch, the "BackendStatistics" view was responsible for printing the register file usage (as well as many other statistics). Now users can enable register file usage statistics using the command line flag `-register-file-stats`. By default, the tool doesn't print register file statistics. llvm-svn: 329083	2018-04-03 16:46:23 +00:00
Andrea Di Biagio	9da4d6db33	[MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. * Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which lef to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067	2018-04-03 13:36:24 +00:00
Andrea Di Biagio	6fd62feff8	[llvm-mca] Do not assume that implicit reads cannot be associated with ReadAdvance entries. Before, the instruction builder incorrectly assumed that only explicit reads could have been associated with ReadAdvance entries. This patch fixes the issue and adds a test to verify it. llvm-svn: 328972	2018-04-02 13:46:49 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Andrea Di Biagio	dc97172b2f	[X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and VSQRT instructions. There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now. llvm-svn: 328892	2018-03-30 18:53:47 +00:00
Andrea Di Biagio	3eaa26bb64	[X86][BtVer2] Fix the number of uOps for horizontal operations. llvm-svn: 328886	2018-03-30 18:15:30 +00:00
Andrea Di Biagio	073a9d74ca	[X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and most vector logic instructions. Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input operand. llvm-svn: 328867	2018-03-30 14:48:08 +00:00
Andrea Di Biagio	42d8ea22c0	[X86][BtVer2] Add tests that show how ReadAfterLd is missing for some instructions. In the Btver2 model, there are a few InstRW overrides that don't specify a ReadAfterLd for the register input operand. As a result, a few AVX variants of horizontal operations and most vector logic operations with a folded memory operand don't have a ReadAdvance info associated to their input register operands. llvm-svn: 328865	2018-03-30 14:29:33 +00:00
Andrea Di Biagio	01043625cf	[X86] Add llvm-mca tests for r328834. Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend instructions. As Craig pointed out in D44726, some Intel models still have to be fixed. llvm-svn: 328861	2018-03-30 13:38:37 +00:00
Andrea Di Biagio	0823090843	[X86] Add tests to verify the presence of "ReadAfterLd" after r328823. This change adds a couple of tests to verify the change introduced by revision 328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI). llvm-svn: 328859	2018-03-30 11:44:48 +00:00
Andrea Di Biagio	0a837ef6b1	[llvm-mca] Correctly set the ReadAdvance information for register use operands. The tool was passing the wrong operand index to method MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and not the operand index. This was found when testing X86 code where instructions had a memory folded operand. This patch fixes the issue and adds test read-advance-1.s to ensure that the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used. llvm-svn: 328790	2018-03-29 14:26:56 +00:00
Andrea Di Biagio	5076b98fb9	[X86][BtVer2] Fix the number of micro opcodes for AES[ENC\|DEC] and other YMM instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698	2018-03-28 12:12:04 +00:00
Andrea Di Biagio	010924e35c	[X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions. The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694	2018-03-28 10:49:33 +00:00
Andrea Di Biagio	9ecb4011ca	[llvm-mca] pass the correct set of used registers in checkRAT. We were incorrectly initializing the array of used registers in method checkRAT. As a consequence, the number of register file stalls was misreported. Added a test to cover this case. llvm-svn: 328629	2018-03-27 15:23:41 +00:00
Simon Pilgrim	fcf49df21c	[X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573	2018-03-26 19:01:06 +00:00
Simon Pilgrim	86ea53123d	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551	2018-03-26 17:02:02 +00:00
Simon Pilgrim	8815105cd5	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs llvm-svn: 328541	2018-03-26 16:24:13 +00:00
Simon Pilgrim	aa40148cae	[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write) llvm-svn: 328536	2018-03-26 16:10:08 +00:00
Simon Pilgrim	0b73b29388	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505	2018-03-26 15:30:47 +00:00
Simon Pilgrim	3aa9344605	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501	2018-03-26 14:44:24 +00:00
Andrea Di Biagio	5ffd2c3cfc	[llvm-mca] Fix how views are added to the InstructionTables. This should fix the stack-use-after-scope reported by the asan buildbots after revision 328493. llvm-svn: 328499	2018-03-26 14:25:52 +00:00
Simon Pilgrim	67df1cf597	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497	2018-03-26 14:03:40 +00:00
Andrea Di Biagio	ff9c1092b7	[llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view. llvm-svn: 328493	2018-03-26 13:44:54 +00:00
Simon Pilgrim	caa203aed5	[X86][Btver2] Double the AGU and schedule pipe resources for YMM Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491	2018-03-26 13:15:20 +00:00
Andrea Di Biagio	d1569290ef	[llvm-mca] Add flag -instruction-tables to print the theoretical resource pressure distribution for instructions (PR36874) The goal of this patch is to address most of PR36874. To fully fix PR36874 we need to split the "InstructionInfo" view from the "SummaryView". That would make easy to check the latency and rthroughput as well. The patch reuses all the logic from ResourcePressureView to print out the "instruction tables". We have an entry for every instruction in the input sequence. Each entry reports the theoretical resource pressure distribution. Resource pressure is uniformly distributed across all the processor resource units of a group. At the moment, the backend pipeline is not configurable, so the only way to fix this is by creating a different driver that simply sends instruction events to the resource pressure view. That means, we don't use the Backend interface. Instead, it is simpler to just have a different code-path for when flag -instruction-tables is specified. Once Clement addresses bug 36663, then we can port the "instruction tables" logic into a stage of our configurable pipeline. Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag -instruction-tables to each modified test. Differential Revision: https://reviews.llvm.org/D44839 llvm-svn: 328487	2018-03-26 12:04:53 +00:00
Simon Pilgrim	6c63e6c222	[X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function unit llvm-svn: 328343	2018-03-23 17:59:22 +00:00
Simon Pilgrim	e5c0a041ff	[X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338	2018-03-23 17:38:59 +00:00
Simon Pilgrim	256f149bf0	[X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU function unit llvm-svn: 328331	2018-03-23 16:17:56 +00:00
Simon Pilgrim	ee282b3160	[X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and JSAGU/JSTC function units llvm-svn: 328328	2018-03-23 15:35:13 +00:00
Simon Pilgrim	1335b9c0ca	[X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function units llvm-svn: 328324	2018-03-23 15:17:50 +00:00
Simon Pilgrim	5792e10ffb	[X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructions This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops. llvm-svn: 328320	2018-03-23 14:45:03 +00:00
Simon Pilgrim	8619962c73	[X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318	2018-03-23 14:27:26 +00:00
Simon Pilgrim	a1e3ea01ef	[X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler pipe and JFPX/JVALU function unit as well as the AGUs llvm-svn: 328304	2018-03-23 11:27:31 +00:00
Craig Topper	659c66dfc1	[X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. Change pblendvb/blendvps/blendvpd to use WriteFVarBlend llvm-svn: 328294	2018-03-23 06:41:41 +00:00
Craig Topper	7580a7997d	[X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version. llvm-svn: 328293	2018-03-23 06:41:40 +00:00
Simon Pilgrim	bcb86bb927	[X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are scheduled through the JFPU1 pipe llvm-svn: 328226	2018-03-22 18:29:16 +00:00
Simon Pilgrim	0e031afa95	[X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipe The ymm instructions are double pumped as well. llvm-svn: 328222	2018-03-22 17:43:12 +00:00
Simon Pilgrim	e5b51f6786	[X86][Btver2] FMUL ymm instructions are double pumped on the JFPM functional pipe llvm-svn: 328217	2018-03-22 17:25:38 +00:00
Andrea Di Biagio	12ef5260ea	[llvm-mca] Move the logic that computes the register file usage to the BackendStatistics view. With this patch, the "instruction dispatched" event now provides information related to the number of microarchitectural registers used in each register file. Similarly, the "instruction retired" event is now able to tell how may registers are freed in each register file. Currently, the BackendStatistics view is the only consumer of register usage/pressure information. BackendStatistics uses that info to print out a few general statistics (i.e. max number of mappings used; total mapping created). Before this patch, the BackendStatistics was forced to query the Backend to obtain the register pressure information. This helps removes that dependency. Now views are completely independent from the Backend. As a consequence, it should be easier to address PR36663 and further modularize the pipeline. Added a couple of test cases in the BtVer2 specific directory. llvm-svn: 328129	2018-03-21 18:11:05 +00:00
Simon Pilgrim	203876f104	[X86][Btver2] Fix crc32 schedule costs The default is currently FAdd for some reason llvm-svn: 327807	2018-03-18 19:54:42 +00:00
Simon Pilgrim	13cd3b0961	[X86][Btver2] Add crc32 resource tests llvm-svn: 327805	2018-03-18 18:55:34 +00:00
Simon Pilgrim	c3db8c7cda	[X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA functional pipe llvm-svn: 327804	2018-03-18 18:45:57 +00:00
Simon Pilgrim	036cc82622	[X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX (JFPA/JFPM) functional pipes llvm-svn: 327803	2018-03-18 17:10:12 +00:00
Simon Pilgrim	87d2f7463f	[X86][Btver2] F16C instructions are performed on the JSTC functional pipe llvm-svn: 327801	2018-03-18 15:59:51 +00:00
Simon Pilgrim	40f6d6ad0b	[X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the JVALU0/JVALU1 functional pipes llvm-svn: 327794	2018-03-18 13:05:09 +00:00
Simon Pilgrim	e16790b133	[X86][Btver2] Modelled float bitwise instructions as being performed on the float cluster (FPA/FPM) not the integer. llvm-svn: 327793	2018-03-18 12:37:35 +00:00
Simon Pilgrim	e409f84e7e	[X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit. This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later. llvm-svn: 327791	2018-03-18 12:09:17 +00:00
Simon Pilgrim	0ba4a0f3a6	[X86][Btver2] Add llvm-mca tests to show pipe resource usage of most vector instructions Hopefully these tests can be easily reused should any other subtarget get in depth llvm-mca coverage (we can either copy the tests or move them into a common dir and run it with multiple prefixes). llvm-svn: 327788	2018-03-18 09:32:38 +00:00
Simon Pilgrim	9c4157bb70	[X86][Btver2] Tweak pipes test to remove register dependencies It gives us a better view of pipe usage in the timeline which is what the test is trying to show. llvm-svn: 327685	2018-03-15 23:15:11 +00:00
Simon Pilgrim	3894809997	[X86][Btver2] Fix ymm div/sqrt to use fmul unit YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent. This matches the pipes for WriteFSqrt/WriteFDiv definitions. llvm-svn: 327682	2018-03-15 23:00:47 +00:00
Simon Pilgrim	49a56faee2	[X86][Btver2] Add test to show timeline of fpu instructions on different pipes/units Try to demonstrate the scheduling from fpu0/fpu1 pipes to the valu0/vimul/fpa or valu1/stc/fpm functional units llvm-svn: 327676	2018-03-15 22:34:24 +00:00
Andrea Di Biagio	7948738673	[llvm-mca] BackendStatistics: early exit from method printSchedulerUsage if the no scheduler resources were consumed. llvm-svn: 327215	2018-03-10 17:40:25 +00:00
Andrea Di Biagio	373c38a2db	[llvm-mca] Fix handling of zero-latency instructions. This patch fixes a problem found when testing zero latency instructions on target AArch64 -mcpu=exynos-m3 / -mcpu=exynos-m1. On Exynos-m3/m1, direct branches are zero-latency instructions that don't consume any processor resources. The DispatchUnit marks zero-latency instructions as "executed", so that no scheduling is required. The event of instruction executed is then notified to all the listeners, and the reorder buffer (managed by the RetireControlUnit) is updated. In particular, the entry associated to the zero-latency instruction in the reorder buffer is marked as executed. Before this patch, the DispatchUnit forgot to assign a retire control unit token (RCUToken) to the zero-latency instruction. As a consequence, the RCUToken was used uninitialized. This was causing a crash in the RetireControlUnit logic. Fixes PR36650. llvm-svn: 327056	2018-03-08 20:21:55 +00:00
Andrea Di Biagio	7bbac07f22	[llvm-mca] Emit the 'Instruction Info' table before the resource pressure view. In future, both the summary information and the 'instruction info' table should be moved into a separate "Summary" view. llvm-svn: 327010	2018-03-08 15:34:38 +00:00
Andrea Di Biagio	3a6b092017	[llvm-mca] LLVM Machine Code Analyzer. llvm-mca is an LLVM based performance analysis tool that can be used to statically measure the performance of code, and to help triage potential problems with target scheduling models. llvm-mca uses information which is already available in LLVM (e.g. scheduling models) to statically measure the performance of machine code in a specific cpu. Performance is measured in terms of throughput as well as processor resource consumption. The tool currently works for processors with an out-of-order backend, for which there is a scheduling model available in LLVM. The main goal of this tool is not just to predict the performance of the code when run on the target, but also help with diagnosing potential performance issues. Given an assembly code sequence, llvm-mca estimates the IPC (instructions per cycle), as well as hardware resources pressure. The analysis and reporting style were mostly inspired by the IACA tool from Intel. This patch is related to the RFC on llvm-dev visible at this link: http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html Differential Revision: https://reviews.llvm.org/D43951 llvm-svn: 326998	2018-03-08 13:05:02 +00:00

... 4 5 6 7 8 ...

418 Commits