llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrea Di Biagio	d768d35515	[MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit. This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means, this patch is only a functional change for the Jaguar cpu model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the actual latency is actually contributed by the so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is speified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes with the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965	2019-01-23 16:35:07 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Andrea Di Biagio	97ed076dd1	[MCA] Fix wrong definition of ResourceUnitMask in DefaultResourceStrategy. Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It should have been a 64 bit quantity instead. That means, ResourceUnitMask was always implicitly truncated to a 32 bit quantity. This issue has been found by inspection. Surprisingly, that bug was latent, and it never negatively affected any existing upstream targets. This patch fixes the wrong definition of ResourceUnitMask, and adds a bunch of extra debug prints to help debugging potential issues related to invalid processor resource masks. llvm-svn: 350820	2019-01-10 13:59:13 +00:00
Clement Courbet	cc5e6a72de	[llvm-mca] Move llvm-mca library to llvm/lib/MCA. Summary: See PR38731. Reviewers: andreadb Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D55557 llvm-svn: 349332	2018-12-17 08:08:31 +00:00
Andrea Di Biagio	373a4ccf6c	[llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857	2018-11-29 12:15:56 +00:00
Andrea Di Biagio	07a8255a78	[llvm-mca][View] Improved Retire Control Unit Statistics. RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example: Retire Control Unit - number of cycles where we saw N instructions retired: [# retired], [# cycles] 0, 109 (17.9%) 1, 102 (16.7%) 2, 399 (65.4%) Total ROB Entries: 64 Max Used ROB Entries: 35 ( 54.7% ) Average Used ROB Entries per cy: 32 ( 50.0% ) Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to reflect this change. llvm-svn: 347493	2018-11-23 12:12:57 +00:00
Andrea Di Biagio	fe3bc1b9bf	[llvm-mca] Add extra counters for move elimination in view RegisterFileStatistics. This patch teaches view RegisterFileStatistics how to report events for optimizable register moves. For each processor register file, view RegisterFileStatistics reports the following extra information: - Number of optimizable register moves - Number of register moves eliminated - Number of zero moves (i.e. register moves that propagate a zero) - Max Number of moves eliminated per cycle. Differential Revision: https://reviews.llvm.org/D53976 llvm-svn: 345865	2018-11-01 18:04:39 +00:00
Fangrui Song	5a8fd65700	[llvm-mca] Move namespace mca inside llvm:: Summary: This allows to remove `using namespace llvm;` in those .cpp files When we want to revisit the decision (everything resides in llvm::mca::) in the future, we can move things to a nested namespace of llvm::mca::, to conceptually make them separate from the rest of llvm::mca::* Reviewers: andreadb, mattd Reviewed By: andreadb Subscribers: javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D53407 llvm-svn: 345612	2018-10-30 15:56:08 +00:00
Sam McCall	0739ffc161	[llvm-mca] Fix -wreorder and -Wunused-private-field after r345376. NFC llvm-svn: 345378	2018-10-26 12:19:48 +00:00
Andrea Di Biagio	84d0051310	[llvm-mca] Removed dependency on mca::SourcMgr in some Views. NFC llvm-svn: 345376	2018-10-26 10:48:04 +00:00
Andrea Di Biagio	77c26aebda	[llvm-mca] Removed a couple of redundant method declarations, and simplified code in ResourcePressureView. NFC llvm-svn: 345259	2018-10-25 11:51:34 +00:00
Andrea Di Biagio	65c77d7283	[llvm-mca] Remove dependency from InstrBuilder in class InstructionTables. Also, removed the initialization of vectors used for processor resource masks. Support function 'computeProcResourceMasks()' already calls method resize on those vectors. No functional change intended. llvm-svn: 345161	2018-10-24 16:56:43 +00:00
Andrea Di Biagio	7be45b0f85	[llvm-mca] Refactor class SourceMgr. NFCI Added begin()/end() methods to allow the usage of SourceMgr in foreach loops. With this change, method getMCInstFromIndex() (as well as a couple of other methods) are now redundant, and can be removed from the public interface. llvm-svn: 345147	2018-10-24 15:06:27 +00:00
Andrea Di Biagio	db158be646	[llvm-mca] Remove a couple of using directives and a bunch of redundant namespace llvm prefixes. NFC llvm-svn: 344916	2018-10-22 16:28:07 +00:00
Andrea Di Biagio	b7f120f071	[llvm-mca] Remove method RegisterFileStatistics::initializeRegisterFileInfo(). NFC llvm-svn: 344339	2018-10-12 12:38:27 +00:00
Matt Davis	db834837c2	[llvm-mca] Delay calculation of Cycles per Resources, separate the cycles and resource quantities. Summary: This patch removes the storing of accumulated floating point data within the llvm-mca library. This patch splits-up the two quantities: cycles and number of resource units. By splitting-up these two quantities, we delay the calculation of "cycles per resource unit" until that value is read, reducing the chance of accumulating floating point error. I considered using the APFloat, but after measuring performance, for a large (many iteration) sample, I decided to go with this faster solution. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: llvm-commits, javed.absar, tschuett, gbedwell Differential Revision: https://reviews.llvm.org/D51903 llvm-svn: 341980	2018-09-11 18:47:48 +00:00
Andrea Di Biagio	7f2230ff16	[llvm-mca] correctly initialize field 'CycleRetired' in the TimelineView. This fixes a [-Wmissing-field-initializers] warning reported by buildbot lld-x86_64-darwin13, build #25152. llvm-svn: 341056	2018-08-30 11:17:58 +00:00
Andrea Di Biagio	8b647dcf4b	[llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics view. This patch introduces the following changes to the DispatchStatistics view: * DispatchStatistics now reports the number of dispatched opcodes instead of the number of dispatched instructions. * The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of stall cycles against the total simulated cycles. This change allows users to easily compare dispatch group sizes with the processor DispatchWidth. Before this change, it was difficult to correlate the two numbers, since DispatchStatistics view reported numbers of instructions (instead of opcodes). DispatchWidth defines the maximum size of a dispatch group in terms of number of micro opcodes. The other change introduced by this patch is related to how DispatchStage generates "instruction dispatch" events. In particular: * There can be multiple dispatch events associated with a same instruction * Each dispatch event now encapsulates the number of dispatched micro opcodes. The number of micro opcodes declared by an instruction may exceed the processor DispatchWidth. Therefore, we cannot assume that instructions are always fully dispatched in a single cycle. DispatchStage knows already how to handle instructions declaring a number of opcodes bigger that DispatchWidth. However, DispatchStage always emitted a single instruction dispatch event (during the first simulated dispatch cycle) for instructions dispatched. With this patch, DispatchStage now correctly notifies multiple dispatch events for instructions that cannot be dispatched in a single cycle. A few views had to be modified. Views can no longer assume that there can only be one dispatch event per instruction. Tests (and docs) have been updated. Differential Revision: https://reviews.llvm.org/D51430 llvm-svn: 341055	2018-08-30 10:50:20 +00:00
Andrea Di Biagio	a2eee47450	[llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. This patch adds two new fields to the perf report generated by the SummaryView. Fields are now logically organized into two small groups; only the second group contains throughput indicators. Example: ``` Iterations: 100 Instructions: 300 Total Cycles: 414 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.69 IPC: 0.72 Block RThroughput: 4.0 ``` This patch also updates the docs for llvm-mca. Due to the nature of this change, several tests in the tools/llvm-mca directory were affected, and had to be updated using script `update_mca_test_checks.py`. llvm-svn: 340946	2018-08-29 17:56:39 +00:00
Andrea Di Biagio	4269d64b20	[llvm-mca] Initialize each element in vector TimelineView::UsedBuffers to a default invalid buffer descriptor. NFCI Also change the default buffer size for UsedBuffer entries to -1 (i.e. "unknown size"). No functional change intended. llvm-svn: 340830	2018-08-28 15:07:11 +00:00
Andrea Di Biagio	d17d371c40	[llvm-mca][TimelineView] Force the same number of executions for every entry in the 'wait-times' table. This patch also uses colors to highlight problematic wait-time entries. A problematic entry is an entry with an high wait time that tends to match (or exceed) the size of the scheduler's buffer. Color RED is used if an instruction had to wait an average number of cycles which is bigger than (or equal to) the size of the underlying scheduler's buffer. Color YELLOW is used if the time (in cycles) spend waiting for the operands or pipeline resources is bigger than half the size of the underlying scheduler's buffer. Color MAGENTA is used if an instruction does not consume buffer resources according to the scheduling model. llvm-svn: 340825	2018-08-28 14:27:01 +00:00
Andrea Di Biagio	29c5d5aa36	[llvm-mca] Pass an instruction reference when notifying event listeners about reserved/released buffer resources. NFC llvm-svn: 340821	2018-08-28 13:14:42 +00:00
Andrea Di Biagio	1a87a80d2f	[llvm-mca] Remove unused include. NFC llvm-svn: 340768	2018-08-27 19:14:35 +00:00
Andrea Di Biagio	b89b96c1b2	[llvm-mca] Improved report generated by the SchedulerStatistics view. Before this patch, the SchedulerStatistics only printed the maximum number of buffer entries consumed in each scheduler's queue at a given point of the simulation. This patch restructures the reported table, and adds an extra field named "Average number of used buffer entries" to it. This patch also uses different colors to help identifying bottlenecks caused by high scheduler's buffer pressure. llvm-svn: 340746	2018-08-27 14:52:52 +00:00
Matt Davis	10aa09f008	[llvm-mca] Move views and stats into a Views subdir. NFC. llvm-svn: 340645	2018-08-24 20:24:53 +00:00

25 Commits