336 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			336 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
===================
 | 
						|
Debugging with XRay
 | 
						|
===================
 | 
						|
 | 
						|
This document shows an example of how you would go about analyzing applications
 | 
						|
built with XRay instrumentation. Here we will attempt to debug ``llc``
 | 
						|
compiling some sample LLVM IR generated by Clang.
 | 
						|
 | 
						|
.. contents::
 | 
						|
  :local:
 | 
						|
 | 
						|
Building with XRay
 | 
						|
------------------
 | 
						|
 | 
						|
To debug an application with XRay instrumentation, we need to build it with a
 | 
						|
Clang that supports the ``-fxray-instrument`` option. See `XRay <XRay.html>`_
 | 
						|
for more technical details of how XRay works for background information.
 | 
						|
 | 
						|
In our example, we need to add ``-fxray-instrument`` to the list of flags
 | 
						|
passed to Clang when building a binary. Note that we need to link with Clang as
 | 
						|
well to get the XRay runtime linked in appropriately. For building ``llc`` with
 | 
						|
XRay, we do something similar below for our LLVM build:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ mkdir -p llvm-build && cd llvm-build
 | 
						|
  # Assume that the LLVM sources are at ../llvm
 | 
						|
  $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
 | 
						|
      -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument" -DCMAKE_CXX_FLAGS="-fxray-instrument" \
 | 
						|
  # Once this finishes, we should build llc
 | 
						|
  $ ninja llc
 | 
						|
 | 
						|
 | 
						|
To verify that we have an XRay instrumented binary, we can use ``objdump`` to
 | 
						|
look for the ``xray_instr_map`` section.
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ objdump -h -j xray_instr_map ./bin/llc
 | 
						|
  ./bin/llc:     file format elf64-x86-64
 | 
						|
  
 | 
						|
  Sections:
 | 
						|
  Idx Name          Size      VMA               LMA               File off  Algn
 | 
						|
   14 xray_instr_map 00002fc0  00000000041516c6  00000000041516c6  03d516c6  2**0
 | 
						|
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
 | 
						|
 | 
						|
Getting Traces
 | 
						|
--------------
 | 
						|
 | 
						|
By default, XRay does not write out the trace files or patch the application
 | 
						|
before main starts. If we run ``llc`` it should work like a normally built
 | 
						|
binary. If we want to get a full trace of the application's operations (of the
 | 
						|
functions we do end up instrumenting with XRay) then we need to enable XRay
 | 
						|
at application start. To do this, XRay checks the ``XRAY_OPTIONS`` environment
 | 
						|
variable.
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  # The following doesn't create an XRay trace by default.
 | 
						|
  $ ./bin/llc input.ll
 | 
						|
 | 
						|
  # We need to set the XRAY_OPTIONS to enable some features.
 | 
						|
  $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./bin/llc input.ll
 | 
						|
  ==69819==XRay: Log file in 'xray-log.llc.m35qPB'
 | 
						|
 | 
						|
At this point we now have an XRay trace we can start analysing.
 | 
						|
 | 
						|
The ``llvm-xray`` Tool
 | 
						|
----------------------
 | 
						|
 | 
						|
Having a trace then allows us to do basic accounting of the functions that were
 | 
						|
instrumented, and how much time we're spending in parts of the code. To make
 | 
						|
sense of this data, we use the ``llvm-xray`` tool which has a few subcommands
 | 
						|
to help us understand our trace.
 | 
						|
 | 
						|
One of the things we can do is to get an accounting of the functions that have
 | 
						|
been instrumented. We can see an example accounting with ``llvm-xray account``:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ llvm-xray account xray-log.llc.m35qPB -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
 | 
						|
  Functions with latencies: 29
 | 
						|
     funcid      count [      min,       med,       90p,       99p,       max]       sum  function
 | 
						|
        187        360 [ 0.000000,  0.000001,  0.000014,  0.000032,  0.000075]  0.001596  LLLexer.cpp:446:0: llvm::LLLexer::LexIdentifier()
 | 
						|
         85        130 [ 0.000000,  0.000000,  0.000018,  0.000023,  0.000156]  0.000799  X86ISelDAGToDAG.cpp:1984:0: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*)
 | 
						|
        138        130 [ 0.000000,  0.000000,  0.000017,  0.000155,  0.000155]  0.000774  SelectionDAGISel.cpp:2963:0: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int)
 | 
						|
        188        103 [ 0.000000,  0.000000,  0.000003,  0.000123,  0.000214]  0.000737  LLParser.cpp:2692:0: llvm::LLParser::ParseValID(llvm::ValID&, llvm::LLParser::PerFunctionState*)
 | 
						|
         88          1 [ 0.000562,  0.000562,  0.000562,  0.000562,  0.000562]  0.000562  X86ISelLowering.cpp:83:0: llvm::X86TargetLowering::X86TargetLowering(llvm::X86TargetMachine const&, llvm::X86Subtarget const&)
 | 
						|
        125        102 [ 0.000001,  0.000003,  0.000010,  0.000017,  0.000049]  0.000471  Verifier.cpp:3714:0: (anonymous namespace)::Verifier::visitInstruction(llvm::Instruction&)
 | 
						|
         90          8 [ 0.000023,  0.000035,  0.000106,  0.000106,  0.000106]  0.000342  X86ISelLowering.cpp:3363:0: llvm::X86TargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const
 | 
						|
        124         32 [ 0.000003,  0.000007,  0.000016,  0.000041,  0.000041]  0.000310  Verifier.cpp:1967:0: (anonymous namespace)::Verifier::visitFunction(llvm::Function const&)
 | 
						|
        123          1 [ 0.000302,  0.000302,  0.000302,  0.000302,  0.000302]  0.000302  LLVMContextImpl.cpp:54:0: llvm::LLVMContextImpl::~LLVMContextImpl()
 | 
						|
        139         46 [ 0.000000,  0.000002,  0.000006,  0.000008,  0.000019]  0.000138  TargetLowering.cpp:506:0: llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt&, llvm::APInt&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const
 | 
						|
 | 
						|
This shows us that for our input file, ``llc`` spent the most cumulative time
 | 
						|
in the lexer (a total of 1 millisecond). If we wanted for example to work with
 | 
						|
this data in a spreadsheet, we can output the results as CSV using the
 | 
						|
``-format=csv`` option to the command for further analysis.
 | 
						|
 | 
						|
If we want to get a textual representation of the raw trace we can use the
 | 
						|
``llvm-xray convert`` tool to get YAML output. The first few lines of that
 | 
						|
output for an example trace would look like the following:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ llvm-xray convert -f yaml -symbolize -instr_map=./bin/llc xray-log.llc.m35qPB
 | 
						|
  ---
 | 
						|
  header:          
 | 
						|
    version:         1
 | 
						|
    type:            0
 | 
						|
    constant-tsc:    true
 | 
						|
    nonstop-tsc:     true
 | 
						|
    cycle-frequency: 2601000000
 | 
						|
  records:         
 | 
						|
    - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 }
 | 
						|
    - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 }
 | 
						|
    - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 }
 | 
						|
    - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426030031128 }
 | 
						|
    - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426046951388 }
 | 
						|
    - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047282020 }
 | 
						|
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426047857332 }
 | 
						|
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047984152 }
 | 
						|
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048036584 }
 | 
						|
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048042292 }
 | 
						|
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048055056 }
 | 
						|
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048067316 }
 | 
						|
 | 
						|
Controlling Fidelity
 | 
						|
--------------------
 | 
						|
 | 
						|
So far in our examples, we haven't been getting full coverage of the functions
 | 
						|
we have in the binary. To get that, we need to modify the compiler flags so
 | 
						|
that we can instrument more (if not all) the functions we have in the binary.
 | 
						|
We have two options for doing that, and we explore both of these below.
 | 
						|
 | 
						|
Instruction Threshold
 | 
						|
`````````````````````
 | 
						|
 | 
						|
The first "blunt" way of doing this is by setting the minimum threshold for
 | 
						|
function bodies to 1. We can do that with the
 | 
						|
``-fxray-instruction-threshold=N`` flag when building our binary. We rebuild
 | 
						|
``llc`` with this option and observe the results:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ rm CMakeCache.txt
 | 
						|
  $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
 | 
						|
      -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument -fxray-instruction-threshold=1" \
 | 
						|
      -DCMAKE_CXX_FLAGS="-fxray-instrument -fxray-instruction-threshold=1"
 | 
						|
  $ ninja llc
 | 
						|
  $ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll
 | 
						|
  ==69819==XRay: Log file in 'xray-log.llc.5rqxkU'
 | 
						|
 | 
						|
  $ llvm-xray account xray-log.llc.5rqxkU -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
 | 
						|
  Functions with latencies: 36652
 | 
						|
   funcid      count [      min,       med,       90p,       99p,       max]       sum  function    
 | 
						|
       75          1 [ 0.672368,  0.672368,  0.672368,  0.672368,  0.672368]  0.672368  llc.cpp:271:0: main
 | 
						|
       78          1 [ 0.626455,  0.626455,  0.626455,  0.626455,  0.626455]  0.626455  llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&)
 | 
						|
   139617          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&)
 | 
						|
   139610          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1681:0: llvm::legacy::PassManagerImpl::run(llvm::Module&)
 | 
						|
   139612          1 [ 0.470948,  0.470948,  0.470948,  0.470948,  0.470948]  0.470948  LegacyPassManager.cpp:1564:0: (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&)
 | 
						|
   139607          2 [ 0.147345,  0.315994,  0.315994,  0.315994,  0.315994]  0.463340  LegacyPassManager.cpp:1530:0: llvm::FPPassManager::runOnModule(llvm::Module&)
 | 
						|
   139605         21 [ 0.000002,  0.000002,  0.102593,  0.213336,  0.213336]  0.463331  LegacyPassManager.cpp:1491:0: llvm::FPPassManager::runOnFunction(llvm::Function&)
 | 
						|
   139563      26096 [ 0.000002,  0.000002,  0.000037,  0.000063,  0.000215]  0.225708  LegacyPassManager.cpp:1083:0: llvm::PMDataManager::findAnalysisPass(void const*, bool)
 | 
						|
   108055        188 [ 0.000002,  0.000120,  0.001375,  0.004523,  0.062624]  0.159279  MachineFunctionPass.cpp:38:0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
 | 
						|
    62635         22 [ 0.000041,  0.000046,  0.000050,  0.126744,  0.126744]  0.127715  X86TargetMachine.cpp:242:0: llvm::X86TargetMachine::getSubtargetImpl(llvm::Function const&) const
 | 
						|
 | 
						|
 | 
						|
Instrumentation Attributes
 | 
						|
``````````````````````````
 | 
						|
 | 
						|
The other way is to use configuration files for selecting which functions
 | 
						|
should always be instrumented by the compiler. This gives us a way of ensuring
 | 
						|
that certain functions are either always or never instrumented by not having to
 | 
						|
add the attribute to the source.
 | 
						|
 | 
						|
To use this feature, you can define one file for the functions to always
 | 
						|
instrument, and another for functions to never instrument. The format of these
 | 
						|
files are exactly the same as the SanitizerLists files that control similar
 | 
						|
things for the sanitizer implementations. For example:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  # xray-attr-list.txt
 | 
						|
  # always instrument functions that match the following filters:
 | 
						|
  [always]
 | 
						|
  fun:main
 | 
						|
 | 
						|
  # never instrument functions that match the following filters:
 | 
						|
  [never]
 | 
						|
  fun:__cxx_*
 | 
						|
 | 
						|
Given the file above we can re-build by providing it to the
 | 
						|
``-fxray-attr-list=`` flag to clang. You can have multiple files, each defining
 | 
						|
different sets of attribute sets, to be combined into a single list by clang.
 | 
						|
 | 
						|
The XRay stack tool
 | 
						|
-------------------
 | 
						|
 | 
						|
Given a trace, and optionally an instrumentation map, the ``llvm-xray stack``
 | 
						|
command can be used to analyze a call stack graph constructed from the function
 | 
						|
call timeline.
 | 
						|
 | 
						|
The way to use the command is to output the top stacks by call count and time spent.
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc
 | 
						|
 | 
						|
  Unique Stacks: 3069
 | 
						|
  Top 10 Stacks by leaf sum:
 | 
						|
 | 
						|
  Sum: 9633790
 | 
						|
  lvl   function                                                            count              sum
 | 
						|
  #0    main                                                                    1         58421550
 | 
						|
  #1    compileModule(char**, llvm::LLVMContext&)                               1         51440360
 | 
						|
  #2    llvm::legacy::PassManagerImpl::run(llvm::Module&)                       1         40535375
 | 
						|
  #3    llvm::FPPassManager::runOnModule(llvm::Module&)                         2         39337525
 | 
						|
  #4    llvm::FPPassManager::runOnFunction(llvm::Function&)                     6         39331465
 | 
						|
  #5    llvm::PMDataManager::verifyPreservedAnalysis(llvm::Pass*)             399         16628590
 | 
						|
  #6    llvm::PMTopLevelManager::findAnalysisPass(void const*)               4584         15155600
 | 
						|
  #7    llvm::PMDataManager::findAnalysisPass(void const*, bool)            32088          9633790
 | 
						|
 | 
						|
  ..etc..
 | 
						|
 | 
						|
In the default mode, identical stacks on different threads are independently
 | 
						|
aggregated. In a multithreaded program, you may end up having identical call
 | 
						|
stacks fill your list of top calls.
 | 
						|
 | 
						|
To address this, you may specify the ``-aggregate-threads`` or
 | 
						|
``-per-thread-stacks`` flags. ``-per-thread-stacks`` treats the thread id as an
 | 
						|
implicit root in each call stack tree, while ``-aggregate-threads`` combines
 | 
						|
identical stacks from all threads.
 | 
						|
 | 
						|
Flame Graph Generation
 | 
						|
----------------------
 | 
						|
 | 
						|
The ``llvm-xray stack`` tool may also be used to generate flamegraphs for
 | 
						|
visualizing your instrumented invocations. The tool does not generate the graphs
 | 
						|
themselves, but instead generates a format that can be used with Brendan Gregg's
 | 
						|
FlameGraph tool, currently available on `github
 | 
						|
<https://github.com/brendangregg/FlameGraph>`_.
 | 
						|
 | 
						|
To generate output for a flamegraph, a few more options are necessary.
 | 
						|
 | 
						|
- ``-all-stacks`` - Emits all of the stacks.
 | 
						|
- ``-stack-format`` - Choose the flamegraph output format 'flame'.
 | 
						|
- ``-aggregation-type`` - Choose the metric to graph.
 | 
						|
 | 
						|
You may pipe the command output directly to the flamegraph tool to obtain an
 | 
						|
svg file.
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc -stack-format=flame -aggregation-type=time -all-stacks | \
 | 
						|
  /path/to/FlameGraph/flamegraph.pl > flamegraph.svg
 | 
						|
 | 
						|
If you open the svg in a browser, mouse events allow exploring the call stacks.
 | 
						|
 | 
						|
Further Exploration
 | 
						|
-------------------
 | 
						|
 | 
						|
The ``llvm-xray`` tool has a few other subcommands that are in various stages
 | 
						|
of being developed. One interesting subcommand that can highlight a few
 | 
						|
interesting things is the ``graph`` subcommand. Given for example the following
 | 
						|
toy program that we build with XRay instrumentation, we can see how the
 | 
						|
generated graph may be a helpful indicator of where time is being spent for the
 | 
						|
application.
 | 
						|
 | 
						|
.. code-block:: c++
 | 
						|
 | 
						|
  // sample.cc
 | 
						|
  #include <iostream>
 | 
						|
  #include <thread>
 | 
						|
 | 
						|
  [[clang::xray_always_instrument]] void f() {
 | 
						|
    std::cerr << '.';
 | 
						|
  }
 | 
						|
 | 
						|
  [[clang::xray_always_instrument]] void g() {
 | 
						|
    for (int i = 0; i < 1 << 10; ++i) {
 | 
						|
      std::cerr << '-';
 | 
						|
    }
 | 
						|
  }
 | 
						|
 | 
						|
  int main(int argc, char* argv[]) {
 | 
						|
    std::thread t1([] {
 | 
						|
      for (int i = 0; i < 1 << 10; ++i)
 | 
						|
        f();
 | 
						|
    });
 | 
						|
    std::thread t2([] {
 | 
						|
      g();
 | 
						|
    });
 | 
						|
    t1.join();
 | 
						|
    t2.join();
 | 
						|
    std::cerr << '\n';
 | 
						|
  }
 | 
						|
 | 
						|
We then build the above with XRay instrumentation:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ clang++ -o sample -O3 sample.cc -std=c++11 -fxray-instrument -fxray-instruction-threshold=1
 | 
						|
  $ XRAY_OPTIONS="patch_premain=true" ./sample
 | 
						|
 | 
						|
We can then explore the graph rendering of the trace generated by this sample
 | 
						|
application. We assume you have the graphviz toosl available in your system,
 | 
						|
including both ``unflatten`` and ``dot``. If you prefer rendering or exploring
 | 
						|
the graph using another tool, then that should be feasible as well. ``llvm-xray
 | 
						|
graph`` will create DOT format graphs which should be usable in most graph
 | 
						|
rendering applications. One example invocation of the ``llvm-xray graph``
 | 
						|
command should yield some interesting insights to the workings of C++
 | 
						|
applications:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
  $ llvm-xray graph xray-log.sample.* -m sample -color-edges=sum -edge-label=sum \
 | 
						|
      | unflatten -f -l10 | dot -Tsvg -o sample.svg
 | 
						|
 | 
						|
Next Steps
 | 
						|
----------
 | 
						|
 | 
						|
If you have some interesting analyses you'd like to implement as part of the
 | 
						|
llvm-xray tool, please feel free to propose them on the llvm-dev@ mailing list.
 | 
						|
The following are some ideas to inspire you in getting involved and potentially
 | 
						|
making things better.
 | 
						|
 | 
						|
  - Implement a query/filtering library that allows for finding patterns in the
 | 
						|
    XRay traces.
 | 
						|
  - A conversion from the XRay trace onto something that can be visualised
 | 
						|
    better by other tools (like the Chrome trace viewer for example).
 | 
						|
  - Collecting function call stacks and how often they're encountered in the
 | 
						|
    XRay trace.
 | 
						|
 | 
						|
 |