88 lines
		
	
	
		
			2.5 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			88 lines
		
	
	
		
			2.5 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
| ==================================
 | |
| Benchmarking tips
 | |
| ==================================
 | |
| 
 | |
| 
 | |
| Introduction
 | |
| ============
 | |
| 
 | |
| For benchmarking a patch we want to reduce all possible sources of
 | |
| noise as much as possible. How to do that is very OS dependent.
 | |
| 
 | |
| Note that low noise is required, but not sufficient. It does not
 | |
| exclude measurement bias. See
 | |
| https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for
 | |
| example.
 | |
| 
 | |
| General
 | |
| ================================
 | |
| 
 | |
| * Use a high resolution timer, e.g. perf under linux.
 | |
| 
 | |
| * Run the benchmark multiple times to be able to recognize noise.
 | |
| 
 | |
| * Disable as many processes or services as possible on the target system.
 | |
| 
 | |
| * Disable frequency scaling, turbo boost and address space
 | |
|   randomization (see OS specific section).
 | |
| 
 | |
| * Static link if the OS supports it. That avoids any variation that
 | |
|   might be introduced by loading dynamic libraries. This can be done
 | |
|   by passing ``-DLLVM_BUILD_STATIC=ON`` to cmake.
 | |
| 
 | |
| * Try to avoid storage. On some systems you can use tmpfs. Putting the
 | |
|   program, inputs and outputs on tmpfs avoids touching a real storage
 | |
|   system, which can have a pretty big variability.
 | |
| 
 | |
|   To mount it (on linux and freebsd at least)::
 | |
| 
 | |
|     mount -t tmpfs -o size=<XX>g none dir_to_mount
 | |
| 
 | |
| Linux
 | |
| =====
 | |
| 
 | |
| * Disable address space randomization::
 | |
| 
 | |
|     echo 0 > /proc/sys/kernel/randomize_va_space
 | |
| 
 | |
| * Set scaling_governor to performance::
 | |
| 
 | |
|    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
 | |
|    do
 | |
|      echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
 | |
|    done
 | |
| 
 | |
| * Use https://github.com/lpechacek/cpuset to reserve cpus for just the
 | |
|   program you are benchmarking. If using perf, leave at least 2 cores
 | |
|   so that perf runs in one and your program in another::
 | |
| 
 | |
|     cset shield -c N1,N2 -k on
 | |
| 
 | |
|   This will move all threads out of N1 and N2. The ``-k on`` means
 | |
|   that even kernel threads are moved out.
 | |
| 
 | |
| * Disable the SMT pair of the cpus you will use for the benchmark. The
 | |
|   pair of cpu N can be found in
 | |
|   ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`` and
 | |
|   disabled with::
 | |
| 
 | |
|     echo 0 > /sys/devices/system/cpu/cpuX/online
 | |
| 
 | |
| 
 | |
| * Run the program with::
 | |
| 
 | |
|     cset shield --exec -- perf stat -r 10 <cmd>
 | |
| 
 | |
|   This will run the command after ``--`` in the isolated cpus. The
 | |
|   particular perf command runs the ``<cmd>`` 10 times and reports
 | |
|   statistics.
 | |
| 
 | |
| With these in place you can expect perf variations of less than 0.1%.
 | |
| 
 | |
| Linux Intel
 | |
| -----------
 | |
| 
 | |
| * Disable turbo mode::
 | |
| 
 | |
|     echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
 |