llvm-project/llvm/test/tools/llvm-objdump/X86
Hongtao Yu 819b2d9c79 [llvm-objdump] Symbolize binary addresses for low-noisy asm diff.
When diffing the disassembly dumps of two binaries, I see a lot of noise from mismatched jump target addresses and global data references. This causes spurious diffs in every function, which makes the comparison impractical. This change symbolizes the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label, and the branch target operand is printed as that label. Local labels are collected beforehand by a separate pre-decoding pass. A global data memory operand is printed as a global symbol instead of the raw data address. Unfortunately, because of the way the disassembler is set up, and to keep the change less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal, but it is probably acceptable for the purpose of checking code quality, since on most targets an instruction can have at most one memory operand.
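
As a rough illustration of the two-pass idea (not the actual llvm-objdump implementation), the sketch below first collects labels for local branch targets and then prints the listing with addresses replaced by names. The `Insn` type, `collect_labels`, and `render` are hypothetical names made up for this sketch. The instruction addresses are illustrative; only the three target addresses are taken from the test plan output. Numbering labels in discovery order is an assumption chosen to match that output, and treating "target is the address of a decoded instruction" as the definition of a local target is a simplification.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    """Hypothetical decoded instruction, used only for this sketch."""
    addr: int
    text: str                     # mnemonic plus any non-address operands
    target: Optional[int] = None  # branch/call target address, if any


def collect_labels(insns, symbols):
    """Pass 1: give each local branch target a label, in discovery order."""
    addrs = {insn.addr for insn in insns}
    labels = {}
    for insn in insns:
        t = insn.target
        # Assumption: a target is "local" if it is a decoded instruction
        # address that does not already have a real symbol.
        if t is not None and t in addrs and t not in symbols and t not in labels:
            labels[t] = f"L{len(labels)}"
    return labels


def render(insns, symbols, labels):
    """Pass 2: emit label lines and replace target addresses with names."""
    out = []
    for insn in insns:
        if insn.addr in labels:
            out.append(f"<{labels[insn.addr]}>:")
        if insn.target is None:
            out.append(f"\t{insn.text}")
        else:
            name = symbols.get(insn.target) or labels.get(insn.target)
            operand = f"<{name}>" if name else hex(insn.target)
            out.append(f"\t{insn.text}\t{operand}")
    return out


# Illustrative input modeled on the loop in <_start>.
insns = [
    Insn(0x201169, "mov\teax, dword ptr [rsp]"),
    Insn(0x201172, "jge", 0x20117E),
    Insn(0x201174, "call", 0x201158),
    Insn(0x201179, "inc\tdword ptr [rsp]"),
    Insn(0x20117C, "jmp", 0x201169),
    Insn(0x20117E, "xor\teax, eax"),
]
symbols = {0x201158: "foo"}

labels = collect_labels(insns, symbols)
listing = render(insns, symbols, labels)
print("\n".join(listing))
```

Because the labels are assigned before anything is printed, a backward branch target such as the loop header already has its `<L1>:` line when the printer reaches it, which a single pass could not guarantee.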

So far only the X86 disassemblers are supported.

Test Plan:

llvm-objdump -d  --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr [rip + 4112]  # 202182 <g>
               	jge	0x20117e <_start+0x25>
               	call	0x201158 <foo>
               	inc	dword ptr [rsp]
               	jmp	0x201169 <_start+0x10>
               	xor	eax, eax
               	pop	rcx
               	ret
```

llvm-objdump -d  **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
<L1>:
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr  <g>
               	jge	 <L0>
               	call	 <foo>
               	inc	dword ptr [rsp]
               	jmp	 <L1>
<L0>:
               	xor	eax, eax
               	pop	rcx
               	ret
```

Note that without this work, jump instructions like `jge 0x20117e <_start+0x25>` are printed with a real target address and an offset from the leading symbol. When a change in the optimizer adds or deletes an instruction, the address and offset may shift for targets placed after that instruction. This is a problem when diffing the disassembly produced by two optimizers, because such branch target address changes show up as false positives. With `--symbolize-operands`, a label is printed for a branch target instead, which reduces the false positives. Similarly, the disassembly of PC-relative global variable references is also sensitive to instruction insertion and deletion.
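
To make the effect on diffs concrete, here is a small sketch using Python's `difflib`. The listings are hypothetical and hand-written for illustration: the "after" build has a single `nop` inserted before the loop, which shifts both branch targets by one byte in the raw form. `changed_lines` is a helper invented for this sketch.

```python
import difflib


def changed_lines(before, after):
    """Count lines added or removed between two listings."""
    diff = difflib.ndiff(before, after)
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


# Hypothetical raw listings: addresses and offsets shift after the insertion.
raw_before = [
    "mov eax, dword ptr [rsp]",
    "jge 0x20117e <_start+0x25>",
    "jmp 0x201169 <_start+0x10>",
    "xor eax, eax",
]
raw_after = [
    "nop",
    "mov eax, dword ptr [rsp]",
    "jge 0x20117f <_start+0x26>",
    "jmp 0x20116a <_start+0x11>",
    "xor eax, eax",
]

# Hypothetical symbolized listings: labels do not depend on addresses.
sym_before = [
    "<L1>:",
    "mov eax, dword ptr [rsp]",
    "jge <L0>",
    "jmp <L1>",
    "<L0>:",
    "xor eax, eax",
]
sym_after = ["nop"] + sym_before

raw_noise = changed_lines(raw_before, raw_after)
sym_noise = changed_lines(sym_before, sym_after)
print(raw_noise, sym_noise)
```

In the symbolized form only the inserted instruction shows up in the diff, while the raw form also reports every branch whose target moved.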

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D84191
2020-08-17 16:55:12 -07:00
..
Inputs [llvm-objdump][test] Move {AArch64,X86}/macho-* to MachO/ 2020-03-15 15:05:12 -07:00
adjust-vma.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
coff-dis-internal.test
coff-disassemble-export.test [X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form 2020-03-26 08:28:59 -07:00
debug-info-fileinfo.test
demangle.s [llvm-objdump] Demangle C++ Symbols in branch and call targets 2020-04-18 08:30:50 -07:00
disassemble-align.s [llvm-objdump][test] Change llvm-objdump tests to use double dash options 2020-03-15 16:01:26 -07:00
disassemble-archive-with-source.ll
disassemble-code-data-mix.s
disassemble-data.test
disassemble-demangle.test
disassemble-functions-mangling.test [llvm-objdump] Rename --disassemble-functions to --disassemble-symbols 2020-03-09 08:25:45 -07:00
disassemble-functions.test Be more strict when checking existence of foo 2020-03-15 12:02:19 +09:00
disassemble-implied-by-disassemble-functions.test [llvm-objdump] Rename --disassemble-functions to --disassemble-symbols 2020-03-09 08:25:45 -07:00
disassemble-invalid-byte-sequences.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
disassemble-long-instructions.test
disassemble-no-symbol-at-section-start.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
disassemble-same-section-addr.test [llvm-objdump] Look in all viable sections for call/branch targets 2020-04-22 12:28:30 +01:00
disassemble-section-name.s [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
disassemble-show-raw.test [llvm-objdump][test] Change llvm-objdump tests to use double dash options 2020-03-15 16:01:26 -07:00
disassemble-text.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
disassemble-zeroes-relocations.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
elf-disassemble-bss.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
elf-disassemble-dynamic-symbols.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
elf-disassemble-no-symtab.test [X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form 2020-03-26 08:28:59 -07:00
elf-disassemble-relocs.test [X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form 2020-03-26 08:28:59 -07:00
elf-disassemble-symbol-labels-exec.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
elf-disassemble-symbol-labels-rel.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
elf-disassemble-symbol-references.yaml [llvm-objdump] Print target address with evaluateMemoryOperandAddress() 2020-04-27 09:43:51 -07:00
elf-disassemble-symbololize-operands.yaml [llvm-objdump] Symbolize binary addresses for low-noisy asm diff. 2020-08-17 16:55:12 -07:00
elf-disassemble.test [test] llvm/test/: change llvm-objdump single-dash long options to double-dash options 2020-03-15 17:46:23 -07:00
elf-dynamic-relocs.test
elf-dynamic-symbols.test [llvm-objdump] Teach `llvm-objdump` dump dynamic symbols. 2020-04-05 10:46:59 +08:00
function-sections-line-numbers.s [llvm-objdump][test] Change llvm-objdump tests to use double dash options 2020-03-15 16:01:26 -07:00
invalid-macho-build-version.yaml [llvm-objdump][test] Change llvm-objdump tests to use double dash options 2020-03-15 16:01:26 -07:00
lit.local.cfg
out-of-section-sym.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
output-ordering.test
phdrs-lma.test
phdrs-lma2.test [yaml2obj] - Set a default value for `PAddr` property of a program header to a value of `VAddr` 2020-03-14 17:44:57 +03:00
phdrs.test
plt.test [llvm-objdump][test] Change llvm-objdump tests to use double dash options 2020-03-15 16:01:26 -07:00
print-symbol-addr.s [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
section-filter-disasm.test
section-filter-relocs.test [X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form 2020-03-26 08:28:59 -07:00
section-index.s [llvm-objdump][test] Change llvm-objdump tests to use double dash options 2020-03-15 16:01:26 -07:00
source-interleave-function-from-debug.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
source-interleave-invalid-source.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
source-interleave-missing-source.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
source-interleave-no-debug-info.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
source-interleave-relative-paths.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
source-interleave-same-line-different-file.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
source-interleave-x86_64.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
start-stop-address-relocatable-object.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
start-stop-address.test [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` 2020-03-05 18:05:28 -08:00
warn-missing-disasm-func.test [llvm-objdump] Rename --disassemble-functions to --disassemble-symbols 2020-03-09 08:25:45 -07:00