We need to make sure not to emit R_X86_64_GOTPCRELX relocations for
instructions that use a REX prefix. If a REX prefix is present, we need to
instead use a R_X86_64_REX_GOTPCRELX relocation. The existing logic for
CALL64m, JMP64m, etc. already handles this by checking the HasREX parameter
and using it to determine which relocation type to use. Do this for all
instructions that can use relaxed relocations.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93561
The REX prefix is needed to allow linker relaxations: even if the
instruction we emit may not need it, the linker may change it to a
different instruction which does need it.
clang may produce `movl x@GOTPCREL+4(%rip), %eax` when loading the high
32 bits of the address of a global variable in -fpic/-fpie mode.
If assembled by GNU as, the fixup emits R_X86_64_GOTPCRELX with an addend != -4.
The instruction loads from the GOT entry with an offset and thus it is incorrect
to relax the instruction.
This patch does not emit a relaxable relocation for a GOT load with an offset
because R_X86_64_[REX_]GOTPCRELX do not make sense for instructions which cannot
be relaxed. The result is good enough for LLD to work. GNU ld relaxes
mov+GOTPCREL as well, but it suppresses the relaxation if addend != -4.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D92114
This reapplies 36c64af9d7 in updated
form.
Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)
The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.
Differential Revision: https://reviews.llvm.org/D87448
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).
Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D90039
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
For now, we lost the encoding information if we using inline assembly.
The encoding for the inline assembly will keep default even if we add
the vex/evex prefix.
Differential Revision: https://reviews.llvm.org/D90009
We have been producing R_X86_64_REX_GOTPCRELX (MOV64rm/TEST64rm/...) and
R_X86_64_GOTPCRELX for CALL64m/JMP64m without the REX prefix since 2016 (to be
consistent with GNU as), but not for MOV32rm/TEST32rm/...
When diffing disassembly dump of two binaries, I see lots of noises from mismatched jump target addresses and global data references, which unnecessarily causes diffs on every function, making it impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from checking code quality point of view since on most targets an instruction can have at most one memory operand.
So far only the X86 disassemblers are supported.
Test Plan:
llvm-objdump -d --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
push rax
mov dword ptr [rsp + 4], 0
mov dword ptr [rsp], 0
mov eax, dword ptr [rsp]
cmp eax, dword ptr [rip + 4112] # 202182 <g>
jge 0x20117e <_start+0x25>
call 0x201158 <foo>
inc dword ptr [rsp]
jmp 0x201169 <_start+0x10>
xor eax, eax
pop rcx
ret
```
llvm-objdump -d **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
push rax
mov dword ptr [rsp + 4], 0
mov dword ptr [rsp], 0
<L1>:
mov eax, dword ptr [rsp]
cmp eax, dword ptr <g>
jge <L0>
call <foo>
inc dword ptr [rsp]
jmp <L1>
<L0>:
xor eax, eax
pop rcx
ret
```
Note that the jump instructions like `jge 0x20117e <_start+0x25>` without this work is printed as a real target address and an offset from the leading symbol. With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operand`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassemble of PC-relative global variable references is also prone to instruction insertion/deletion.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D84191
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
These prefixes should override the default behavior and force a larger immediate size. I don't believe gas issues any warning if you use {disp8} when a 32-bit displacement is already required. And this patch doesn't either.
This completes the {disp8} and {disp32} support from PR46650.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D84793
By repeating the Disp.isImm() check in a couple spots we can
make the normal case for immediate and for expression the same.
And then always rely on the ForceDisp32 flag to remove a later
non-zero immediate check.
This should make {disp32} pseudo prefix handling
slightly easier as we need the normal disp32 handler to handle a
immediate of 0.
We currently handle EVEX and non-EVEX separately in two places. By sinking the EVEX
check into the existing helper for CDisp8 we can simplify these two places.
Differential Revision: https://reviews.llvm.org/D84730
In 16-bit mode we can encode a 32-bit address using 0x67 prefix.
We were failing to do this when the index register was a 32-bit
register, the base register was not present, and the displacement
fit in 16-bits.
Fixes PR46866.
ParseX86Triple already checks for 64-bit mode and produces a
static string. We can just add +sse2 to the end of that static
string. This avoids a potential reallocation when appending it
to the std::string at runtime.
This is a slight change to the behavior of tools that only use
MC layer which weren't implicitly enabling sse2 before, but will
now. I don't think we check for sse2 explicitly in any MC layer
components so this shouldn't matter in practice. And if it did
matter the new behavior is more correct.
The input to these functions is a StringRef. We then convert it
to a std::string. Then maybe replace with "generic". I think we
can just overwrite the incoming StringRef with "generic" if needed
and then pass it along without creating any std::string.
These are documented as using modrm byte of 0xe8, 0xf0, and 0xf8
respectively. But hardware ignore bits 2:0. So 0xe9-0xef is treated
the same as 0xe8. Similar for the other two.
Fixing this required adding 8 new formats to the X86 instructions
to convey this information. Could have gotten away with 3, but
adding all 8 made for a more logical conversion from format to
modrm encoding.
I renumbered the format encodings to keep the register modrm
formats grouped together.
Most of the wrappers exist to print the memory size in Intel syntax
and then call the printMemReference. But printanymem/printopaquemem
don't print anything extra in Intel syntax so just drop them.
Negations are incorrectly added in numerous places and the code just happens to work.
Also fix a missed DW_CFA_def_cfa_offset negation in c693b9c321d5a40d012340619674cf790c9ac86c:
ARMAsmBackendDarwin::generateCompactUnwindEncoding
The generic implementation is actually specific to x86. It assumes the
offset is relative to the end of the instruction and the immediate is
not scaled (which is false on most RISC).
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
1) The first argument's type is `const MCInst &`, the third
argument's type is `MCInst &`, but they may be aliased to the same
variable
2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
loop.
In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.
Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain
Reviewed By: Razer6, MaskRay, bcain
Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78364
We don't need all of MCStreamer.h, just FormattedStream.h. The rest can be replaced with forward declarations.
X86WinAllocaExpander.cpp had an implicit dependency on MapVector.h which I've added locally.
Summary:
When we encode an instruction, we need to know the number of bytes being
emitted to determine the fixups in `X86MCCodeEmitter::emitImmediate`.
There are only two callers for `emitImmediate`: `emitMemModRMByte` and
`encodeInstruction`.
Before this patch, we kept track of the current byte being emitted
by passing a reference parameter `CurByte` across all the `emit*`
funtions, which is ugly and unnecessary. For example, we don't have any
fixups when emitting prefixes, so we don't need to track this value.
In this patch, we use `StartByte` to record the initial status of the
streamer, and use `OS.tell()` to get the current status of the streamer
when we need to know the number of bytes being emitted. On one hand,
this eliminates the parameter `CurByte` for most `emit*` functions, on
the other hand, this make things clear: Only pass the parameter when we
really need it.
Reviewers: craig.topper, pengfei, MaskRay
Reviewed By: craig.topper, MaskRay
Subscribers: hiraditya, llvm-commits, annita.zhang
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78419
Summary:
The instruction in non-text section can not be executed, so they will not affect performance.
In addition, their encoding values are treated as data, so we should not touch them.
Reviewers: MaskRay, reames, LuoYuanke, jyknight
Reviewed By: MaskRay
Subscribers: annita.zhang, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77971
Previous patch didn't handle the early return in `emitREXPrefix` correctly,
which causes REX prefix was not emitted for instruction without
operands. This patch includes the fix for that.
Summary:
We determine the REX prefix used by instruction in `determineREXPrefix`,
and this value is used in `emitMemModRMByte' and used as the return
value of `emitOpcodePrefix`.
Before this patch, REX was passed as reference to `emitPrefixImpl`, it
is strange and not necessary, e.g, we have to write
```
bool Rex = false;
emitPrefixImpl(CurOp, CurByte, Rex, MI, STI, OS);
```
in `emitPrefix` even if `Rex` will not be used.
So we let HasREX be the return value of `emitPrefixImpl`. The HasREX is passed
from `emitREXPrefix` to `emitOpcodePrefix` and then to
`emitPrefixImpl`. This makes sense since REX is a kind of opcode prefix
and of course is a prefix.
Reviewers: craig.topper, pengfei
Reviewed By: craig.topper
Subscribers: annita.zhang, craig.topper, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78276