Summary:
In SjLj exception mode, the old landingpad BB will create a new landingpad BB and use indirect branch jump to the old landingpad BB in lowering.
So we should add 2 endbr for this exception model.
Reviewers: hjl.tools, craig.topper, annita.zhang, LuoYuanke, pengfei, efriedma
Reviewed By: LuoYuanke
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77124
Summary:
When we encode an instruction, we need to know the number of bytes being
emitted to determine the fixups in `X86MCCodeEmitter::emitImmediate`.
There are only two callers for `emitImmediate`: `emitMemModRMByte` and
`encodeInstruction`.
Before this patch, we kept track of the current byte being emitted
by passing a reference parameter `CurByte` across all the `emit*`
funtions, which is ugly and unnecessary. For example, we don't have any
fixups when emitting prefixes, so we don't need to track this value.
In this patch, we use `StartByte` to record the initial status of the
streamer, and use `OS.tell()` to get the current status of the streamer
when we need to know the number of bytes being emitted. On one hand,
this eliminates the parameter `CurByte` for most `emit*` functions, on
the other hand, this make things clear: Only pass the parameter when we
really need it.
Reviewers: craig.topper, pengfei, MaskRay
Reviewed By: craig.topper, MaskRay
Subscribers: hiraditya, llvm-commits, annita.zhang
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78419
gcc may warn here because X86ISD::NodeType is specified as "unsigned",
but ISD::NodeType is a naked C enum (although passed as an "unsigned"
throughout SDAG).
getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain.
PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zero'd which we don't currently do.
To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.
This is an enhancement to D77895 to avoid another
round-trip from XMM->GPR->XMM. This time we handle
the case of starting/ending with an f64 and casting
to signed i32 as the intermediate value.
It's a bit more involved than I initially assumed
because we need to use target-specific opcodes to
represent the non-standard cast ops.
Differential Revision: https://reviews.llvm.org/D78362
Reduce X86Subtarget.h/MCCodeEmitter.h/TargetMachine.h includes to forward declarations
Add explicit X86Subtarget.h/TargetMachine.h includes to X86AsmPrinter.cpp/X86MCInstLower.cpp
Remove unused MCSymbol forward declaration
We were only including MachineInstr.h for DebugLoc.h. This exposes an implicit include dependency in BTFDebug.h where I've had to add the MachineInstr.h include.
Summary:
The instruction in non-text section can not be executed, so they will not affect performance.
In addition, their encoding values are treated as data, so we should not touch them.
Reviewers: MaskRay, reames, LuoYuanke, jyknight
Reviewed By: MaskRay
Subscribers: annita.zhang, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77971
[MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining
New test to check that we only outline CFI instruction if all CFI
Instructions in the function would be captured by the outlining
adding x86 tests analagous to AARCH64 cfi tests
Revision: https://reviews.llvm.org/D77852
Starting with hasRedZone adding MachineFunctionInfo to be put in the YAML for MIR files.
Split out of: D78062
Based on implementation for MachineFunctionInfo for WebAssembly
Differential Revision: https://reviews.llvm.org/D78173
Patch by Andrew Litteken! (AndrewLitteken)
This reverts commit 17b1869b72.
It is an attempt to fix the failure reported at
The patch differs from the original one reviwed at
https://reviews.llvm.org/D77435 only for the use of the std::make_tuple
in building the return value of `findAddrModeSVELoadStore`:
- return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
+ return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase,
the original patch submitted at
fc4e954ed5
was failing the following build:
http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/
with error:
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10:
error: chosen constructor is explicit in copy-initialization
return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
note: explicit constructor declared here
constexpr tuple(_UElements&&... __elements)
^
1 error generated.
Summary:
This change is fixing an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices), instead of unscaled offsets.
Those addressing modes do not exist for `prfh` , `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction.
FWIW, GCC also emits a `prfb` for these cases.
Reviewers: sdesmalen, andwar, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78069
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: craig.topper, sdesmalen, efriedma, RKSimon
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77264
Summary:
Fixed wrong conditions for generating (S[LR]I X, Y, C2) from
(or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The
optimisation is also enabled by default now.
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77387
Add initial support for PC Relative addressing for global values that
require GOT indirect addressing. This patch adds PCRelative support for
global addresses that may not be known at link time and may require
access through the GOT.
Differential Revision: https://reviews.llvm.org/D76064
Summary:
Introduce new helper functions getVGPRClassForBitWidth,
getAGPRClassForBitWidth, getSGPRClassForBitWidth and use them to
refactor various other functions that all contained their own lists of
valid register class widths. NFC.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78311
Summary:
AIX symbol have qualname and unqualified name. The stock getSymbol
could only return unqualified name, which leads us to patch many
caller side(lowerConstant, getMCSymbolForTOCPseudoMO).
So we should try to address this problem in the callee
side(getSymbol) and clean up the caller side instead.
Note: this is a "mostly" NFC patch, with a fix for the original
lowerConstant behavior.
Differential Revision: https://reviews.llvm.org/D78045
Summary:
Use more logic and fewer tables. This reduces the line count and
reduces the effort required to introduce more register classes of
different sizes in future.
Reviewers: arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78351
Previous patch didn't handle the early return in `emitREXPrefix` correctly,
which causes REX prefix was not emitted for instruction without
operands. This patch includes the fix for that.
This does for G_EXTRACT_VECTOR_ELT what 588bd7be36 did for G_TRUNC.
Ideally types without a corresponding register class wouldn't reach
here, but we're currently missing some (in particular a 192-bit class
is missing).
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.
Differential Revision: https://reviews.llvm.org/D77135
Add patterns that use a normal, non-wrapping, add and sub nodes along
with an arm vshr imm node.
Differential Revision: https://reviews.llvm.org/D77065
Summary:
We determine the REX prefix used by instruction in `determineREXPrefix`,
and this value is used in `emitMemModRMByte' and used as the return
value of `emitOpcodePrefix`.
Before this patch, REX was passed as reference to `emitPrefixImpl`, it
is strange and not necessary, e.g, we have to write
```
bool Rex = false;
emitPrefixImpl(CurOp, CurByte, Rex, MI, STI, OS);
```
in `emitPrefix` even if `Rex` will not be used.
So we let HasREX be the return value of `emitPrefixImpl`. The HasREX is passed
from `emitREXPrefix` to `emitOpcodePrefix` and then to
`emitPrefixImpl`. This makes sense since REX is a kind of opcode prefix
and of course is a prefix.
Reviewers: craig.topper, pengfei
Reviewed By: craig.topper
Subscribers: annita.zhang, craig.topper, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78276