Commit Graph

62465 Commits

David Green 15b5d1a5bf [ARM] Transfer memory operands for VLDn
We create MMOs for the VLDn/VSTn intrinsics in ARMTargetLowering::
getTgtMemIntrinsic, but they do not currently make it all the way through
ISel.  This changes that in the various places it needs changing, making
sure that the MMO is propagated through to the final instruction. This
can help in scheduling, by not treating the VLD2/VST2 as a scheduling
barrier.

Differential Revision: https://reviews.llvm.org/D101096
2021-05-03 00:04:21 +01:00
Stelios Ioannou 36a44dfd95 [AArch64] Sets the preferred function alignment for Cortex-A53/A55.
Setting the preferred function alignment to 16 for Cortex-A53/A55
improves performance in a wide range of benchmarks. This brings it
in line with the Cortex-A53/A55 tuning that is used in GCC
(gcc/config/aarch64/aarch64.c).

Differential Revision: https://reviews.llvm.org/D101636

Change-Id: I2ce47fe7ab5e3b54f49c89038d8da4e404742de2
2021-05-03 00:00:10 +01:00
Craig Topper ba63cdb8f2 [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses fewer bytes for negative numbers, but
increases the number of bytes needed to encode 64, which was a very
common number in the RISCV table due to SEW=64. By using log2 this
becomes 6 and is no longer a problem.
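
A minimal sketch of the encoding idea (helper names invented, not the actual
pseudo definitions): SEW is always a power of two, so only its log2 needs to
be stored.

    // encode: SEW=64 becomes the small immediate 6; decode reverses it
    static unsigned encodeSEW(unsigned SEW) { return __builtin_ctz(SEW); } // 64 -> 6
    static unsigned decodeSEW(unsigned Log2SEW) { return 1u << Log2SEW; }  // 6 -> 64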
2021-05-02 12:09:20 -07:00
Harald van Dijk f30500632b
[X32][CET] Fix size and alignment of .note.gnu.property section
X32 uses 32-bit ELF object files with 32-bit alignment, so the
.note.gnu.property section needs to be emitted as it is for X86.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101689
2021-05-01 22:17:04 +01:00
Roman Lebedev 2b93c9c16c
[X86] AMD Zen 3 Scheduler Model
Introduce a basic schedule model for AMD Zen 3 CPUs, a.k.a. `znver3`.

This is fully built from scratch, from llvm-mca measurements
and documented reference materials.
Nothing was copied from `znver2`/`znver1`.

I believe this is in a reasonable state of completion for inclusion,
probably better than D52779 `bdver2` was :)

Namely:
* uops are pretty spot-on (at least what llvm-mca can measure)
  {F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure)
  {F16422601}
* throughput is within reason
  {F16422607}

I haven't run many benchmarks with this;
however, the RawSpeed benchmarks say it is beneficial:
{F16603978}
{F16604029}

I'll call out the obvious problems there:
* I didn't really bother with X87 instructions
* I didn't really bother with obviously-microcoded/system instructions
* There are large discrepancies in throughput for `mr` and `rm` instructions.
  I'm not really sure if it's a modelling defect that needs to be fixed,
  or a defect of the measurements.
* Pipe distributions are probably bad :)
  I can't do much here until AMD allows that to be fixed
  by documenting the appropriate counters and updating libpfm

That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>
so how much worse this could possibly be?!

Things that aren't there:
* Various tunings: zero idioms, etc. Those are follow-ups.

Differential Revision: https://reviews.llvm.org/D94395
2021-05-01 22:08:13 +03:00
LemonBoy 4751cadcca [AArch64] Prevent spilling between ldxr/stxr pairs
Apply the same logic used to check if CMPXCHG nodes should be expanded
at -O0: the register allocator may end up spilling some register in
between the atomic load/store pairs, breaking the atomicity and possibly
stalling the execution.
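
A sketch of the hazard (illustrative assembly, not taken from the patch):

    retry:
      ldxr  x0, [x2]        // load-exclusive sets the exclusive monitor
      add   x0, x0, #1
      // a spill/reload inserted here by the register allocator can clear
      // the monitor, so the store-exclusive below keeps failing
      stxr  w1, x0, [x2]    // fails (w1 != 0) if the monitor was lost
      cbnz  w1, retry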

Fixes PR48017

Reviewed By: efriedman

Differential Revision: https://reviews.llvm.org/D101163
2021-05-01 17:17:05 +02:00
Dávid Bolvanský 2af95a5275 [X86] Promote 16-bit CTTZ_ZERO_UNDEF to 32-bit variant
Related to PR50172.

Protects us against regressions once we start doing the cttz(zext(x)) -> zext(cttz(x)) transformation in the middle-end.
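
A sketch of the promoted form in IR notation (illustrative; the actual change
is in X86 type legalisation):

    ; CTTZ_ZERO_UNDEF is only defined for %x != 0, so widening is safe:
    ; the trailing-zero count of zext(%x) then equals that of %x and fits in i16
    %z = zext i16 %x to i32
    %c = call i32 @llvm.cttz.i32(i32 %z, i1 true)
    %r = trunc i32 %c to i16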

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101662
2021-05-01 00:42:15 +02:00
Amara Emerson 7d2562c2da [AArch64][GlobalISel] Use a single MachineIRBuilder for most of isel. NFC.
This is a long overdue cleanup. Not every use is eliminated; I stuck to uses
that were directly called from select(), and not the render functions.

Differential Revision: https://reviews.llvm.org/D101590
2021-04-30 14:49:41 -07:00
Jay Foad 7e43483dd1 [AMDGPU] Remove set_gpr_idx instructions in conditional blocks
SIPreEmitPeephole did not try to remove redundant s_set_gpr_idx_*
instructions in blocks that end with a conditional branch instruction.
This seems like a simple oversight.

Differential Revision: https://reviews.llvm.org/D101629
2021-04-30 22:15:45 +01:00
Daniil Fukalov 3489c2d7b1 [TTI] NFC: Change getTypeLegalizationCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen, kparzysz

Differential Revision: https://reviews.llvm.org/D101533
2021-04-30 22:51:51 +03:00
Nick Desaulniers 93bc038126 [M68k] fix -Wdefaulted-function-deleted and -Woverloaded-virtual
Fixes the following warnings observed when building the experimental
m68k backend (-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="M68k"):

../lib/Target/M68k/M68kMachineFunction.h:71:3: warning: explicitly
defaulted default constructor is implicitly deleted
[-Wdefaulted-function-deleted]
  M68kMachineFunctionInfo() = default;
  ^
../lib/Target/M68k/M68kMachineFunction.h:24:20: note: default
constructor of 'M68kMachineFunctionInfo' is implicitly deleted because
field 'MF' of reference type 'llvm::MachineFunction &' would not be
initialized
  MachineFunction &MF;
                   ^
In file included from ../lib/Target/M68k/M68kISelLowering.cpp:18:
In file included from ../lib/Target/M68k/M68kSubtarget.h:17:
../lib/Target/M68k/M68kFrameLowering.h:60:8: warning:
'llvm::M68kFrameLowering::emitCalleeSavedFrameMoves' hides overloaded
virtual functions [-Woverloaded-virtual]
  void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
       ^
../include/llvm/CodeGen/TargetFrameLowering.h:215:3: note: hidden
overloaded virtual function
'llvm::TargetFrameLowering::emitCalleeSavedFrameMoves' declared here:
different number of parameters (2 vs 3)
  emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
  ^
../include/llvm/CodeGen/TargetFrameLowering.h:218:16: note: hidden
overloaded virtual function
'llvm::TargetFrameLowering::emitCalleeSavedFrameMoves' declared here:
different number of parameters (4 vs 3)
  virtual void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
               ^

pr/50071

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D101588
2021-04-30 11:23:31 -07:00
Eli Friedman 6e6ae6c727 [AArch64] Fix lowering for fshl/fshr with SVE types.
These operations don't exist natively, so just let the
target-independent code expand to plain shifts.

The generated sequences could probably be optimized a bit more, but
they seem good enough for now.

Differential Revision: https://reviews.llvm.org/D101574
2021-04-30 10:51:25 -07:00
Stelios Ioannou 936c777e2b [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.
This patch merges STR<S,D,Q,W,X>pre-STR<S,D,Q,W,X>ui and
LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui instruction pairs into a single
STP<S,D,Q,W,X>pre and LDP<S,D,Q,W,X>pre instruction, respectively.
For each pair, there is a MIR test that verifies this optimization.

Differential Revision: https://reviews.llvm.org/D99272

Change-Id: Ie97a20c8c716c08492fe229c22e14e3c98ef08b7
2021-04-30 17:29:58 +01:00
Bradley Smith 62e9c7601a [AArch64][SVE] Remove unused function missed from D101302
The functionality in SVEIntrinsicOpts::isReinterpretToSVBool was moved in
D101302; however, the original, now-unused function was not removed (NFC).

Differential Revision: https://reviews.llvm.org/D101642
2021-04-30 16:57:09 +01:00
Tomas Matheson c7df6b1223 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 3338290c18.

Broke expensive checks on debian.
2021-04-30 16:53:14 +01:00
Tomas Matheson 3338290c18 [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded into cmpxchg loops by AtomicExpandPass
before register allocation. Register allocation can insert spills between the
exclusive loads and stores, which invalidates the exclusive monitor and can lead
to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
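
For reference, a minimal IR sketch of the kind of operation being expanded
(operands illustrative):

    %old = atomicrmw fadd half* %p, half 0xH3C00 seq_cst ; fetch-and-add 1.0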

Differential Revision: https://reviews.llvm.org/D101164
2021-04-30 16:40:33 +01:00
Amy Kwan 64d951be61 [PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns.
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.

The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and store
instructions (such as when the immediates fit within 34 bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating the dependency on pattern declaration
order in TableGen.

The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most suitable load/store addressing mode.
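
A hypothetical sketch of the flag-based selection idea (all names and the
dispatch are invented for illustration; the real flag set is much richer):

    // describe a memory access with a set of computed flags...
    enum MemOpFlags : unsigned {
      MOF_SignExt = 1u << 0, // sign-extending load
      MOF_ZeroExt = 1u << 1, // zero-extending load
      MOF_SubWord = 1u << 2, // i8/i16 access
    };
    // ...then map the flags to a PPC instruction format (addressing mode)
    enum class AddrMode { DForm, DSForm, XForm };
    static AddrMode selectAddrMode(unsigned Flags) {
      if (Flags & MOF_SubWord) // purely illustrative dispatch
        return AddrMode::XForm;
      return AddrMode::DForm;
    }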

This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.

Differential Revision: https://reviews.llvm.org/D93370
2021-04-30 09:53:19 -05:00
Sidharth Baveja 70c433a184 [XCOFF][AIX] Add Global Variables Directly to TOC for 32 bit AIX
Summary:
This patch implements the backend support for adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
At the moment, this is implemented for 32-bit AIX.

Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
2021-04-30 14:48:02 +00:00
Simon Moll 7a86645611 [VE] VP intrinsics are legal 2021-04-30 15:47:55 +02:00
Jun Ma b310dd1501 [AArch64][SVE] Lower index_vector to step_vector
As discussed in D100107, this patch first converts index_vector to
step_vector, and converts step_vector back to index_vector after LegalizeDAG.

Differential Revision: https://reviews.llvm.org/D100816
2021-04-30 19:04:39 +08:00
David Stuttard a67a377014 [AMDGPU] Tidy up some simple expressions for clarity NFC
Slight refactor for clarity.

Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5

Differential Revision: https://reviews.llvm.org/D101610
2021-04-30 11:13:54 +01:00
Fraser Cormack 791766e6d2 [RISCV] Support STEP_VECTOR with a step greater than one
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.

This patch supports such cases on RISC-V by lowering other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.
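
A sketch of the lowering for a step of 2 (illustrative assembly; a suitable
vsetvli is assumed to precede it):

    vid.v   v8          # v8 = [0, 1, 2, 3, ...]
    vsll.vi v8, v8, 1   # power-of-two step via shift: v8 = [0, 2, 4, 6, ...]
    # a non-power-of-two step would instead use vmul.vx with the step
    # value in a scalar register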

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D100856
2021-04-30 09:36:18 +01:00
Jay Foad f251379a91 [AMDGPU] Simplify getWaitStatesSince. NFC. 2021-04-30 08:58:24 +01:00
Christudasan Devadasan 544be70864 [AMDGPU] Skip promote-alloca for insertelement/insertvalue users
It is difficult to track the users of vector and aggregate types.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D101562
2021-04-30 08:37:26 +05:30
luxufan 5603ed60ad [RISCV] Fix StackOffset calculation when using sp to access fixed stack objects in the presence of RVV vector objects
When RVV vector objects are present, using sp to access a fixed stack object crosses the RVV vector object field, so the StackOffset needs to add a scalable offset for the size of the RVV vector object field.

Differential Revision: https://reviews.llvm.org/D100286
2021-04-30 11:02:38 +08:00
Brendon Cahoon d7d85f72ef [AArch64][GlobalISel] Fix width value for G_SBFX/G_UBFX
When creating G_SBFX/G_UBFX opcodes, the last operand is the
width instead of the bit position. The bit position is used
for the AArch64 SBFM and UBFM instructions. The bit position
is converted to a width if the SBFX/UBFX aliases are generated.
For other SBFM/UBFM aliases, such as shifts, the bit position
is used.
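
A sketch of the operand orders in question (illustrative generic MIR):

    ; generic opcode: the last operand is the width
    %dst:_(s32) = G_UBFX %src:_(s32), %lsb:_(s32), %width:_(s32)
    ; the AArch64 UBFM instruction instead takes immr/imms bit positions,
    ; which the UBFX alias presents as (lsb, width)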

Differential Revision: https://reviews.llvm.org/D101543
2021-04-29 21:54:19 -04:00
Carl Ritson 424f1f6f96 [AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn
Refactor IsHazardFn and IsExpiredFn to take constant references, as these should not mutate the instructions visited, and the instruction can never be null.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D101430
2021-04-30 09:18:56 +09:00
Carl Ritson 749702fc6b [AMDGPU] Remove dead early-out in GCNHazardRecognizer
Remove an early-out in wait state counting which can never be
taken.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D101520
2021-04-30 08:55:49 +09:00
Amara Emerson 96ec6d91e4 [AArch64][GlobalISel] Simplify out of range rotate amount.
Differential Revision: https://reviews.llvm.org/D101005
2021-04-29 14:05:58 -07:00
Jay Foad 16d707e656 [AMDGPU] Fix v_swap_b32 formation on physical registers
As explained in the comments, matchSwap matches:

// mov t, x
// mov x, y
// mov y, t

and turns it into:

// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y

On physical registers we don't have full use-def chains so the check
for T being live-out was not working properly with subregs/superregs.

Differential Revision: https://reviews.llvm.org/D101546
2021-04-29 20:53:40 +01:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis to better choose the shuffle kind in the
getShuffleCost functions, for better cost estimation when a mask is
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Petar Avramovic c34900e133 AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return
When an atomic image intrinsic's return value is unused, the register class for
the destination of a sub-register copy of the return value ends up not being set.
This copy then hits a 'Register class not set' assert later.
If the return value has uses, the register class is determined by the use instruction.
The fix is to not create the sub-register copy when the image intrinsic's destination
has no uses, because it would be deleted by dead-mi-elimination later anyway.

Differential Revision: https://reviews.llvm.org/D101448
2021-04-29 20:56:03 +02:00
Victor Huang ae3377c553 [AIX][TLS] Add ASM portion changes to support TLSGD relocations to XCOFF objects
- Add new variantKinds for the symbol's variable offset and region handle
- Print the proper relocation specifier @gd in the asm streamer when emitting
  the TC Entry for the variable offset for the symbol
- Fix the switch section failure between the TC Entry of variable offset and
  region handle
- Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property

Reviewed by: sfertile

Differential Revision: https://reviews.llvm.org/D100956
2021-04-29 13:18:59 -05:00
Benjamin Kramer df323ba445 Revert "[X86] Support AMX fast register allocation"
This reverts commit 3b8ec86fd5.

Revert "[X86] Refine AMX fast register allocation"

This reverts commit c3f95e9197.

This pass breaks using LLVM in a multi-threaded environment by
introducing global state.
2021-04-29 18:56:33 +02:00
Craig Topper dcdda2bdf2 [RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c))
Similar for or/xor with 0 in place of -1.

This is the canonical form produced by InstCombine for something like `c ? x & y : x;`. Since we have to use control flow to expand select, we'll usually end up with a mv in a basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction.

The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier.

I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select.
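
A sketch of the canonical input and the intent of the fold (illustrative IR):

    ; InstCombine canonicalizes `c ? x & y : x` to:
    %a = and i32 %x, %y
    %r = select i1 %c, i32 %a, i32 %x
    ; folding the `and` into the select arm lets it be sunk into the block
    ; created when the select is expanded, avoiding a separate mv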

The select-binop-identity.ll test has not been committed yet, but I made the diff show the changes to it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D101485
2021-04-29 09:43:51 -07:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis to better choose the shuffle kind in the
getShuffleCost functions, for better cost estimation when a mask is
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Anirudh Prasad ded0a70aeb [AsmParser][SystemZ][z/OS] Reject "Dot" as current PC on z/OS
- Currently, the "." (Dot) character, when not identifying an Identifier or a Constant, refers to the current PC (Program Counter)
- However, on z/OS, the HLASM dialect strictly accepts only "*" as the current PC (support for this will be put up in a follow-up patch)
- The changes in this patch allow individual platforms to choose whether they would like to use the "." (Dot) character as a marker for the current PC or not
- This is achieved by introducing a new field in MCAsmInfo.h called `DotIsPC` (similar to `DollarIsPC`)

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D100975
2021-04-29 11:58:54 -04:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if the user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument, which is size_t.
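
A sketch of the user-code pattern this helps (the intrinsic declaration is a
stand-in, not the real header):

    #include <cstddef>

    std::size_t vsetvl_e32m1(std::size_t avl); // stand-in for the RVV intrinsic

    void f(std::size_t n) {
      int vl = static_cast<int>(vsetvl_e32m1(n)); // size_t result stored in an int
      // when vl is passed back as the (size_t) VL argument of another RVV
      // intrinsic, knowing the result is < 2^31 lets the sext/zext pair
      // around this round-trip be removed
      (void)vl;
    }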

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Bradley Smith 354604a2a7 [AArch64][SVE] Use SIMD variant of INSR when scalar is the result of a vector extract
At the intrinsic layer the sve.insr operation takes a scalar. When this
scalar is an integer we are forcing a data transition between GPRs and
ZPRs that is potentially costly.

Often the integer scalar is the result of a vector extract, when
performing a reduction for example. In such cases we should keep all
data within the ZPRs.
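
A sketch of the pattern in question (illustrative IR; element type assumed):

    ; %s comes straight out of a vector, so it can stay in a ZPR
    %s = extractelement <vscale x 4 x i32> %acc, i32 0
    %r = call <vscale x 4 x i32> @llvm.aarch64.sve.insr.nxv4i32(
             <vscale x 4 x i32> %v, i32 %s)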

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101169
2021-04-29 12:17:42 +01:00
Bradley Smith 89085bcc86 [AArch64][SVE] Convert svdup(vec, SV_VL1, elm) to insertelement(vec, elm, 0)
By converting the SVE intrinsic to a normal LLVM insertelement we give
the code generator a better chance to remove transitions between GPRs
and ZPRs.
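
A sketch of the conversion target (illustrative element type):

    %r = insertelement <vscale x 4 x i32> %vec, i32 %elm, i64 0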

Co-authored-by: Paul Walker <paul.walker@arm.com>

Depends on D101302

Differential Revision: https://reviews.llvm.org/D101167
2021-04-29 12:17:42 +01:00
Bradley Smith c8f20ed448 [AArch64][SVE] Move convert.{from,to}.svbool optimization into InstCombine
As part of this the ptrue coalescing done in SVEIntrinsicOpts has been
modified to not introduce redundant converts, since the convert removal
will no longer run after that optimisation to clean up.

Differential Revision: https://reviews.llvm.org/D101302
2021-04-29 12:17:42 +01:00
Sebastian Neubauer 9569d5ba02 [AMDGPU] Allow buildSpillLoadStore in empty bb
This allows calling buildSpillLoadStore for an empty basic block, where
MI points at the end of the block instead of to an instruction.

This only happens with downstream CFI changes, so I was not able to
create a testcase that works with upstream LLVM.

Differential Revision: https://reviews.llvm.org/D101356
2021-04-29 12:53:20 +02:00
David Green e11420ca23 [ARM] Ensure CSINC has one use in CSINV combine
Otherwise the CMP glue may be used in multiple nodes, needing to be
emitted multiple times. Currently this either increases instruction
count or fails as it attempts to insert the same node multiple times.
2021-04-29 10:59:14 +01:00
David Spickett 54ee962e47 [NVPTX] Fix unused var warning with asserts disabled
<...>/llvm-project/llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp:191:15:
warning: unused variable ‘ASC’ [-Wunused-variable]
  191 |     if (auto *ASC =
dyn_cast<AddrSpaceCastInst>(I.OldInstruction)) {
      |               ^~~
2021-04-29 09:54:03 +01:00
Qiu Chaofan 56d923efdb [SPE] Support constrained float operations on SPE
This patch enables support on SPE for constrained arithmetic and
comparison operations. This fixes bugzilla 50070.

One thing not covered is fcmp vs. fcmps on SPE. Some condition codes
generate signaling comparisons while some do not. In this patch, all are
treated as signaling. So there might still be some issues when
compiling from C code.
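
For reference, the quiet vs. signaling constrained comparison intrinsics
(operands illustrative):

    %q = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b,
             metadata !"oeq", metadata !"fpexcept.strict")  ; quiet
    %s = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b,
             metadata !"oeq", metadata !"fpexcept.strict")  ; signaling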

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D101282
2021-04-29 16:34:10 +08:00
Fraser Cormack 43ad058a01 [RISCV] Fix stack slot for argument types (Bug 49500)
This is a complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.

Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.
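
A sketch of the accumulation idea (names invented, not the actual code):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct PartInfo { uint64_t Size, Align; }; // one split part of an argument

    static uint64_t totalObjectSize(const std::vector<PartInfo> &Parts,
                                    uint64_t &MaxAlign) {
      uint64_t Size = 0;
      MaxAlign = 1;
      for (const PartInfo &P : Parts) {
        Size += P.Size;                         // space for every split part
        MaxAlign = std::max(MaxAlign, P.Align); // alignment: max over parts
      }
      return Size;
    }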

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D99087
2021-04-29 09:10:48 +01:00
Harald van Dijk 1b788607f5
[X32][CET] Fix handling of indirect branches
As X32 uses 32-bit pointers without having 32-bit indirect branch
instructions, we need to fix up indirect branches by extending the
branch targets to 64 bits. This was already done for BRIND but not yet
for NT_BRIND. The same logic works for both, so this applies that
existing logic to NT_BRIND as well.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101499
2021-04-29 08:33:22 +01:00
David Green 465df35355 [ARM] Use just ARM::t2B in ARMBlockPlacementPass
The ARMConstantIsland pass will convert any t2B to tB if they are within
range after it has added or moved any constant pools. They don't need to
be deliberately converted beforehand, and it doesn't deal with needing
to convert tB to t2B very well.
2021-04-29 07:44:04 +01:00