This allows for a much more efficient encoding for small negative
numbers by storing the sign bit first and negating the rest of
the bits. This was already being used for OPC_CheckInteger.
For every in tree target this affects, the table got smaller.
R600GenDAGISel.inc saw the largest reduction of 7K.
I did have to add a new opcode for StringIntegers used for
register class ids and subregister indices since we don't have the
integer value to encode. The enum name is emitted directly into
the table. Previously assumed the enum would expand to a positive
7-bit number. We might be able to just shift that right by 1 and
assume it is a positive 6 bit number, but that will need more
investigation.
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.
Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
Differential Revision: https://reviews.llvm.org/D101164
Summary:
This patch implements the backend implementation of adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
ATM, this is implemented for 32 bit AIX.
Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
Don't assert if there are unassigned virtual registers. Maintain
LiveIntervals by removing the RegUnits for allocated registers, since
they should not longer be necessary.
One part I find somewhat questionable is the special handling
necessary for handleIdentityCopy. The LiveIntervals for the relevant
regunits needs to be removed.
In a future change it will be possible to run register
allocation with a specific set of register classes,
so some of the remaining virtual registers will still
be meaningful.
Value only used by metadata can be removed from .addrsig table.
This solves the undefined symbol error when enabling addrsig table on COFF LTO.
Differential Revision: https://reviews.llvm.org/D101512
Summary:
Personality routine could be an alias to another personality routine.
Fix the situation when we compile the file that contains the personality
routine and the file also have functions that need to refer to the
personality routine.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D101401
Functions can have section names set via #pragma or section attributes,
basic block sections should be correctly named for such functions.
With #pragma, the expectation is that all functions in that file are placed
in the same section in the final binary. Basic block sections should be
correctly named with the unique flag set so that the final binary has all the
basic blocks of the function in that named section. This patch fixes the bug
by calling getExplictSectionGlobal when implicit-section-name attribute is set
to make sure the function's basic blocks get the correct section name.
Differential Revision: https://reviews.llvm.org/D101311
Some liveins *can* come from this block (e.g. any SSA value except the call),
it's only the ones that produce `landingpad` values that can't and I didn't
think it through properly.
These registers get defined by the runtime, not the block being allocated, and
treating them as preassigned in RegAllocFast adds extra pressure, sometimes
enough to make the function unallocatable.
This reverts commit 3b8ec86fd5.
Revert "[X86] Refine AMX fast register allocation"
This reverts commit c3f95e9197.
This pass breaks using LLVM in a multi-threaded environment by
introducing global state.
This replaces D98479.
This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.
I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.
I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100803
This is a compile time optimization. DILocation:get() is expensive to call, and
we were calling it to create a line zero debug loc for *every* instruction we
translated. We only really need to do this just before we build constants in the
entry block, so I moved this code there. This reduces the LLVM -O0 codegen time
of sqlite3 IR by around 0.7% instructions executed and by about ~2% in CPU time.
We can probably do better with a more involved change, since the reason we need
to create one for each new constant is because we're using the debug scope and
inlined-at loc. If we just use a single instruction's scope and drop the
inlined-at, we can just cache these and have them be free.
This was picking a concrete size for a physical register, and
enforcing exact match on the virtual register's type size. Some
targets add multiple types to a register class, and some are smaller
than the full bit width. For example x86 adds f32 to 128-bit xmm
registers, and AMDGPU adds i16/f16 to 32-bit registers.
It might be better to represent these cases as a copy of the full
register and an extraction of the subpart, but a lot of code assumes
you can directly copy. This will help fix the current usage of the DAG
calling convention infrastructure which is incompatible with how
GlobalISel is now using it.
The API is somewhat cumbersome here, but I just mirrored the existing
functions, except now with LLTs (and allow returning null on failure,
unlike the MVT version). I think the concept of selecting register
classes based on type is flawed to begin with, but I'm trying to keep
this compatible with the existing handling.
This patch fixes a crash in LiveDebugVariables for inputs where a
DBG_VALUE_LIST had 64 or more debug operands. This was triggering an
assert, which was added under the assumption that only bad CodeGen would
result in such a limit being hit, but relatively simple source files
that result in these incredibly long debug values have been found, so
this assert has been changed to a condition that drops the debug value
if it is not met.
Differential Revision: https://reviews.llvm.org/D101373
In terms of readability, the `enum CFIMoveType` didn't better document what it
intends to convey i.e. the type of CFI section that gets emitted.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D76519
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.
This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101262
GCC supports negative values for -mstack-protector-guard-offset=, this
should be a signed value. Pre-req to D100919.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101325
The .file directive is changed to only have basename in D36018 for
ELF.
But on AIX, we require the .file directive to also contain the
directory info. This aligns with other AIX compiler like XLC and is
required by some AIX tool like DBX.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D99785
This reverts commit 0ce723cb22.
D76519 was not quite NFC. If we see a CFISection::Debug function before a
CFISection::EH one (-fexceptions -fno-asynchronous-unwind-tables), we may
incorrectly pick CFISection::Debug and emit a `.cfi_sections .debug_frame`.
We should use .eh_frame instead.
This scenario is untested.
https://reviews.llvm.org/D99400 set clang DefaultDebuggerTuning for AIX
to dbx. However, we still need to update the target default so that llc
and other tools will get the same default debuggertuning, and avoid
passing extra options in LTO.
Reviewed By: #powerpc, shchenz, dblaikie
Differential Revision: https://reviews.llvm.org/D101197
In terms of readability, the `enum CFIMoveType` didn't better document what it
intends to convey i.e. the type of CFI section that gets emitted.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D76519
The data member 'shouldEmitMoves' is only used in DwarfCFIException::beginFunction()
and 'shouldEmitCFI' in DwarfCFIExceptionBase serves its purpose.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101155
At the moment, MachineCSE allows CSE-ing convergent instrs which are
non-local to each other. This can cause illegal codegen as convergent
instrs are control flow dependent. The patch prevents non-local CSE of
convergent instrs by adding a check in isProfitableToCSE and rejecting
CSE-ing if we're considering CSE-ing non-local convergent instrs. We
can still CSE convergent instrs which are in the same control flow
scope, so the patch purposely does not make all convergent instrs
non-CSE candidates in isCSECandidate.
https://reviews.llvm.org/D101187
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.
This reverts commit dad5caa59e.
Make following function return void:
addLabel()
addSectionLabel()
addSectionDelta()
This aligns with other attributes adding functions.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101022
This is mostly NFC except that for end of BB not previous slot is used.
Idx is used to find a def of sibling live interval in that slot.
The def on end of MBB and on previous slot of end MBB should be the same,
so it should be NFC.
Reviewers: reames, qcolombet, MatzeB, wmi, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100922
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.
Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.
Fixes PR50083
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X, step_vector(-C))
Differential Revision: https://reviews.llvm.org/D100812