When we have a terminator sequence (i.e. a tailcall or return),
MIIsInTerminatorSequence is used to work out where the preceding ABI-setup
instructions end, i.e. the parts that were glued to the terminator
instruction. This allows LLVM to split blocks safely without having to
worry about ABI stuff.
The function only ignores DBG_VALUE instructions, meaning that the two
debug instructions I recently added can end terminator sequences early,
causing various MachineVerifier errors. This patch promotes the test for
debug instructions from "isDebugValue" to "isDebugInstr", thus avoiding any
debug-info interfering with this function.
Differential Revision: https://reviews.llvm.org/D106660
This patch emits DBG_INSTR_REFs for two remaining flavours of variable
locations that weren't supported: copies, and inter-block VRegs. There are
still some locations that must be represented by DBG_VALUE such as
constants, but they're mostly independent of optimisations.
For variable locations that refer to values defined in different blocks,
vregs are allocated before isel begins, but the defining instruction
might not exist until late in isel. To get around this, emit
DBG_INSTR_REFs in a "half done" state, where the first operand refers to a
VReg. Then at the end of isel, patch these back up to refer to
instructions, using the finalizeDebugInstrRefs method.
Copies are something that I complained about the original RFC, and I
really don't want to have to put instruction numbers on copies. They don't
define a value: they move them. To address this isel, salvageCopySSA
interprets:
* COPYs,
* SUBREG_TO_REG,
* Anything that isCopyInstr thinks is a copy.
And follows chains of copies back to the defining instruction that they
read from. This relies on any physical registers that COPYs read being
defined in the same block, or being entry-block arguments. For the former
we can put an instruction number on the defining instruction; for the
latter we can drop a DBG_PHI that reads the incoming value.
Differential Revision: https://reviews.llvm.org/D88896
This intrinsic blocks floating point transformations by the optimizer.
Author: Pengfei
Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn
Differential Revision: https://reviews.llvm.org/D99675
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173
ScheduleDAGFast.cpp is compiled to object file, but the ScheduleDAGFast
object file isn't linked into clang executable file as no symbol is
referred by outside. Add calling to createXxx of ScheduleDAGFast.cpp,
then the ScheduleDAGFast object file will be linked into clang
executable file. The static RegisterScheduler will register scheduler
fast and linearize at clang boot time.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D101601
This allows for a much more efficient encoding for small negative
numbers by storing the sign bit first and negating the rest of
the bits. This was already being used for OPC_CheckInteger.
For every in tree target this affects, the table got smaller.
R600GenDAGISel.inc saw the largest reduction of 7K.
I did have to add a new opcode for StringIntegers used for
register class ids and subregister indices since we don't have the
integer value to encode. The enum name is emitted directly into
the table. Previously assumed the enum would expand to a positive
7-bit number. We might be able to just shift that right by 1 and
assume it is a positive 6 bit number, but that will need more
investigation.
The IR stack protector pass must insert stack checks before the call instead of
between it and the return.
Similarly, SDAG one should recognize that ADJCALLFRAME instructions could be
part of the terminal sequence of a tail call. In this case because such call
frames cannot be nested in LLVM the stack protection code must skip over the
whole sequence (or risk clobbering argument registers).
This patch completes ISel support for DIArgList dbg.values by allowing
SDDbgValues with multiple location operands to be emitted as DBG_VALUE_LIST
instructions.
The primary change of this patch is refactoring EmitDbgValue by pulling location
operand emission out to the new function AddDbgValueLocationOps, which is used
for both DIArgList and single value dbg.values. Outside of that, the only
behaviour change is that the scheduler has a lambda added, HasUnknownVReg, to
prevent us from attempting to emit a DBG_VALUE_LIST before all of its used VRegs
have become available.
Differential Revision: https://reviews.llvm.org/D88592
CheckInteger uses an int64_t encoded using a variable width encoding
that is optimized for encoding a number with a lot of leading zeros.
Negative numbers have no leading zeros so use the largest encoding
requiring 9 bytes.
I believe its most like we want to check for positive and negative
numbers near 0. -1 is quite common due to its use in the 'not'
idiom.
To optimize for this, we can borrow an idea from the bitcode format
and move the sign bit to bit 0 with the magnitude stored in the
upper bits. This will drastically increase the number of leading
zeros for small magnitudes. Then we can run this value through
VBR encoding.
This gives a small reduction in the table size on all in tree
targets except VE where size increased by about 300 bytes due
to intrinsic ids now requiring 3 bytes instead of 2. Since the
intrinsic enum space is shared by all targets this an unfortunate
consquence of where VE is currently located in the range.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D96317
The TableGen immAllOnesV and immAllZerosV helpers implicitly wrapped the
ISD::isBuildVectorAll(Ones|Zeros) helper functions. This was inhibiting
their use for targets such as RISC-V which use ISD::SPLAT_VECTOR. In
particular, RISC-V had to define its own 'vnot' fragment.
In order to extend the scope of these nodes to include support for
ISD::SPLAT_VECTOR, two new ISD predicate functions have been introduced:
ISD::isConstantSplatVectorAll(Ones|Zeros). These effectively supersede
the older "isBuildVector" predicates, which are now simple wrappers for
the new functions. They pass a defaulted boolean toggle which preserves
the old behaviour. It is hoped that in time all call-sites can be ported
to the "isConstantSplatVector" functions.
While the use of ISD::isBuildVectorAll(Ones|Zeros) has not changed, the
behaviour of the TableGen immAll(Ones|Zeros)V **has**. To test the new
functionality, the custom RISC-V TableGen fragment has been removed and
replaced with the built-in 'vnot'. To test their use as pattern-roots, two
splat patterns have been updated accordingly.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94223
This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR,
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
br i1 %cmp, label %bb1, label %bb2
bb1:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
br label %bb3
bb2:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
br label %bb3
bb3:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
ret void
}
```
the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`.
```
bb.0.bb0:
frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags
PSEUDO_PROBE 837061429793323041, 1, 0
$edi = MOV32ri 1, debug-location !13; test.c:0
JCC_1 %bb.1, 4, implicit $eflags
bb.2.bb2:
PSEUDO_PROBE 837061429793323041, 3, 0
PSEUDO_PROBE 837061429793323041, 4, 0
$rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
RETQ
bb.1.bb1:
PSEUDO_PROBE 837061429793323041, 2, 0
PSEUDO_PROBE 837061429793323041, 4, 0
$rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
RETQ
```
The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86495
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.
For e.g.
svint32_t bar(svint32_t);
svint32_t foo(svint32_t x, bool *err) {
try { bar(x); } catch (...) { *err = true; }
return x;
}
`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84737
9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate
this, and move all isel pseudo expansion to FinalizeISel. This was a
bad rebase or something, and failed to actually delete this call.
GlobalISel also has a redundant call of finalizeLowering. However, it
requires more work to remove it since it currently triggers a lot of
verifier errors in tests.
It looks like 9cac4e6d14 accidentally
added a second copy of this from a bad rebase or something. This
second copy was added, and the finalizeLowering call was not deleted
as intended.
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
where these alignments are retrieved from alignment attributes in LLVM
IR. These tracked alignments could help DAG combining and lowering
generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
we only generate AssertAlign nodes on return values from intrinsic
calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
new (base + offset) patterns.
Reviewers: arsenm, bogner
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81711
Summary:
This caused incorrect debug information for parameters:
Previously, after a COPY of a parameter that changes the width,
we would emit a DBG_VALUE that continues to be associated to that
parameter, even though it now used a different width.
This made the LiveDebugValues pass assume the parameter value
got clobbered and it stopped tracking the parameter entry
value, leading to incorrect debug information.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39715
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80819
While LazyBlockFrequencyInfo itself is lazy, the dominator tree
and loop info analyses it requires are not. Drop the dependency
on this pass in SelectionDAGIsel at O0.
This makes for a ~0.6% O0 compile-time improvement.
Differential Revision: https://reviews.llvm.org/D80387
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
Summary:
This code was throwing away the opcode for a boolean, which was then
reconstructing the opcode from that boolean. Just pass the opcode, and
forget the boolean.
Reviewers: srhines
Reviewed By: srhines
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77100
Ensure that OptLevelChanger::SavedFastISel is initialized in the constructor.
This should be NFC - as the equivalent 'same opt level' early-out is used in the destructor as well, so SavedFastISel is only actually referenced in the general case.
Differential Revision: https://reviews.llvm.org/D73875
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all
other such flags. This is a problem, because it implies that all code that
transforms SDNodes without copying flags can introduce a correctness bug,
not just a missed optimization.
This patch changes the default to false. This makes it necessary to move
setting the (No)FPExcept flag for constrained intrinsics from the
visitConstrainedIntrinsic routine to the generic visit routine at the
place where the other flags are set, or else the intersectFlagsWith
call would erase the NoFPExcept flag again.
In order to avoid making non-strict FP code worse, whenever
SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes
none of which can raise FP exceptions, it will preserve this property
on all results nodes generated, by setting the NoFPExcept flag on
those result nodes that would otherwise be considered as raising
an FP exception.
To check whether or not an SD node should be considered as raising
an FP exception, the following logic applies:
- For machine nodes, check the mayRaiseFPException property of
the underlying MI instruction
- For regular nodes, check isStrictFPOpcode
- For target nodes, check a newly introduced isTargetStrictFPOpcode
The latter is implemented by reserving a range of target opcodes,
similarly to how memory opcodes are identified. (Note that there a
bit of a quirk in identifying target nodes that are both memory nodes
and strict FP nodes. To simplify the logic, right now all target memory
nodes are automatically also considered strict FP nodes -- this could
be fixed by adding one more range.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71841
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:
- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
compare can raise exceptions and therefore needs to be a strict compare.
I've made it signaling (even though quiet would also be correct) as
signaling is the more usual default for an LT. This code exists both
in common code and in the X86 target.
- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
that ends up not chosen. I've fixed the algorithm to use only a single
STRICT_SINT_TO_FP instead.
- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
the wrong thing because it calls getOperationAction using the result VT.
But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
be called using the operand VT.
- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
Reviewed by: craig.topper
Differential Revision: https://reviews.llvm.org/D71840
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
A second try after reverted D71072.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71149
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71072
This patch implements the following changes:
1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.
This patch therefore changes the way chains for constrained intrisincs
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG
2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.
This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.
In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening
3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations. This was not possible in the past due to the
problems described under 1) and 2) above.
With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.
Differential Revision: https://reviews.llvm.org/D70913
float node
This patch add an option 'disable-strictnode-mutation' to prevent strict
node mutating to an normal node.
So we can make sure that the patch which sets strict-node as legal works
correctly.
Patch by Chen Liu(LiuChen3)
Differential Revision: https://reviews.llvm.org/D70226
This allows operations that are marked Custom, but have some type
combinations that are legal to get past this code.
Add custom mutation code to X86's Select function for the nodes
that don't have isel patterns yet.
I reviewed the diff hunks of 05da2fe521 that don't contain
'#include' lines, and found two unintended changes. I deleted a header
banner inadvertently while inserting a header, and changed the
indentation of a constructor in an odd way. Add back the banner, and
reformat the constructor.
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211