In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler as they
are using itinaries and the list scheduler seems to produce better code
(and not crash running out of register on v6m codes). So whether to use
the MIScheduler or not is checked at runtime from the subtarget
features.
This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine schedule is decided as the
pass manager is set up, in arms case from a newly constructed subtarget.
Under some situations, like LTO, this won't include the correct cpu so
can pick the wrong option. This can have a surprising effect on
performance.
To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add a enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.
Thanks to David Penry for the identifying this problem.
Differential Revision: https://reviews.llvm.org/D69775
With a few things fixed:
- initialisaiton of the optimisation remark pass (this was causing the buildbot
failures on PPC),
- a test case.
Differential Revision: https://reviews.llvm.org/D69660
Continuation of:
D69116
Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559
See also D69099 and D69116
Use the TLI hook in DAGCombine.cpp to guard against creating
shift nodes that are not optimal for a target.
Patch by: @joanlluch (Joan LLuch)
Differential Revision: https://reviews.llvm.org/D69120
Small refactoring in visitConstrainedFPIntrinsic that should make
it easier to create DAG nodes requiring extra arguments. That is
the case currently only for STRICT_FP_ROUND, but may be the case
for additional nodes (in particular compares) in the future.
Extracted from the patch for D69281.
NFC.
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers without
aliases and that are neither allocatable or reserved, such as the SystemZ::CC
register.
The MachineVerifier earlier only catched the case of a live-in use without
an entry in the live-in list (as "using an undefined physical register").
A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.
Review: Quentin Colombet
https://reviews.llvm.org/D68267
Summary:
For below test case, we will get assert error except for AArch64 and ARM:
declare i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
define i8 @test_v3i8(<3 x i8> %a) nounwind {
%b = call i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
ret i8 %b
}
In the function getShuffleReduction (), we can see it needs the vector size must be power of 2.
This patch is fix below error when the number of element is not power of 2 for those llvm.experimental.vector.reduce.* function.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D68625
We need to be checking the value types for the inner setccs not
the outer setcc. We need to ensure those setccs produce a 0/1
value or that the xor is on the i1 type. I think at the time
this code was originally written, getBooleanContents didn't
take any arguments so this was probably correct. But now we can
have a different boolean contents for integer and floating point.
Not sure why the other combines below the xor were also checking
the boolean contents. None of them involve any setccs other than
the outer one and they only produce a new setcc.
Differential Revision: https://reviews.llvm.org/D69480
If there are debug instructions before the stopping point,
we need to skip over them before checking for begin in order
to avoid having the debug instructions effect behavior.
Fixes PR43758.
Differential Revision: https://reviews.llvm.org/D69606
Summary:
The general Function::hasAddressTaken has two issues that make it
inappropriate for our purposes:
1. it is sensitive to dead constant users (PR43858 / crbug.com/1019970),
leading to different codegen when debu info is enabled
2. it considers direct calls via a function cast to be address escapes
The first is fixable, but the second is not, because IPO clients rely on
this behavior. They assume this function means that all call sites are
analyzable for IPO purposes.
So, implement our own analysis, which gets closer to finding functions
that may be indirect call targets.
Reviewers: ajpaverd, efriedma, hans
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69676
Summary:
Make sure RAGreedy informs LiveDebugVariables about new VRegs
that is introduced at spill by InlineSpiller.
Consider this example
LDV: !"var" [48r;128r):0 Loc0=%2
48B %2 = ...
...
128B %7 = ADD %2, ...
If %2 is spilled the InlineSpiller will insert spill/reload
instructions and introduces some new vregs. So we get
48B %4 = ...
56B spill %4
...
120B reload %5
128B %3 = ADD %5, ...
In the past we did not inform LDV about this, and when reintroducing
DBG_VALUE instruction LDV still got information that "var" had the
location of the spilled register %2 for the interval [48r;128r).
The result was bad, since we mapped "var" to the spill slot even
before the spill happened:
%4 = ...
DBG_VALUE %spill.0, !"var"
spill %4 to %spill.0
...
reload %5
%3 = ADD %5, ...
This patch will inform LDV about the interval split introduced
due to spilling. So the location map in LDV will become
!"var" [48r;56r):1 [56r;120r):0 [120r;128r):2 Loc0=%2 Loc1=%4 Loc2=%5
And when inserting DBG_VALUE instructions we get
%4 = ...
DBG_VALUE %4, !"var"
spill %4 to %spill.0
DBG_VALUE %spill.0, !"var"
...
reload %5
DBG_VALUE %5, !"var"
%3 = ADD %5, ...
Fixes: https://bugs.llvm.org/show_bug.cgi?id=38899
Reviewers: jmorse, vsk, aprantl
Reviewed By: jmorse
Subscribers: dstenb, wuzish, MatzeB, qcolombet, nemanjai, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69584
Move TargetLoweringBase::isSuitableForJumpTable from
llvm/CodeGen/TargetLowering.h to .cpp, to avoid the undefined reference
from all LLVM${Target}ISelLowering.cpp.
Another fix is to add a dependency on TransformUtils to all
lib/Target/$Target/LLVMBuild.txt, but that is too disruptive.
Summary:
If a wrapper around one of the mem* stdlib functions bitcasts the returned
pointer value before returning it (e.g. to a wchar_t*), LLVM does not emit a
tail call.
Add a check for this scenario so that we emit a tail call.
Reviewers: wmi, mkuper, ramred01, dmgreen
Reviewed By: wmi, dmgreen
Subscribers: hiraditya, sanwou01, javed.absar, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59078
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D69622
This reverts commit f5e1b718a6.
PR43855 reports a performance regression with commit ee50590e. This commit
depends on the faulty one, so has to come out too.
This adds a flag to LLVM and clang to always generate a .debug_frame
section, even if other debug information is not being generated. In
situations where .eh_frame would normally be emitted, both .debug_frame
and .eh_frame will be used.
Differential Revision: https://reviews.llvm.org/D67216
Teach the combiner helper how to replace shuffle_vector of scalars
into build_vector.
I am not particularly happy about having to add this combine, but we
currently get those from <1 x iN> from the IR.
Bonus: This fixes an assert in the shuffle_vector combines since before
this patch, we were expecting vector types.
From SelectionDAGs point of view, debug variable locations specified with
dbg.declare and dbg.addr are indirect -- they specify the address of
something. But calling conventions might mean that a Value is placed on
the stack somewhere, and this too is indirection. Previously this was
mixed up in the "IsIndirect" field of DBG_VALUE insts; this patch
separates them by encoding the indirection in a DIExpression.
If we have a dbg.declare or dbg.addr, then the expression produces an
address that then becomes a DWARF memory location. We can represent
this by putting a DW_OP_deref on the _end_ of the expression. If a Value
has been placed on the stack, then we need to put a DW_OP_deref on the
_start_ of the expression, to load the Value from the stack and have
the rest of the expression operate on it.
Differential Revision: https://reviews.llvm.org/D69028
Summary:
This is used on AMDGPU for rounding from v3f64 (which is illegal) to
v3f32 (which is legal).
Subscribers: jvesely, nhaehnle, tpr, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69339
This is a follow-up to D67448.
Split live intervals with multiple dead defs during the initial
execution of the live interval analysis, but do it outside of the
function createAndComputeVirtRegInterval.
Differential Revision: https://reviews.llvm.org/D68666
Extend the describeLoadedValue() with support for target specific ARM and
AArch64 instructions interpretation. The patch provides specialization for
ADD and SUB operations that include a register and an immediate/offset
operand. Some of the instructions can operate with global string addresses
or constant pool indexes but such cases are omitted since we currently lack
flexible support for processing such operands at DWARF production stage.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D67556
Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp
expansions but do not change any default policy for now.
This also fixes a bug in the memcmp expansion itself when large
displacements are needed.
https://reviews.llvm.org/D69507
This patch adds support for deleted C++ special member functions in
clang and llvm. Also added Defaulted member encodings for future
support for defaulted member functions.
Patch by Sourabh Singh Tomar!
Differential Revision: https://reviews.llvm.org/D69215
Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/a atomic marker on the MMO. The later parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other.
This has been fairly heavily fuzzed, and I examined diffs in a reasonable large corpus of assembly by hand, so I'm reasonable sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization.
Differential Revision: https://reviews.llvm.org/D69219
llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with
EXPENSIVE_CHECKS enabled, causing the patch to be reverted in
rG2c496bb5309c972d59b11f05aee4782ddc087e71.
This patch relands the patch with a proper fix to the
live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the
callee-saves correctly so that the CalleeSaved info is taken from MIR
directly, rather than letting it be recalculated by the PEI pass. I've
done this by running `llc -stop-before=prologepilog` on the LLVM
IR as captured in the test files, adding the extra MOV instructions
that were manually added in the original test file, then running `llc
-run-pass=prologepilog` and finally re-added the comments for the MOV
instructions.
Use the existing helper function in BranchFolding, "countsAsInstruction",
to skip over non-instructions. Otherwise debug instructions can be
identified as the last real instruction in a block, leading to different
codegen decisions when debug is enabled as demonstrated by the test case.
Patch by: yechunliang (Chris Ye)!
Differential Revision: https://reviews.llvm.org/D66467
Summary:
Fixes some things from original commit at https://reviews.llvm.org/D69136. The main
change is that the heap alloc marker is always stored as ExtraInfo in the machine
instruction instead of in the PointerSumType because it cannot hold more than
4 pointer types.
Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.
Use this marker to track heap alloc site call instructions.
Reviewers: rnk
Subscribers: MatzeB, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69536
Summary:
(Split of off D67120)
SizeOpts/MachineSizeOpts changes for profile guided size optimization.
(A second try after previously committed as r375254 and reverted as r375375.)
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69409
Emit a remarks section by default for the following formats:
* bitstream
* yaml-strtab
while still providing -remarks-section=<bool> to override the defaults.
I want to add the ability to rerun the outliner in certain cases, and I
thought this could be an NFC change that could make a subsequent change
that allows for rerunning the outliner a cleaner diff.
Differential Revision: https://reviews.llvm.org/D69482
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
In the Pre-RA machine sinker, previously we were relying on all DBG_VALUEs
being immediately after the instruction that defined their operands. This
isn't a valid assumption, as a variable location change doesn't
necessarily correspond to where the value is computed. In this patch, we
collect DBG_VALUEs that might need sinking as we walk through a block,
and sink all of them if their defining instruction is sunk.
This patch adds some copy propagation too, so that if we sink a copy inst,
the now non-dominated paths can use the copy source for the variable
location.
Differential Revision: https://reviews.llvm.org/D58386
This enhances D69127 (rGe6c145e0548e3b3de6eab27e44e1504387cf6b53)
to handle the looser "any_extend" cast in addition to zext.
This is a prerequisite step for canonicalizing in the other direction
(narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688
When we sink DBG_VALUEs between blocks, we simply move the DBG_VALUE
instruction to below the sunk instruction. However, we should also mark
the variable as being undef at the original location, to terminate any
earlier variable location. This patch does that -- plus, if the
instruction being sunk is a copy, it attempts to propagate the copy
through the DBG_VALUE, replacing the destination with the source.
Differential Revision: https://reviews.llvm.org/D58238
We would previously have no soft-float softening for cbrt, so could hit
a crash failing to select. This fills in what appears to be missing.
Differential Revision: https://reviews.llvm.org/D69345
Similar to:
rG4c47617627fb
This makes the DAG behavior consistent with IR's insertelement.
https://bugs.llvm.org/show_bug.cgi?id=42689
I've tried to maintain test intent for AArch64 and WebAssembly
by replacing undef index operands with something else.
If the target's preferred shift amount VT can't hold any shift
amount for the promoted VT, we should use i32. The specific shift
amount shouldn't matter. The type will be adjusted later when the
shift itself is type legalized. This avoids an assert in getNode.
Fixes PR43820.
This combine is only valid if the inner setcc produces a 0/1 result
or the inner type is MVT::i1.
I haven't seen this cause any issues, just happened to notice it
while reviewing combines in this function.
While there also fix another call to use the value type from the
SDValue for the operand instead of calling SDNode::getValueType(0).
Though its likely the use is result 0, its not guaranteed.
This makes the DAG behavior consistent with IR's extractelement after:
rGb32e4664a715
https://bugs.llvm.org/show_bug.cgi?id=42689
I've tried to maintain test intent for WebAssembly.
The AMDGPU test is trying to test for crashing or other bad behavior,
but I'm not sure if that's possible after this change.
zext (ctpop X) --> ctpop (zext X)
This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688
I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that.
The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this:
t5: i8 = ctpop t4
t6: i32 = zero_extend t5 <-- created based on IR, but unused node?
t7: i64 = zero_extend t5
Differential Revision: https://reviews.llvm.org/D69127
Summary:
Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.
Also undo the workaround in r375137.
Reviewers: rnk
Subscribers: MatzeB, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69136
Summary:
Ternary expression checks for ISD::ADD instead of ISD::UADDO inside DAGTypeLegalizer::ExpandIntRes_UADDSUBO.
This means the ternary expression will evaluate to ISD::SUBCARRY for both ISD::UADDO and ISD::USUBO nodes.
Targets are likely to implement both, so impact will be very limited in practice.
Reviewers: bogner, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68123
This broke various Windows builds, see comments on the Phabricator
review.
This also reverts the follow-up 20bf0cf.
> Summary:
> This fold, helps recover from the rest of the D62266 ARM regressions.
> https://rise4fun.com/Alive/TvpC
>
> Note that while the fold is quite flexible, i've restricted it
> to the single interesting pattern at the moment.
>
> Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix
>
> Reviewed By: deadalnix
>
> Subscribers: javed.absar, kristof.beyls, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D62450
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
Summary:
The default implementation of the describeLoadedValue() hook uses the
MoveImm property to determine if an instruction moves an immediate. If
an instruction has that property the function returns the second
operand, assuming that that is the immediate value the instruction
moves. As far as I can tell, the MoveImm property does not imply that
the second operand is the immediate value, nor that any other operand
necessarily holds the immediate value; it just means that the
instruction moves some immediate value.
One example where the second operand is not the immediate is SystemZ's
LZER instruction, which moves a zero immediate implicitly: $f0S = LZER.
That case triggered an out-of-bound assertion when getting the operand.
I have added a test case for that instruction.
Another example is ARM's MVN instruction, which holds the logical
bitwise NOT'd value of the immediate that is moved. For the following
reproducer:
extern void foo(int);
int main() { foo(-11); }
an incorrect call site value would be emitted:
$ clang --target=arm foo.c -O1 -g -Xclang -femit-debug-entry-values \
-c -o - | ./build/bin/llvm-dwarfdump - | \
grep -A2 call_site_parameter
0x00000058: DW_TAG_GNU_call_site_parameter
DW_AT_location (DW_OP_reg0 R0)
DW_AT_GNU_call_site_value (DW_OP_lit10)
Another example is the A2_combineii instruction on Hexagon which moves
two immediates to a super-register: $d0 = A2_combineii 20, 10.
Perhaps these are rare exceptions, and most MoveImm instructions hold
the immediate in the second operand, but in my opinion the default
implementation of the hook should only describe values that it can, by
some contract, guarantee are safe to describe, rather than leaving it up
to the targets to override the exceptions, as that can silently result
in incorrect call site values.
This patch adds X86's relevant move immediate instructions to the
target's hook implementation, so this commit should be a NFC for that
target. We need to do the same for ARM and AArch64.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: vsk
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D69109
We should do the fold only if both constants are plain,
non-opaque constants, at least that is the DAG.FoldConstantArithmetic()
requirement.
And if the constant we are comparing with is zero - we shouldn't be
trying to do this fold in the first place.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43769
Summary:
This fold, helps recover from the rest of the D62266 ARM regressions.
https://rise4fun.com/Alive/TvpC
Note that while the fold is quite flexible, i've restricted it
to the single interesting pattern at the moment.
Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix
Reviewed By: deadalnix
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62450
MachineRegisterInfo::createGenericVirtualRegister sets
RegClassOrRegBank to static_cast<RegisterBank *>(nullptr).
MIParser on the other hand doesn't. When we attempt to constrain
Register Class on such VReg, additional COPY is generated.
This way we avoid COPY instructions showing in test that have MIR
input while they are not present with llvm-ir input that was used
to create given MIR for a -run-pass test.
Differential Revision: https://reviews.llvm.org/D68946
llvm-svn: 375502
Teach the CombinerHelper how to turn shuffle_vectors, that
concatenate vectors, into concat_vectors and add this combine
to the AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69149
llvm-svn: 375452
Commit message from D66935:
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*`
Changes after D66935:
Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return
the uninitialized CalleeSavedStackSize when running `llc` on a specific
pass where the MIR code has already been expected to have gone through PEI.
Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try
to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug
mode, the compiler will assert the recalculated size equals the cached
size as calculated through a call to determineCalleeSaves.
This fixes two tests:
test/DebugInfo/AArch64/asan-stack-vars.mir
test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir
that otherwise fail when compiled using msan.
Reviewed By: omjavaid, efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68783
llvm-svn: 375425
Provides a TLI hook to allow targets to relax the emission of shifts, thus enabling
codegen improvements on targets with no multiple shift instructions and cheap selects
or branches.
Contributes to a Fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559
Patch by: @joanlluch (Joan LLuch)
Differential Revision: https://reviews.llvm.org/D69116
llvm-svn: 375347
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.
Noticed with -ftime-trace.
llvm-svn: 375311
If a subregister def was moved across another subregister def and
another use, the main range was not correctly updated. The end point
of the moved interval ended too early and missed the use from theh
other lanes in the subreg def.
llvm-svn: 375300
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy
Reviewed By: jmolloy
Differential Revision: https://reviews.llvm.org/D47775
llvm-svn: 375222
The default promotion for the add_sat/sub_sat nodes currently does:
ANY_EXTEND iN to iM
SHL by M-N
[US][ADD|SUB]SAT
L/ASHR by M-N
If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
Differential Revision: https://reviews.llvm.org/D68926
llvm-svn: 375211
There's no need to have more than one of these (there can be two
DwarfFiles - one for the .o, one for the .dwo - but only one loc/loclist
section (either in the .o or the .dwo) & certainly one per
DebugLocStream, which is currently singular in DwarfDebug)
llvm-svn: 375183
Summary:
In the long run we should come up with another mechanism for marking
call instructions as heap allocation sites, and remove this workaround.
For now, we've had two bug reports about this, so let's apply this
workaround. SLH (the other client of instruction labels) probably has
the same bug, but the solution there is more likely to be to mark the
call instruction as not duplicatable, which doesn't work for debug info.
Reviewers: akhuang
Subscribers: aprantl, hiraditya, aganea, chandlerc, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69068
llvm-svn: 375137
Summary:
This is a NFC change that removes the NFA->DFA construction and emission logic from DFAPacketizerEmitter and instead uses the generic DFAEmitter logic. This allows DFAPacketizer to use the Automaton class from Support and remove a bunch of logic there too.
After this patch, DFAPacketizer is mostly logic for grepping Itineraries and collecting functional units, with no state machine logic. This will allow us to modernize by removing the 16-functional-unit limit and supporting non-itinerary functional units. This is all for followup patches.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68992
llvm-svn: 375086
Add generic DAG combine for extending masked loads.
Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.
Differential Revision: https://reviews.llvm.org/D68337
llvm-svn: 375085
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.
The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.
Depends on D68426
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68438
llvm-svn: 375067
Teach the combiner helper how to flatten concat_vectors of build_vectors
into a build_vector.
Add this combine as part of AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69071
llvm-svn: 375066
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
llvm-svn: 375057
This adds the initial plumbing to support optimisation remarks in
the IR hardware-loop pass.
I have left a todo in a comment where we can improve the reporting,
and will iterate on that now that we have this initial support in.
Differential Revision: https://reviews.llvm.org/D68579
llvm-svn: 374980
In LiveDebugVariables.cpp:
Prior to this patch, UserValues were grouped into linked list chains. Each
chain was the union of two sets: { A: Matching Source variable } or
{ B: Matching virtual register }. A ptr to the heads (or 'leaders')
of each of these chains were kept in a map with the { Source variable } used
as the key (set A predicate) and another with { Virtual register } as key
(set B predicate).
There was a search through the chains in the function getUserValue looking for
UserValues with matching { Source variable, Complex expression, Inlined-at
location }. Essentially searching for a subset of A through two interleaved
linked lists of set A and B. Importantly, by design, the subset will only
contain one or zero elements here. That is to say a UserValue can be uniquely
identified by the tuple { Source variable, Complex expression, Inlined-at
location } if it exists.
This patch removes the linked list and instead uses a DenseMap to map
the tuple { Source variable, Complex expression, Inlined-at location }
to UserValue ptrs so that the getUserValue search predicate is this map key.
The virtual register map now maps a vreg to a SmallVector<UserVal *> so that
set B is still available for quick searches.
Reviewers: aprantl, probinson, vsk, dblaikie
Reviewed By: aprantl
Subscribers: russell.gallop, gbedwell, bjope, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D68816
llvm-svn: 374979
Similar to r374970, but I don't have a test for this.
PromoteTargetBoolean is intended to be use for legalizing an
operand that needs to be promoted. It picks its type based on
the return from getSetccResultType and is intended to be used
when we have freedom to pick the new type. But the return type
we need for WidenVecOp_SETCC is completely determined by the
type of the input node.
llvm-svn: 374972
PromoteTargetBoolean calls getSetccResultType to get the return
type. But we were passing it the setcc result type rather than the
setcc input type. This causes an issue on X86 with avx512vl where
the setcc result type for vXf16 vectors is vXi16 while the
result type for vXi16 vectors is vXi1.
There's really no guarantee that getSetccResultType is the type
we need here. So now we just grab the extend type from
getExtendForContent and extend to the original result VT of the
node we're splitting.
llvm-svn: 374970
Examples:
i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1
i8 X < 0 ? C1 : 0 --> (X >>s 7) & C1
This is a small generalization of a fold requested in PR43650:
https://bugs.llvm.org/show_bug.cgi?id=43650
The sign-bit of the condition operand can be used as a mask for the true operand:
https://rise4fun.com/Alive/paT
Note that we already handle some of the patterns (isNegative + scalar) because
there's an over-specialized, yet over-reaching fold for that in foldSelectCCToShiftAnd().
It doesn't use any TLI hooks, so I can't easily rip out that code even though we're
duplicating part of it here. This fold is guarded by TLI.convertSelectOfConstantsToMath(),
so it should not cause problems for targets that prefer select over shift.
Also worth noting: I thought we could generalize this further to include the case where
the true operand of the select is not constant, but Alive says that may allow poison to
pass through where it does not in the original select form of the code.
Differential Revision: https://reviews.llvm.org/D68949
llvm-svn: 374902
Summary:
Internally in LLVM's metadata we use DW_OP_entry_value operations with
the same semantics as DWARF; that is, its operand specifies the number
of bytes that the entry value covers.
At the time of emitting entry values we don't know the emitted size of
the DWARF expression that the entry value will cover. Currently the size
is hardcoded to 1 in DIExpression, and other values causes the verifier
to fail. As the size is 1, that effectively means that we can only have
valid entry values for registers that can be encoded in one byte, which
are the registers with DWARF numbers 0 to 31 (as they can be encoded as
single-byte DW_OP_reg0..DW_OP_reg31 rather than a multi-byte
DW_OP_regx). It is a bit confusing, but it seems like llvm-dwarfdump
will print an operation "correctly", even if the byte size is less than
that, which may make it seem that we emit correct DWARF for registers
with DWARF numbers > 31. If you instead use readelf for such cases, it
will interpret the number of specified bytes as a DWARF expression. This
seems like a limitation in llvm-dwarfdump.
As suggested in D66746, a way forward would be to add an internal
variant of DW_OP_entry_value, DW_OP_LLVM_entry_value, whose operand
instead specifies the number of operations that the entry value covers,
and we then translate that into the byte size at the time of emission.
In this patch that internal operation is added. This patch keeps the
limitation that a entry value can only be applied to simple register
locations, but it will fix the issue with the size operand being
incorrect for DWARF numbers > 31.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: aprantl
Subscribers: jyknight, fedor.sergeev, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67492
llvm-svn: 374881
Summary:
DWARF's DW_OP_entry_value operation has two operands; the first is a
ULEB128 operand that specifies the size of the second operand, which is
a DWARF block. This means that we need to be able to pre-calculate and
emit the size of DWARF expressions before emitting them. There is
currently no interface for doing this in DwarfExpression, so this patch
introduces that.
When implementing this I initially thought about running through
DwarfExpression's emission two times; first with a temporary buffer to
emit the expression, in order to being able to calculate the size of
that emitted data. However, DwarfExpression is a quite complex state
machine, so I decided against that, as it seemed like the two runs could
get out of sync, resulting in incorrect size operands. Therefore I have
implemented this in a way that we only have to run DwarfExpression once.
The idea is to emit DWARF to a temporary buffer, for which it is
possible to query the size. The data in the temporary buffer can then be
emitted to DwarfExpression's main output.
In the case of DIEDwarfExpression, a temporary DIE is used. The values
are all allocated using the same BumpPtrAllocator as for all other DIEs,
and the values are then transferred to the real value list. In the case
of DebugLocDwarfExpression, the temporary buffer is implemented using a
BufferByteStreamer which emits to a buffer in the DwarfExpression
object.
Reviewers: aprantl, vsk, NikolaPrica, djtodoro
Reviewed By: aprantl
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67768
llvm-svn: 374879
This patch kills off a significant user of the "IsIndirect" field of
DBG_VALUE machine insts. Brought up in in PR41675, IsIndirect is
techncally redundant as it can be expressed by the DIExpression of a
DBG_VALUE inst, and it isn't helpful to have two ways of expressing
things.
Rather than setting IsIndirect, have DBG_VALUE creators add an extra deref
to the insts DIExpression. There should now be no appearences of
IsIndirect=True from isel down to LiveDebugVariables / VirtRegRewriter,
which is ensured by an assertion in LDVImpl::handleDebugValue. This means
we also get to delete the IsIndirect handling in LiveDebugVariables. Tests
can be upgraded by for example swapping the following IsIndirect=True
DBG_VALUE:
DBG_VALUE $somereg, 0, !123, !DIExpression(DW_OP_foo)
With one where the indirection is in the DIExpression, by _appending_
a deref:
DBG_VALUE $somereg, $noreg, !123, !DIExpression(DW_OP_foo, DW_OP_deref)
Which both mean the same thing.
Most of the test changes in this patch are updates of that form; also some
changes in how the textual assembly printer handles these insts.
Differential Revision: https://reviews.llvm.org/D68945
llvm-svn: 374877
This changes the 32-element SmallVector to a std::vector. When building
a RelWithDebInfo clang-8 binary, the average size of the vector was
~10000, so it does not seem very beneficial or practical to use a small
vector for that.
The DWARFBytes SmallVector grows in the same way as Comments, so perhaps
that also should be changed to a purely dynamically allocated structure,
but that requires some more code changes, so I let that remain as a
SmallVector for now.
llvm-svn: 374871
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
Summary:
This addresses a bug in collectCallSiteParameters() where call site
immediates would be truncated from int64_t to unsigned.
This fixes PR43525.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: aprantl
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D68869
llvm-svn: 374770
Add an extra parameter so the backend can take the alignment into
consideration.
Differential Revision: https://reviews.llvm.org/D68400
llvm-svn: 374763
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
The CmpInst::getType() calls can be replaced by just using User::getType() that it was dyn_cast from, and we then need to assert that any default predicate cases came from the CmpInst.
llvm-svn: 374716
Unify the range and loc emission (for both DWARFv4 and DWARFv5 style lists) and take advantage of that unification to use strategic base addresses for loclists.
Differential Revision: https://reviews.llvm.org/D68620
llvm-svn: 374600
The exciting code is actually already enough to handle the splitting
of vector arguments but we were lacking a test case.
This commit adds a test case for vector argument lowering involving
splitting and enable the related support in call lowering.
llvm-svn: 374589
Teach buildMerge how to deal with scalar to vector kind of requests.
Prior to this patch, buildMerge would issue either a G_MERGE_VALUES
when all the vregs are scalars or a G_CONCAT_VECTORS when the destination
vreg is a vector.
G_CONCAT_VECTORS was actually not the proper instruction when the source
vregs were scalars and the compiler would assert that the sources must
be vectors. Instead we want is to issue a G_BUILD_VECTOR when we are
in this situation.
This patch fixes that.
llvm-svn: 374588
The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.
llvm-svn: 374555
In GISel we have both G_CONSTANT and G_FCONSTANT, but because
in GISel we don't really have a concept of Float vs Int value
the only difference between the two is where the data originates
from.
What both G_CONSTANT and G_FCONSTANT return is just a bag of bits
with the constant representation in it.
By making getConstantVRegVal() return G_FCONSTANTs bit representation
as well we allow ConstantFold and other things to operate with
G_FCONSTANT.
Adding tests that show ConstantFolding to work on mixed G_CONSTANT
and G_FCONSTANT sources.
Differential Revision: https://reviews.llvm.org/D68739
llvm-svn: 374458
This reverses the scalar canonicalization proposed in D63382.
Pre: isPowerOf2(C1)
%r = select i1 %cond, i32 C1, i32 0
=>
%z = zext i1 %cond to i32
%r = shl i32 %z, log2(C1)
https://rise4fun.com/Alive/Z50
x86 already tries to fold this pattern, but it isn't done
uniformly, so we still see a diff. AArch64 probably should
enable the TLI hook to benefit too, but that's a follow-on.
llvm-svn: 374397
The default promotion for the add_sat/sub_sat nodes currently does:
1. ANY_EXTEND iN to iM
2. SHL by M-N
3. [US][ADD|SUB]SAT
4. L/ASHR by M-N
If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
Differential Revision: https://reviews.llvm.org/D68643
llvm-svn: 374373
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68250
llvm-svn: 374340
Currently, the heuristics the if-conversion pass uses for diamond if-conversion
are based on execution time, with no consideration for code size. This adds a
new set of heuristics to be used when optimising for code size.
This is mostly target-independent, because the if-conversion pass can
see the code size of the instructions which it is removing. For thumb,
there are a few passes (insertion of IT instructions, selection of
narrow branches, and selection of CBZ instructions) which are run after
if conversion and affect these heuristics, so I've added target hooks to
better predict the code-size effect of a proposed if-conversion.
Differential revision: https://reviews.llvm.org/D67350
llvm-svn: 374301
Summary:
Visual Studio doesn't like it while stepping. It kicks you out of the
source view of the file being stepped through and tries to fall back to
the disassembly view.
Fixes PR43530
The fix is incomplete, because it's possible to have a basic block with
no source locations at all. In this case, we don't emit a .cv_loc, but
that will result in wrong stepping behavior in the debugger if the
layout predecessor of the location-less BB has an unrelated source
location. We could try harder to find a valid location that dominates or
post-dominates the current BB, but in general it's a dataflow problem,
and one still might not exist. I left a FIXME about this.
As an alternative, we might want to consider having the middle-end check
if its emitting codeview and get it to stop using line zero.
Reviewers: akhuang
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68747
llvm-svn: 374267
As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86.
As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them.
This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions.
Differential Revision: https://reviews.llvm.org/D68419
llvm-svn: 374261
Add own version of the mathematical constants from the upcoming C++20 `std::numbers`.
Differential revision: https://reviews.llvm.org/D68257
llvm-svn: 374207
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 374085
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.
Reviewers: aprantl, vsk, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D66955
llvm-svn: 374068
* Adds a TypeSize struct to represent the known minimum size of a type
along with a flag to indicate that the runtime size is a integer multiple
of that size
* Converts existing size query functions from Type.h and DataLayout.h to
return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
to uint64_t) so that most existing code 'just works' as if the return
values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
supported instructions used with scalable vectors can be constructed
in IR.
Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen
Reviewed By: rovka, sdesmalen
Differential Revision: https://reviews.llvm.org/D53137
llvm-svn: 374042
Summary:
When getValueInMiddleOfBlock happens to be called for a basic block
that has no incoming value at all, an IMPLICIT_DEF is inserted in that
block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at
the top of its basic block or it will likely not reach the use that
the caller intends to insert.
Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204
Reviewers: arsenm, rampitec
Subscribers: jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68183
llvm-svn: 374040
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.
Patch by Dwight Guth <dwight.guth@runtimeverification.com>!
Reviewed By: lebedev.ri, paquette, rnk
Differential Revision: https://reviews.llvm.org/D67855
llvm-svn: 373976
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937