This assert fires when attempting to extract a subregister from the
global PIC base register. This virtual register SD node is not in the
VRBaseMap, so we shouldn't call getVR to look it up there. If this is a
RegisterSDNode, we should be able to use the virtual register directly.
Fixes PR38385
llvm-svn: 339056
for all the uses from the same def is done.
We run into a compile time problem with flex generated code combined with
`-fno-jump-tables`. The cause is that machineLICM hoists a lot of invariants
outside of a big loop, and drastically increases the compile time in global
register splitting and copy coalescing. https://reviews.llvm.org/D49353
relieves the problem in global splitting. This patch is to handle the problem
in copy coalescing.
About the situation where the problem in copy coalescing happens. After
machineLICM, we have several defs outside of a big loop with hundreds or
thousands of uses inside the loop. Rematerialization in copy coalescing
happens for each use and everytime rematerialization is done, shrinkToUses
will be called to update the huge live interval. Because we have 'n' uses
for a def, and each live interval update will have at least 'n' complexity,
the total update work is n^2.
To fix the problem, we try to do the live interval update work in a collective
way. If a def has many copylike uses larger than a threshold, each time
rematerialization is done for one of those uses, we won't do the live interval
update in time but delay that work until rematerialization for all those uses
are completed, so we only have to do the live interval update work once.
Delaying the live interval update could potentially change the copy coalescing
result, so we hope to limit that change to those defs with many
(like above a hundred) copylike uses, and the cutoff can be adjusted by the
option -mllvm -late-remat-update-threshold=xxx.
Differential Revision: https://reviews.llvm.org/D49519
llvm-svn: 339035
In the past, DbgInfoIntrinsic has a strong assumption that these
intrinsics all have variables and expressions attached to them.
However, it is too strong to derive the class for other debug entities.
Now, it has problems for debug labels.
In order to make DbgInfoIntrinsic as a base class for 'debug info', I
create a class for 'variable debug info', DbgVariableIntrinsic.
DbgDeclareInst, DbgAddrIntrinsic, and DbgValueInst will be derived from it.
Differential Revision: https://reviews.llvm.org/D50220
llvm-svn: 338984
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.
Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.
llvm-svn: 338910
First step towards a BuildSDIV equivalent to D49248 for non-uniform vector support - this just pushes the splat detection down into TargetLowering::BuildSDIV where its still used.
Differential Revision: https://reviews.llvm.org/D50185
llvm-svn: 338838
At least on ELF, it's impossible to tell from the object file whether
two globals with the same section marking were merged: the merged global
uses "private" linkage to hide its symbol, and the aliases look like
regular symbols. I can't think of any other reason to disallow it.
(Of course, we can only merge globals in the same section.)
The weird alignment handling matches AsmPrinter; our alignment handling
for global variables should probably be refactored.
Differential Revision: https://reviews.llvm.org/D49822
llvm-svn: 338791
In expansion of FCOPYSIGN, the shift node is missing when the two
operands of FCOPYSIGN are of the same size. We should always generate
shift node (if the required shift bit is not zero) to put the sign
bit into the right position, regardless of the size of underlying
types.
Differential Revision: https://reviews.llvm.org/D49973
llvm-svn: 338665
AArch64 ELF ABI does not define a static relocation type for TLS offset within
a module, which makes it impossible for compiler to generate a valid
DW_AT_location content for thread local variables. Currently LLVM generates an
invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS
variable. That causes trouble for linker because thread local variable does
not have an absolute address at link time. AArch64 GCC solves the problem by
not generating DW_AT_location for thread local variables. We should do the
same in LLVM.
Differential Revision: https://reviews.llvm.org/D43860
llvm-svn: 338655
Summary:
Added an option that allows to emit only '.loc' and '.file' kind debug
directives, but disables emission of the DWARF sections. Required for
NVPTX target to support profiling. It requires '.loc' and '.file'
directives, but does not require any DWARF sections for the profiler.
Reviewers: probinson, echristo, dblaikie
Subscribers: aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46021
llvm-svn: 338616
The bug is visible in the constant-folded x86 tests. We can't use the
negated shift amount when the type is not power-of-2:
https://rise4fun.com/Alive/US1r
...so in that case, use the regular lowering that includes a select
to guard against a shift-by-bitwidth. This path is improved by only
calculating the modulo shift amount once now.
Also, improve the rotate (with power-of-2 size) lowering to use
a negate rather than subtract from bitwidth. This improves the
codegen whether we have a rotate instruction or not (although
we can still see that we're not matching to a legal rotate in
all cases).
llvm-svn: 338592
There is nothing x86-specific about this code, so it'd be nice to make this available for other targets to use in the future (and get it out of X86ISelLowering!).
Differential Revision: https://reviews.llvm.org/D50083
llvm-svn: 338586
Getting the DWARF types section is only implemented for ELF object
files. We already disabled emitting debug types in clang (r337717), but
now we also report an fatal error (rather than crashing) when trying to
obtain this section in MC. Additionally we ignore the generate debug
types flag for unsupported target triples.
See PR38190 for more information.
Differential revision: https://reviews.llvm.org/D50057
llvm-svn: 338527
This revision implements support for generating DWARFv5 .debug_addr section.
The implementation is pretty straight-forward: we just check the dwarf version
and emit section header if needed.
Reviewers: aprantl, dblaikie, probinson
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D50005
llvm-svn: 338487
Previously we were just visiting the blocks in the function in IR order, which
is rather arbitrary. Therefore we wouldn't always visit defs before uses, but
the translation code relies on this assumption in some places.
Only codegen change seen in tests is an elision of a redundant copy.
Fixes PR38396
llvm-svn: 338476
Call shouldOutlineFromFunctionByDefault, isFunctionSafeToOutlineFrom,
getOutliningType, and getMachineOutlinerMBBFlags using the correct
TargetInstrInfo. And don't create a MachineFunction for a function
declaration.
The call to getOutliningCandidateInfo is still a little weird, but at
least the weirdness is explicitly called out.
Differential Revision: https://reviews.llvm.org/D49880
llvm-svn: 338465
Correct the address space for the inserted argument
stack slot.
AMDGPU seems to not do anything with this information,
so I don't think this was breaking anything.
llvm-svn: 338428
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 338390
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.
llvm-svn: 338329
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH
This is another step towards improving select-of-constants codegen (see D48970).
x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.
Differential Revision: https://reviews.llvm.org/D49924
llvm-svn: 338317
Thinking about it more it might be possible for the later nodes to be folded in getNode in such a way that the other created nodes are left dead. This can cause use counts to be incorrect on nodes that aren't dead.
So its probably safer to leave this alone.
llvm-svn: 338298
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.
Patch by: sameconrad (Sam Conrad)
Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar
Reviewed By: lebedev.ri
Subscribers: efriedma, kparzysz, llvm-commits
Differential Revision: https://reviews.llvm.org/D47681
llvm-svn: 338270
This reapplies commit r338206 reverted by r338214 since the bug that
r338206 uncovered has been fixed in r338268.
Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.
llvm-svn: 338269
The DAGCombiner has a mechanism for ensuring all nodes have been visited at least once. Every time a node is visited, it makes sure its operands have been in the worklist at least once. This ensures that when multiple nodes are created by a combine, only the last node needs to be returned. The earlier nodes can all be found Through this operand check. These means we don't need to explicitly add nodes to the worklist when a combine creates multiple nodes.
I've removed the most obvious cases here. There are probably more than can be removed.
llvm-svn: 338222
Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.
llvm-svn: 338206
This removes the need for an assert to ensure the pointer isn't null.
Years ago we had ifs the checked the pointer was non-null before very access to the vector. These checks were removed and replaced with a single assert. But a reference seems more suitable here.
llvm-svn: 338205
There was a missing check for if a candidate list was entirely deleted. This
adds that check.
This fixes an asan failure caused by running test/CodeGen/AArch64/addsub_ext.ll
with the MachineOutliner enabled.
llvm-svn: 338148
This is a follow-up suggested in D48970.
Alive proofs:
https://rise4fun.com/Alive/sII
We can eliminate an instruction in the usual select-of-constants
to bit hack transform by adjusting the add/sub with constant.
This is always a win.
There are more transforms that are likely wins, but they may need
target hooks in case some targets do not benefit.
This is another step towards making up for canonicalizing to
select-of-constants in rL331486.
llvm-svn: 338132
Masked loads are calling DAG.getRoot rather than calling SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor. So it would be incorrect to call setRoot for the masked load.
This patch instead adds the masked load to PendingLoads so that the root doesn't get update until a store or scatter or something happens.. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization.
llvm-svn: 338085