Summary:
If an input DICompileUnit is completely empty (e.g., the result of
running "clang -g" on an empty file), we don't bother emitting an empty
DWARF CU. When we do that, we must make sure we don't also emit a DWARF v5
name index, as DWARF specifies that each index must reference at least
one compilation unit.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45435
llvm-svn: 329575
Summary:
We were emitting accelerator entries for functions with no name, which
is contrary to the DWARF v5 spec: "All other (i.e., *not*
DW_TAG_namespace) debugging information entries without a DW_AT_name
attribute are excluded." Besides that, a name table entry with an empty
string as a key is fairly useless.
We can sometimes end up with functions which have a DW_AT_linkage_name but no
DW_AT_name. One such example is the global-constructor-initialization functions,
which C++ compilers synthesize for each compilation unit with global
constructors.
A very strict reading of the DWARF v5 spec would suggest that we should not even
emit the accelerator entry for the linkage name in this case, but I don't think
we should go that far.
I found this when running the dwarf verifier over llvm codebase compiled
with DWARF v5 accelerator tables.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: vleschuk, clayborg, echristo, probinson, llvm-commits
Differential Revision: https://reviews.llvm.org/D45367
llvm-svn: 329552
Recommitting r329283, third time lucky...
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
Differential Revision: https://reviews.llvm.org/D41350
llvm-svn: 329551
Summary:
Currently MachineLoopInfo is used in only two places:
1) for computing IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, and it is never used.
2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true.
Despite that, we currently have a dependency on MachineLoopInfo, which makes
pass manager to compute it and MachineDominator Tree. This patch removes the
use (1) and makes the use (2) lazy, thus avoiding some redundant
recomputations.
Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi
Subscribers: rengolin, javed.absar, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44812
llvm-svn: 329542
The TargetSchedModel is always initialized using the TargetSubtargetInfo's
MCSchedModel and TargetInstrInfo, so we don't need to extract those and
pass 3 parameters to init().
Differential Revision: https://reviews.llvm.org/D44789
llvm-svn: 329540
In our real world application, we found the following optimization is missed in DAGCombiner
(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))
If the user of original zext is an add, it may enable further lea optimization on x86.
This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.
Differential Revision: https://reviews.llvm.org/D44402
llvm-svn: 329516
Should fix UBSan bot by also checking there's no "uwtable" attribute
before skipping. Otherwise the unwind table will be useless since its
moves expect CSRs to actually be preserved.
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
Should fix PR9970.
Patch mostly by myeisha (pmb).
llvm-svn: 329494
Summary:
The 'strong' StackProtector heuristic takes into consideration call instructions.
Certain intrinsics, such as lifetime.start, can cause the
StackProtector to protect functions that do not need to be protected.
Specifically, a volatile variable, (not optimized away), but belonging to a stack
allocation will encourage a llvm.lifetime.start to be inserted during
compilation. Because that intrinsic is a 'call' the strong StackProtector
will see that the alloca'd variable is being passed to a call instruction, and
insert a stack protector. In this case the intrinsic isn't really lowered to a
call. This can cause unnecessary stack checking, at the cost of additional
(wasted) CPU cycles.
In the future we should rely on TargetTransformInfo::isLoweredToCall, but as of
now that routine considers all intrinsics as not being lowerable. That needs
to be corrected, and such a change is on my list of things to get moving on.
As a side note, the updated stack-protector-dbginfo.ll test always seems to
pass. I never see the dbg.declare/dbg.value reaching the
StackProtector::HasAddressTaken, but I don't see any code excluding dbg
intrinsic calls either, so I think it's the safest thing to do.
Reviewers: void, timshen
Reviewed By: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45331
llvm-svn: 329450
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: bogner, rnk, MatzeB, RKSimon
Reviewed By: rnk
Subscribers: JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45133
llvm-svn: 329435
MFI.LocalFrameSize was not serialized.
It is usually set from LocalStackSlotAllocation, so if that pass doesn't
run it is impossible do deduce it from the stack objects. Until now, this
information was lost.
llvm-svn: 329382
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
Should fix PR9970.
Patch by myeisha (pmb).
llvm-svn: 329287
The MachineOutliner has a bunch of target hooks that will call llvm_unreachable
if the target doesn't implement them. Therefore, if you enable the outliner on
such a target, it'll just crash. It'd be much better if it'd just *not* run
the outliner at all in this case.
This commit adds a hook to TargetInstrInfo that returns false by default.
Targets that implement the hook make it return true. The outliner checks the
return value of this hook to decide whether or not to continue.
llvm-svn: 329220
Some compilers do not like having an enum type and a variable with the
same name (AccelTableKind). I rename the variable to TheAccelTableKind.
Suggestions for a better name welcome.
llvm-svn: 329202
- MSVC was not OK with a static_assert referencing a non-static member
variable, even though it was just in a sizeof(expression). I move the
assert into the emit function, where it is probably more useful.
- Tests were failing in builds which did not have the X86 target
configured. Since this functionality is not target-specific, I have
removed the target specifiers from the .ll files.
llvm-svn: 329201
Summary:
This patch adds a DwarfAccelTableEmitter class, which generates an
accelerator table, as specified in DWARF v5 standard. At the moment it
only generates a DIE offset column and (if we are indexing more than one
compile unit) a CU column.
Indexing type units is not currently supported, as we don't even have
the ability to generate DWARF v5-compatible compile units.
The implementation is not data-source agnostic like the one generating
apple tables. This was not necessary as we currently only have one user
of this code, and without a second user it was not obvious to me how to
best abstract this. (The difference between these tables and the apple
ones is that they need a lot more metadata about the debug info they are
indexing).
The generation is triggered by the --accel-tables argument, which
supersedes the --dwarf-accel-tables arg -- the latter was a simple
on-off switch, but not we can choose between two kinds of accelerator
tables we can generate.
This is tested by parsing the generated tables with llvm-dwarfdump and
the DWARFVerifier, and I've also checked that GNU readelf is able to
make sense of the tables.
Differential Revision: https://reviews.llvm.org/D43286
llvm-svn: 329179
Recommitting rL321259. Previosuly this caused an issue with PPCBE but
I didn't receieve a reproducer and didn't have the time to follow up.
If the issue appears again, please provide a reproducer so I can fix
it.
Original commit message:
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
Differential Revision: https://reviews.llvm.org/D41350
llvm-svn: 329160
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.
This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
llvm-svn: 329116
Summary:
This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs
property, so now the MachineFunctionPass can enforce this check.
These passes are disabled in NVPTX & WebAssembly.
Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
Reviewed By: dschuff, thegameg
Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45183
llvm-svn: 329095
When running dsymutil as part of your build system, it can be desirable
for warnings to be part of the end product, rather than just being
emitted to the output stream. This patch upstreams that functionality.
Differential revision: https://reviews.llvm.org/D44639
llvm-svn: 328965
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.
Differential Revision: https://reviews.llvm.org/D44909
llvm-svn: 328921
Summary:
Tail duplication easily breaks the structure of CFG, e.g. duplicating on
a region entry. If the structure is intended to be preserved, then we
may want to configure tail duplication, or disable it for structured
CFG. From our benchmark results disabling it doesn't cause performance
regression.
Notice that this currently affects AMDGPU backend. In the next patch, I
also plan to turn on requiresStructuredCFG for NVPTX.
All unit tests still pass.
Reviewers: jlebar, arsenm
Subscribers: jholewinski, sanjoy, wdng, tpr, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45008
llvm-svn: 328884
The code has bugs dealing with -0.0.
Since D44550 introduced FABS pattern folding in InstCombine,
this patch removes the now-redundant code that causes
https://bugs.llvm.org/show_bug.cgi?id=36600.
Patch by Mikhail Dvoretckii!
Differential Revision: https://reviews.llvm.org/D44683
llvm-svn: 328872
MachineCopyPropagation::CopyPropagateBlock has a bunch of special
handling for COPY instructions. This handling assumes that COPY
instructions do not modify the source of the copy; this is wrong if
the COPY destination overlaps the source.
To fix the bug, check explicitly for this situation, and fall back to
the generic instruction handling.
This bug can't happen for most register classes because they don't
have this sort of overlap, but there are a few register classes
where this is possible. The testcase uses the AArch64 QQQQ register
class.
Differential Revision: https://reviews.llvm.org/D44911
llvm-svn: 328851
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.
The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.
Differential Revision: https://reviews.llvm.org/D45017
llvm-svn: 328806
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files. This makes the line table more
independent of the .debug_info section.
We emit the new syntax only for DWARF v5 and later.
Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows. Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.
Differential Revision: https://reviews.llvm.org/D44054
llvm-svn: 328805
Summary: Mark CFG is preserved since this pass do not make any change in CFG.
Reviewers: sebpop, mzolotukhin, mcrosier
Reviewed By: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44845
llvm-svn: 328727
This reverts commit r328676.
Commit r328676 broke the -no-integrated-as flag necessary to build Linux kernel with Clang:
$ cat t.c
void foo() {}
$ clang -no-integrated-as -c t.c -g
/tmp/t-dcdec5.s: Assembler messages:
/tmp/t-dcdec5.s:8: Error: file number less than one
clang-7.0: error: assembler command failed with exit code 1 (use -v to see invocation)
llvm-svn: 328699
Summary:
RegisterCoalescer::removePartialRedundancy tries to hoist B = A from
BB0/BB2 to BB1:
BB1:
...
BB0/BB2: ----
B = A; |
... |
A = B; |
|-------
|
It does so if a number of conditions are fulfilled. However, it failed
to check if B was used by any of the terminators in BB1. Since we must
insert B = A before the terminators (since it's not a terminator itself),
this means that we could erroneously insert a new definition of B before a
use of it.
Reviewers: wmi, qcolombet
Reviewed By: wmi
Subscribers: MatzeB, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D44918
llvm-svn: 328689
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files. This makes the line table more
independent of the .debug_info section.
Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows. Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.
Differential Revision: https://reviews.llvm.org/D44054
llvm-svn: 328676
If a given split type unit does not have source locations, don't have
it refer to the split line table.
If no split type unit refers to the split line table, don't emit the
line table at all.
This will save a little space on rare occasions, but also refactors
things a bit to improve which class is responsible for what.
Responding to review comments on r326395.
Differential Revision: https://reviews.llvm.org/D44220
llvm-svn: 328670
Summary:
Rev 327580 "[CodeGen] Use MIR syntax for MachineMemOperand printing"
broke -print-machineinstrs for us on AMDGPU, because we have custom
pseudo source values, and MIR serialization does not implement that.
This commit at least restores the functionality of -print-machineinstrs,
even if it does not properly implement the missing MIR serialization
functionality.
Differential Revision: https://reviews.llvm.org/D44871
Change-Id: I44961c0b90bf6d48c01484ed7a4e466fd300db66
llvm-svn: 328668
Summary: When a node is about to be erased from ReplacedValues, we should also remap its corresponding values in PromotedFloats.
Patch by Yan Luo (Yan.Luo2@synopsys.com)
Reviewers: pirama
Reviewed By: pirama
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D44872
llvm-svn: 328644
First, we change the heuristic that is used to ignore the recurrent
node-sets in the node ordering. In certain cases it's not important
to focus on the recurrent node-sets. Instead, the algorithm begins
by considering all the instructions in the node ordering step.
Second, a minor change to the bottom up traversal, which needs to
consider loop carried dependences (modeled as anti dependences).
Previously, these instructions were skipped, which caused problems
because the instruction ends up having both predecessors and
sucessors in the schedule.
Third, consider anti-dependences as a tie breaker when choosing
between instructions in the node ordering. We want to make sure
that the source of the anti-dependence does not end up with both
predecesssors and sucessors in the final node ordering.
Patch by Brendon Cahoon.
llvm-svn: 328554
The pipeliner must add a loop carried dependence between two memory
operations if the base register is not an affine (linear) exression.
The current implementation doesn't check how the base register is
defined, which allows non-affine expressions, and then the pipeliner
does not add a loop carried dependence when one is needed.
This patch adds code to isLoopCarriedOrder that checks if the base
register of the memory operations is defined by a phi, and the loop
definition for the phi is a constant increment value. This is a very
simple check for a linear expression.
Patch by Brendon Cahoon.
llvm-svn: 328550
The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.
The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.
The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.
Patch by Brendon Cahoon.
llvm-svn: 328547
The phi renaming code in the pipeliner uses the wrong value when
rewriting phi uses, which results in an undefined value. In this
case, the original phi is no longer needed due to the order of
instruction in the pipelined loop. The pipeliner was assuming, in
this case, the the phi loop definition should be used to
rewrite the uses. However, the pipeliner needs to check to make
sure that the loop definition has already been scheduled. If not,
then the phi initial value needs to be used instead.
Patch by Brendon Cahoon.
llvm-svn: 328545
The pipeliner was generating too many phis in the epilog blocks, which
caused incorrect code generation when rewriting an instruction that uses
the phi.
In this case, there 3 prolog and epilog stages. An existing phi was
scheduled at stage 1. When generating the code for the 2nd epilog an
extra new phi was generated.
To fix this, we need to update the code that calculates the maximum
number of phis that can be generated, which is based upon the current
prolog stage and the stage of the original phi. In this case, when the
prolog stage is 1 and the original phi stage is 1, the maximum number
of phis to generate is 2.
Patch by Brendon Cahoon.
llvm-svn: 328543
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.
The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.
The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).
The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.
The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound.
Patch by Brendon Cahoon.
llvm-svn: 328542
The pipeliner is asserting because the serialization step that
occurs at the end is deleting an instruction. The assert
occurs later on because there is a use without a definition.
The problem occurs when an instruction defines a value used
by a REQ_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put in to the same packet. The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.
There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently than regular instructions. Unfortunately, this means
the order does not come out correct.
This patch simplifies the code by changing the seperate steps for
handling zero-cost and regular instructions. Only phis are
handled separate now, since they should occurs first. Then, this
patch adds checks to make use the MoveUse is set to the smallest
value if there are multiple uses in a cycle.
Patch by Brendon Cahoon.
llvm-svn: 328540
The pipeliner changes dependences between base+offset instructions
(loads and stores) so that the instructions have more flexibility
to be scheduled with respect to each other. This occurs when the
pipeliner is able to compute that the instructions will not alias
if their order is changed. The prevous code enforced the alias
property by checking if the base register is the same, and that the
offset values are either both positive or negative.
This patch improves the alias check by using the API
areMemAccessesTriviallyDisjoint instead. This enables more cases,
especially if the offset is a negative value. The pipeliner uses
the function by creating a new instruction with the offset used
in the next iteration.
Patch by Brendon Cahoon.
llvm-svn: 328538
A schedule may require that a phi from the original loop is used in
multiple iterations in the scheduled loop. When this occurs, we generate
multiple phis in the pipelined loop to save the value across iterations.
When we generate the new phis and update the register names in the
pipelined loop, the pipeliner attempts to reuse a previously generated
phi, when possible. The calculation for the name of the new phi needs
to account for the version/iteration of the original phi. Also, in the
epilog, the code only needs to check backwards for a previous iteration
until reaching the first prolog block.
Patch by Brendon Cahoon.
llvm-svn: 328537
The code in orderDepdences that looks at the order dependences between
instructions was processing all the successor and predecessor order
dependences. However, we really only want to check for an order dependence
for instructions scheduled in the same cycle.
Also, fixed how the pipeliner handles output dependences. An output
dependence is also a potential loop carried dependence. The pipeliner
didn't handle this case properly so an invalid schedule could be created
that allowed an output dependence to be scheduled in the next iteration
at the same cycle.
Patch by Brendon Cahoon.
llvm-svn: 328516
When the definition of a phi is used by a phi in the next iteration,
the pipeliner was assuming that the definition is processed first.
Because of the assumption, an incorrect phi name was used. This patch
has a check to see if the phi definition has been processed already.
Patch by Brendon Cahoon.
llvm-svn: 328510
The software pipeliner attempts to delete dead instructions after
generating the pipelined loop. The code looks for uses of each
instruction. Physical registers should be treated differently because
the use chains do not exist. The code that checks for dead
instructions should assume that definitions of physical registers
are used if the operand doesn't contain the dead flag.
Patch by Brendon Cahoon.
llvm-svn: 328509
The pipeliner needs to be conservative when updating the memoperands
of instructions in the epilog. Previously, the pipeliner was changing
the offset of the memoperand based upon the scheduling stage. However,
that is incorrect when control flow branches around the kernel code.
The bug enabled a load and store to the same stack offset to be swapped.
This patch fixes the bug by updating the size of the memoperands to be
UINT_MAX. This conservative value means that dependences will be created
between other loads and stores.
Patch by Brendon Cahoon.
llvm-svn: 328508
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)
llvm-svn: 328395
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.
Differential Revision: https://reviews.llvm.org/D40196
llvm-svn: 328326
Summary:
Some targets does not support labels inside debug sections, but support
references in form `section+offset`. Patch adds initial support
for this.
Reviewers: echristo, probinson, jlebar
Subscribers: llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D43943
llvm-svn: 328314
In our real world application, we found the following optimization is missed in DAGCombiner
(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))
If the user of original zext is an add, it may enable further lea optimization on x86.
This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.
Differential Revision: https://reviews.llvm.org/D44402
llvm-svn: 328252
Split up some of the if/else branches in runOnModule. Elaborate on some
comments. Replace a call to getOrCreateMachineFunction with getMachineFunction.
This makes it clearer what's happening in runOnModule, and ensures that the
outliner doesn't create any MachineFunctions which will never be used by the
outliner (or anything else, really).
llvm-svn: 328240
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated). This avoids executing the
the copy on paths where their results aren't needed. This also exposes
additional opportunites for dead copy elimination and shrink wrapping.
These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..
For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.
```
%bb.0:
%wzr = SUBSWri %w1, 1
%w19 = COPY %w0
Bcc 11, %bb.2
%bb.1:
Live Ins: %w19
BL @fun
%w0 = ADDWrr %w0, %w19
RET %w0
%bb.2:
%w0 = COPY %wzr
RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.
With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted in spec2000/2006/2017 on AArch64.
Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz
Reviewed By: sebpop
Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41463
llvm-svn: 328237
As in SystemZ backend, correctly propagate node ids when inserting new
unselected nodes into the DAG during instruction Seleciton for X86
target.
Fixes PR36865.
Reviewers: jyknight, craig.topper
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44797
llvm-svn: 328233
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
The pipeliner needs to remove instructions from the SlotIndexes
structure when they are deleted. Otherwise, the SlotIndexes map
has stale data, and an assert will occur when adding new
instructions.
This patch also changes the pipeliner to make the back-edge of
a loop carried dependence 1 cycle. The 1 cycle latency is added
to the anti-dependence that represents the back-edge. This
changes eliminates a couple of hacks added to the pipeliner to
handle the latency of the back-edge. It is needed to correctly
pipeline the test case for the sub-register elimination pass.
llvm-svn: 328113
Summary:
When building the selection DAG we sometimes need to postpone
the handling of a dbg.value until the value it should refer to
is created. This is done by using the DanglingDebugInfoMap.
In the past this map has been limited to hold one dangling
dbg.value per value. This patch removes that restriction.
Reviewers: aprantl, rnk, probinson, vsk
Reviewed By: aprantl
Subscribers: Ka-Ka, llvm-commits, JDevlieghere
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D44610
llvm-svn: 328084
I'm not entirely sure these hacks are still needed. If you remove the hacks completely, the name of the library call that gets generated doesn't match the grep the test previously had. So the test wasn't really checking anything.
If the hack is still needed it belongs in PPC specific code. I believe the FP_TO_SINT code here is the only place in the tree where a FP_ROUND_INREG node is created today. And I don't think its even being used correctly because the legalization returned a BUILD_PAIR with the same value twice. That doesn't seem right to me. By moving the code entirely to PPC we can avoid creating the FP_ROUND_INREG at all.
I replaced the grep in the existing test with full checks generated by hacking update_llc_test_check.py to support ppc32 just long enough to generate it.
Differential Revision: https://reviews.llvm.org/D44061
llvm-svn: 328017
Summary:
Currently X-Ray Instrumentation pass has a dependency on MachineLoopInfo
(and thus on MachineDominatorTree as well) and we have to compute them
even if X-Ray is not used. This patch changes it to a lazy computation
to save compile time by avoiding these redundant computations.
Reviewers: dberris, kubamracek
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44666
llvm-svn: 327999
Summary:
Added a flag -no-dwarf-pub-sections, which allows to disable
emission of DWARF public sections.
Reviewers: probinson, echristo
Subscribers: aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D44385
llvm-svn: 327994
Summary:
Made PHI node simplifiations more robust in several ways:
- Minor refactoring to let the SimplificationTracker own the
sets with new PHI/Select nodes that are introduced. This is
maybe not mapping to the original intention with the
SimplificationTracker, but IMHO it encapsulates the logic behind
those sets a little bit better.
- MatchPhiNode can sometimes populate the Matched set with
several entries, where it maps one PHI node to different candidates
for replacement. The Matched set is changed into a SmallSetVector
to make sure we get a deterministic iteration when doing
the replacements.
- As described above we may get several different replacements
for a single PHI node. The loop in MatchPhiSet that is doing
the replacements could end up calling eraseFromParent several
times for the same PHI node, resulting in segmentation faults.
This problem was supposed to be fixed in rL327250, but due to
the non-determinism(?) it only appeared to be fixed (I still
got crashes sometime when turning on/off -print-after-all etc
to get different iteration order in the DenseSets).
With this patch we follow the deterministic ordering in the
Matched set when replacing the PHI nodes. If we find a new
replacement for an already replaced PHI node we replace the
new replacement by the old replacement instead. This is quite
similar to what happened in the rl327250 patch, but here we
also recursively verify that the old replacement hasn't been
replaced already.
- It was really hard to track down the fault described above
(segementation fault due to doing eraseFromParent multiple
times for the same instruction). The fault was intermittent and
small changes in the code, or simply turning on -print-after-all
etc could make the problem go away. This was basically due to
the iteration over PhiNodesToMatch in MatchPhiSet no being
deterministic. Therefore I've changed the data structure for
the SimplificationTracker::AllPhiNodes into an SmallSetVector.
This gives a deterministic behavior.
Reviewers: skatkov, john.brawn
Reviewed By: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44571
llvm-svn: 327961
When scanning the function for CSRs uses and defs, also check if
the basic block are landing pads.
Consider that landing pads needs the CSRs to be properly set.
That way we force the prologue/epilogue to always be pushed out
of the problematic "throw" region. The "throw" region is
problematic because the jumps are not properly modeled.
Fixes PR36513
llvm-svn: 327942
Summary:
DbgValue nodes were not transferred when integer DAG nodes were promoted. For example, if an i32 add node was promoted to an i64 add node by DAGTypeLegalizer::PromoteIntegerResult(), its DbgValue node was not transferred to the new node. The simple fix is to update SetPromotedInteger() to transfer DbgValues.
Add AArch64/dbg-value-i8.ll to test this change and fix ARM/debug-info-d16-reg.ll which had the wrong DILocalVariable nodes with arg numbers even though they are not for function parameters.
Patch by Se Jong Oh!
Reviewers: vsk, JDevlieghere, aprantl
Reviewed By: JDevlieghere
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D44546
llvm-svn: 327919
Summary:
This patch prevents DBG_VALUE instructions from influencing
LivePhysRegs::stepBackwards and stepForwards. In at least one case,
specifically branch folding, the stepBackwards logic was having an
influence on code generation. The result was that certain code
compiled with '-g -O2' would differ from that compiled with '-O2'
alone. It seems that the original logic, accounting for DBG_VALUE,
was influencing the placement of an IMPLICIT_DEF which had a later
impact on how blocks were processed in branch folding.
Reviewers: kparzysz, MatzeB
Reviewed By: kparzysz
Subscribers: bjope, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D43850
llvm-svn: 327862
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.
Differential Revision: https://reviews.llvm.org/D40196
llvm-svn: 327856
Now that almost all functionality of Apple's dsymutil has been
upstreamed, the open source variant can be used as a drop in
replacement. Hence we feel it's no longer necessary to have the llvm
prefix.
Differential revision: https://reviews.llvm.org/D44527
llvm-svn: 327790