The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.
llvm-svn: 338329
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH
This is another step towards improving select-of-constants codegen (see D48970).
x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.
Differential Revision: https://reviews.llvm.org/D49924
llvm-svn: 338317
Thinking about it more it might be possible for the later nodes to be folded in getNode in such a way that the other created nodes are left dead. This can cause use counts to be incorrect on nodes that aren't dead.
So its probably safer to leave this alone.
llvm-svn: 338298
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.
Patch by: sameconrad (Sam Conrad)
Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar
Reviewed By: lebedev.ri
Subscribers: efriedma, kparzysz, llvm-commits
Differential Revision: https://reviews.llvm.org/D47681
llvm-svn: 338270
This reapplies commit r338206 reverted by r338214 since the bug that
r338206 uncovered has been fixed in r338268.
Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.
llvm-svn: 338269
The DAGCombiner has a mechanism for ensuring all nodes have been visited at least once. Every time a node is visited, it makes sure its operands have been in the worklist at least once. This ensures that when multiple nodes are created by a combine, only the last node needs to be returned. The earlier nodes can all be found Through this operand check. These means we don't need to explicitly add nodes to the worklist when a combine creates multiple nodes.
I've removed the most obvious cases here. There are probably more than can be removed.
llvm-svn: 338222
Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.
llvm-svn: 338206
This removes the need for an assert to ensure the pointer isn't null.
Years ago we had ifs the checked the pointer was non-null before very access to the vector. These checks were removed and replaced with a single assert. But a reference seems more suitable here.
llvm-svn: 338205
There was a missing check for if a candidate list was entirely deleted. This
adds that check.
This fixes an asan failure caused by running test/CodeGen/AArch64/addsub_ext.ll
with the MachineOutliner enabled.
llvm-svn: 338148
This is a follow-up suggested in D48970.
Alive proofs:
https://rise4fun.com/Alive/sII
We can eliminate an instruction in the usual select-of-constants
to bit hack transform by adjusting the add/sub with constant.
This is always a win.
There are more transforms that are likely wins, but they may need
target hooks in case some targets do not benefit.
This is another step towards making up for canonicalizing to
select-of-constants in rL331486.
llvm-svn: 338132
Masked loads are calling DAG.getRoot rather than calling SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor. So it would be incorrect to call setRoot for the masked load.
This patch instead adds the masked load to PendingLoads so that the root doesn't get update until a store or scatter or something happens.. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization.
llvm-svn: 338085
The test failure was caused by the compiler not emitting a __debug_ranges section with DWARF 4 and
earlier when no ranges are needed. The test checks for the existence regardless.
llvm-svn: 338081
The DAGCombiner has a system for ensuring all nodes are visited. It doesn't require an AddToWorkList for every node that is created by a combine.
llvm-svn: 338079
Summary:
The behavior of followCopyChain with a subreg depends on the order in
which subranges appear in a live interval, which is bad.
This commit fixes that, and allows the copy chain to continue only if
all matching subranges that are not undefined take us to the same def.
I don't have a test for this; the reproducer I had on my branch with
various other local changes does not reproduce the problem on upstream
llvm. Also that reproducer was an ll test; attempting to convert it to a
mir test made the subranges appear in a different order and hid the
problem.
However I would argue that the old behavior was obviously wrong
and needs fixing.
Subscribers: MatzeB, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D49535
Change-Id: Iee7936ef305918f3b498ac432e2cf651ae5cc2df
llvm-svn: 338070
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.
The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).
This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.
One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.
Testing:
- check-llvm, and an end-to-end test using lldb to debug an optimized
program.
- Existing unit tests for DIExpression::appendToStack fully cover the
new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)
Differential Revision: https://reviews.llvm.org/D49454
llvm-svn: 338069
When fusing instructions A and B, we must add all predecessors of B as
predecessors of A to avoid instructions getting scheduling in between.
There is a special case involving ExitSU: Every other node must be
scheduled before it by design and we don't need to make this explicit in
the graph, however when fusing with a different node we need to schedule
every othere node before the fused node too and we need to make this
explicit now: This patch adds a dependency from the fused node to all
roots in the graph.
Differential Revision: https://reviews.llvm.org/D49830
llvm-svn: 338046
Summary:
A follow-up for D49266 / rL337166.
At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3Xhttps://rise4fun.com/Alive/pQyhZZ
We won't get to these cases with I1 being -1,
as that will be constant-folded to true or false.
I'm also not sure we actually hit the 'ule' case,
but i think the worst think that could happen is that being dead code.
Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49497
llvm-svn: 338044
Summary:
NVPTX target dos not use register-based frame information. Instead it
relies on the artificial local_depot that is used instead of the frame
and the data for variables must be emitted relatively to this
local_depot.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45963
llvm-svn: 338039
Summary:
For NVPTX target the value of `DW_AT_frame_base` attribute must be set
to `DW_OP_call_frame_cfa`.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45785
llvm-svn: 338036
Previous version of this patch failed on darwin targets because of
different handling of cross-debug-section relocations. This fixes the
tests to emit the DW_AT_str_offsets_base attribute correctly in both
cases. Since doing this is a non-trivial amount of code, and I'm going
to need it in more than one test, I've added a helper function to the
dwarfgen DIE class to do it.
Original commit message follows:
The motivation for this is D49493, where we'd like to test details of
debug_str_offsets behavior which is difficult to trigger from a
traditional test.
This adds the plubming necessary for dwarfgen to generate this section.
The more interesting changes are:
- I've moved emitStringOffsetsTableHeader function from DwarfFile to
DwarfStringPool, so I can generate the section header more easily from
the unit test.
- added a new addAttribute overload taking an MCExpr*. This is used to
generate the DW_AT_str_offsets_base, which links a compile unit to the
offset table.
I've also added a basic test for reading and writing DW_form_strx forms.
Reviewers: dblaikie, JDevlieghere, probinson
Subscribers: llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D49670
llvm-svn: 338031
This reverts commit r337951.
While that kind of shared constant generally works fine in a MinGW
setting, it broke some cases of inline assembly that worked before:
$ cat const-asm.c
int MULH(int a, int b) {
int rt, dummy;
__asm__ (
"imull %3"
:"=d"(rt), "=a"(dummy)
:"a"(a), "rm"(b)
);
return rt;
}
int func(int a) {
return MULH(a, 1);
}
$ clang -target x86_64-win32-gnu -c const-asm.c -O2
const-asm.c:4:9: error: invalid variant '00000001'
"imull %3"
^
<inline asm>:1:15: note: instantiated into assembly here
imull __real@00000001(%rip)
^
A similar error is produced for i686 as well. The same test with a
target of x86_64-win32-msvc or i686-win32-msvc works fine.
llvm-svn: 338018
- Remove unnecessary anchor function
- Remove unnecessary override of getAnalysisUsage
- Use reference instead of pointers where things cannot be nullptr
- Use ArrayRef instead of std::vector where possible
llvm-svn: 337989
- Avoid duplication of regmask size calculation.
- Simplify allocateRegisterMask() call.
- Rename allocateRegisterMask() to allocateRegMask() to be consistent
with naming in MachineOperand.
llvm-svn: 337986
Reuse the handling for llvm.used, and don't transform such globals.
Fixes a failure on the asan buildbot caused by my previous commit.
llvm-svn: 337973
If the DAGCombiner's rotate matching was working as expected,
I don't think we'd see any test diffs here.
This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.
llvm-svn: 337966
Instead of depending on implicit padding from the structure layout code,
use a packed struct and emit the padding explicitly.
Differential Revision: https://reviews.llvm.org/D49710
llvm-svn: 337961
GNU binutils tools have no problems with this kind of shared constants,
provided that we actually hook it up completely in AsmPrinter and
produce a global symbol.
This effectively reverts SVN r335918 by hooking the rest of it up
properly.
This feature was implemented originally in SVN r213006, with no reason
for why it can't be used for MinGW other than the fact that GCC doesn't
do it while MSVC does.
Differential Revision: https://reviews.llvm.org/D49646
llvm-svn: 337951
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.
With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:
fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4
Differential Revision: https://reviews.llvm.org/D49644
llvm-svn: 337950
When VectorLegalizer::LegalizeOp creates a new SDValue after iterating
over its arguments, we need to refer to the same result number of the
new node that the original value used.
Reviewed by: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D49805
llvm-svn: 337939
This recommits r337910 after fixing an "ambiguous call to addAttribute"
error with some compilers (gcc circa 4.9 and MSVC). It seems that these
compilers will consider a "false -> pointer" conversion during overload
resolution. This creates ambiguity because one I added an overload which
takes a MCExpr * as an argument.
I fix this by making the new overload take MCExpr&, which avoids the
conversion. It also documents the fact that we expect a valid MCExpr
object.
Original commit message follows:
The motivation for this is D49493, where we'd like to test details of
debug_str_offsets behavior which is difficult to trigger from a
traditional test.
This adds the plubming necessary for dwarfgen to generate this section.
The more interesting changes are:
- I've moved emitStringOffsetsTableHeader function from DwarfFile to
DwarfStringPool, so I can generate the section header more easily from
the unit test.
- added a new addAttribute overload taking an MCExpr*. This is used to
generate the DW_AT_str_offsets_base, which links a compile unit to the
offset table.
I've also added a basic test for reading and writing DW_form_strx forms.
Reviewers: dblaikie, JDevlieghere, probinson
Subscribers: llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D49670
llvm-svn: 337933
This reverts commit r337910 as it's generating "ambiguous call to
addAttribute" errors on some bots.
Will resubmit once I get a chance to look into the problem.
llvm-svn: 337924
Summary:
The motivation for this is D49493, where we'd like to test details of
debug_str_offsets behavior which is difficult to trigger from a
traditional test.
This adds the plubming necessary for dwarfgen to generate this section.
The more interesting changes are:
- I've moved emitStringOffsetsTableHeader function from DwarfFile to
DwarfStringPool, so I can generate the section header more easily from
the unit test.
- added a new addAttribute overload taking an MCExpr*. This is used to
generate the DW_AT_str_offsets_base, which links a compile unit to the
offset table.
I've also added a basic test for reading and writing DW_form_strx forms.
Reviewers: dblaikie, JDevlieghere, probinson
Subscribers: llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D49670
llvm-svn: 337910
Add support for inline assembly with output operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR as in the PR).
llvm-svn: 337903
Summary:
This is a follow-up to r303043. In computeMapping(), we need to disqualify an
InstrMapping if it would be impossible to repair one of the registers in the
instruction to match the mapping.
This change is needed in order to be able to define an instruction
mapping for G_SELECT for the AMDGPU target and will be tested
by test/CodeGen/AMDGPU/GlobalISel/regbankselect-select.mir
Reviewers: ab, qcolombet, t.p.northover, dsanders
Reviewed By: qcolombet
Subscribers: tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D49735
llvm-svn: 337882
Just some gardening here.
Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.
Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.
llvm-svn: 337848
When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subidrectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location.
Fixes PR34169
Patch by Caroline Tice!
Differential Revision: https://reviews.llvm.org/D49434
llvm-svn: 337847
Before this, TCI contained all the call information for each Candidate.
This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.
A follow-up patch for the AArch64 outliner will demonstrate this.
llvm-svn: 337840
Having the missed remark code in the middle of `findCandidates` made the
function hard to follow. This yanks that out into a new function,
`emitNotOutliningCheaperRemark`.
llvm-svn: 337839
Just some simple gardening to improve clarity.
Before, we had something along the lines of
1) Create a std::vector of Candidates
2) Create an OutlinedFunction
3) Create a std::vector of pointers to Candidates
4) Copy those over to the OutlinedFunction and the Candidate list
Now, OutlinedFunctions create the Candidate pointers. They're still copied
over to the main list of Candidates, but it makes it a bit clearer what's
going on.
llvm-svn: 337838
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
Differential Revision: https://reviews.llvm.org/D45556
Patch by Hsiangkai Wang.
llvm-svn: 337799
This actually has nothing to do with the associative comdat sections
that aren't supported by GNU binutils ld.
Clarify the comments from SVN r335918 and use a separate flag for it.
Differential Revision: https://reviews.llvm.org/D49645
llvm-svn: 337757
RegScavenger::unprocess walks backward, so it should undo the effects
of defs before undoing effects of kills. Previously it did things in
the opposite order, leaving a register apparently unused (dead) in the
case where an instruction both used (killed) and defined a register.
Differential Revision: https://reviews.llvm.org/D42200
llvm-svn: 337735
This is used on an extract vector element index which is most cases is going to be an i32 or i64 and the element will be a valid element number. But it is possible to construct IR with a larger type and large out of range value.
llvm-svn: 337652
Summary:
Each of the four methods had a dozen lines and was doing almost exactly
the same thing: get the appropriate accelerator table kind and insert an
entry into it. I move this common logic to a helper function and make
these methods delegate to it.
This came up in the context of D49493, where I've needed to make adding
a string to a string pool slightly more complicated, and it seemed to
make sense to do it in one place instead of five.
To make this work I've needed to unify the interface of the AccelTable
data types, as some used to store DIE& and others DIE*. I chose to unify
to a reference as that's what the caller uses.
This technically isn't NFC, because it changes the StringPool used for
apple tables in the DWO case (now it uses the main file like DWARF v5
instead of the DWO file). However, that shouldn't matter, as DWO is not
a thing on apple targets (clang frontend simply ignores -gsplit-dwarf).
Reviewers: JDevlieghere, aprantl, probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49542
llvm-svn: 337562
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor as it there may be a indirect chain. Require
the load's chain has only one use.
This fixes PR37826.
Reviewers: spatel, davide, efriedma, craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D49388
llvm-svn: 337560
Summary:
This patch makes us generate the debug_names section in response to some
user-facing commands (previously it was only generated if explicitly
selected via the -accel-tables option).
My goal was to make this work for DWARF>=5 (as it's an official part of
that standard), and also, as an extension, for DWARF<5 if one is
explicitly tuning for lldb as a debugger (because it brings a large
performance improvement there).
This is slightly complicated by the fact that the debug_names tables are
incompatible with the DWARF v4 type units (they assume that the type
units are in the debug_info section), and unfortunately, right now we
generate DWARF v4-style type units even for -gdwarf-5. For this reason,
I disable all accelerator tables if the user requested type unit
generation. I do this even for apple tables, as they have the same
problem (in fact generating type units for apple targets makes us crash
even before we get around to emitting the accelerator tables).
Reviewers: JDevlieghere, aprantl, dblaikie, echristo, probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49420
llvm-svn: 337544
Since DWARFv5 rnglists are self descriptive and have distinct encodings
for base-relative (offset_pair) and absolute (start_length) entries,
there's no need to use a base address specifier when describing a lone
address range in a section.
Use that, and improve the test coverage a bit here to include cases like
this and others.
llvm-svn: 337411
Summary:
If unfolding an SUnit results in both load or the operation using it which
already exist in the DAG, abort the unfold if they are already scheduled.
If not, make sure we don't add duplicate dependencies.
This fixes PR37916.
Reviewers: davide, eli.friedman, fhahn, bogner
Subscribers: MatzeB, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48666
llvm-svn: 337409
The presence of these symbols in the symbol table can cause symbol type
mismatch errors (or undefined symbol errors on emulated TLS targets)
and they can't be ICF'd anyway.
llvm-svn: 337338
Summary:
Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange
intervals when updating.
2 changes. The first to ensure that bogus SubRange Segments aren't propagated when
encountering Segments of the form [1234r, 1234d:0) when preparing to merge value
numbers. These can be removed in this case.
The second forces a shrinkToUses call if SubRanges end on the copy index
(instead of just the parent register).
V2: Addressed review comments, plus MIR test instead of ll test
Subscribers: MatzeB, qcolombet, nhaehnle
Differential Revision: https://reviews.llvm.org/D40308
Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05
llvm-svn: 337273
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.
Differential Revision: https://reviews.llvm.org/D49262
llvm-svn: 337258
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.htmlhttp://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html
We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions,
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift"
operation which may also be a single hardware instruction.
Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working
on that (much larger patch), but then I concluded that we don't need it. At least as a first
step, we have all of the backend support necessary to match these ops...because it was required.
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are
likely all that we'll ever need.
There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly
necessary (as the test results here prove). That can be an efficiency improvement as a small
follow-up patch.
So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support.
Differential Revision: https://reviews.llvm.org/D49242
llvm-svn: 337221
trivially rematerializable.
We run into a case where machineLICM hoists a large number of live ranges
outside of a big loop because it thinks those live ranges are trivially
rematerializable. In regalloc, global splitting is tried out first for those
live ranges before they are spilled and rematerialized. Because the global
splitting algorithm is quadratic, increasing a lot of global splitting
candidates causes huge compile time increase (50s to 1400s on my local
machine when compiling a module).
However, we think for live ranges which are very large and are trivially
rematerialiable, it is better to just skip global splitting so as to save
compile time with little chance of sacrificing performance. We uses the
segment size of live range to indirectly evaluate whether the global
splitting of the live range can introduce high cost, and use an option
as a knob to adjust the size limit threshold.
Differential Revision: https://reviews.llvm.org/D49353
llvm-svn: 337186
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.
But the IR-optimal patter does not lower efficiently, so we want to undo it..
This handles the simple pattern.
There is a second pattern with predicate and constants inverted.
NOTE: we do not check uses here. we always do the transform.
Reviewers: spatel, craig.topper, RKSimon, javed.absar
Reviewed By: spatel
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D49266
llvm-svn: 337166
Summary:
If the high part of the load is not used the offset to the next element
will not be set correctly.
For example, on Sparc V8, the following code will read val2 from offset 4
instead of 8.
```
int val = __builtin_va_arg(va, long long);
int val2 = __builtin_va_arg(va, int);
```
Reviewers: jyknight
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48595
llvm-svn: 337161
For dsymutil we want to store offsets in the accelerator table entries
rather than DIE pointers. In addition, we need a way to communicate
which CU a DIE belongs to. This patch provides support for both of these
issues.
Differential revision: https://reviews.llvm.org/D49102
llvm-svn: 337158
This is almost the same as an existing IR canonicalization in instcombine,
so I'm assuming this is a good early generic DAG combine too.
The motivation comes from reduced bit-hacking for select-of-constants in IR
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html
The PPC and AArch tests show that those targets are already doing something
similar. x86 will be neutral in the minimal case and generally better when
this pattern is extended with other ops as shown in the signbit-shift.ll tests.
Note the asymmetry: we don't include the (extend (ifneg X)) transform because
it already exists in SimplifySelectCC(), and that is verified in the later
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the
general transform to use a shift is always a win because that's a single
instruction.
Alive proofs:
https://rise4fun.com/Alive/ysli
Name: if pos, get -1
%c = icmp sgt i16 %x, -1
%r = sext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = ashr i16 %n, 15
Name: if pos, get 1
%c = icmp sgt i16 %x, -1
%r = zext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = lshr i16 %n, 15
Differential Revision: https://reviews.llvm.org/D48970
llvm-svn: 337130
The MachineOutliner was doing an std::for_each from the call (inserted
before the outlined sequence) to the iterator at the end of the
sequence.
std::for_each needs the iterator past the end, so the last instruction
was not taken into account when propagating the liveness information.
This fixes the machine verifier issue in machine-outliner-disubprogram.ll.
Differential Revision: https://reviews.llvm.org/D49295
llvm-svn: 337090
Spectre variant #1 for x86.
There is a lengthy, detailed RFC thread on llvm-dev which discusses the
high level issues. High level discussion is probably best there.
I've split the design document out of this patch and will land it
separately once I update it to reflect the latest edits and updates to
the Google doc used in the RFC thread.
This patch is really just an initial step. It isn't quite ready for
prime time and is only exposed via debugging flags. It has two major
limitations currently:
1) It only supports x86-64, and only certain ABIs. Many assumptions are
currently hard-coded and need to be factored out of the code here.
2) It doesn't include any options for more fine-grained control, either
of which control flow edges are significant or which loads are
important to be hardened.
3) The code is still quite rough and the testing lighter than I'd like.
However, this is enough for people to begin using. I have had numerous
requests from people to be able to experiment with this patch to
understand the trade-offs it presents and how to use it. We would also
like to encourage work to similar effect in other toolchains.
The ARM folks are actively developing a system based on this for
AArch64. We hope to merge this with their efforts when both are far
enough along. But we also don't want to block making this available on
that effort.
Many thanks to the *numerous* people who helped along the way here. For
this patch in particular, both Eric and Craig did a ton of review to
even have confidence in it as an early, rough cut at this functionality.
Differential Revision: https://reviews.llvm.org/D44824
llvm-svn: 336990
During the execution of long functions or functions that have a lot of
inlined code it could come to the situation where tracked value could be
transferred from one register to another. The transfer is recognized only if
destination register is a callee saved register and if source register is
killed. We do not salvage caller-saved registers since there is a great
chance that killed register would outlive it.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D44016
llvm-svn: 336978
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
llvm-svn: 336964
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
llvm-svn: 336929
and no use of DW_FORM_rnglistx with the DW_AT_ranges attribute.
Reviewer: aprantl
Differential Revision: https://reviews.llvm.org/D49214
llvm-svn: 336927
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.
Differential Revision: https://reviews.llvm.org/D49004
llvm-svn: 336868
Reuse this function as to test correctness and profitability of
reducing width of either load or store operations.
Reviewsers: samparker
Differential Revision: https://reviews.llvm.org/D48624
llvm-svn: 336800
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
llvm-svn: 336779
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.
We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.
Differential Revision: https://reviews.llvm.org/D48975
llvm-svn: 336774
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.
llvm-svn: 336656
This is prep for DWARF v5 range list emission. Emission of a single range list is moved
to a static helper function.
Reviewer: jdevlieghere
Differential Revision: https://reviews.llvm.org/D49098
llvm-svn: 336621
Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.
I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D40092
llvm-svn: 336589
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.
As discussed later, that was worse at least for the code size,
and potentially for the performance, too.
https://rise4fun.com/Alive/Zmpl
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: spatel
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D48768
llvm-svn: 336585
This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.
llvm-svn: 336576
When emitting the DWARF accelerator tables from dsymutil, we don't have
a DwarfDebug instance and we use a custom class to represent Dwarf
compile units. This patch adds an interface AccelTableWriterInfo to
abstract these from the Dwarf5AccelTableWriter, so we can have a custom
implementation for this in dsymutil.
Differential revision: https://reviews.llvm.org/D49031
llvm-svn: 336529
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.
This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.
Differential Revision: https://reviews.llvm.org/D48969
llvm-svn: 336492
As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D48825
llvm-svn: 336490
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.
I used Python's re.sub with this regex to update users:
r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
llvm-svn: 336462
The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.
This improves upon the existing insertReplacementDbgValues API by:
- Updating debug intrinsics in-place, while preventing use-before-def of
the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibiliy for rewriting llvm.dbg.* DIExpressions into
common utility code.
Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:
- The no-op conversion. Applies when the values have the same width, or
have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
old one.
Testing:
- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
Debugify.
Differential Revision: https://reviews.llvm.org/D48676
llvm-svn: 336451
D48278
Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB
can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
llvm-svn: 336426
The following code pattern:
mov %rax, %rcx
test %rax, %rax
%rax = ....
je throw_npe
mov(%rcx), %r9
mov(%rax), %r10
gets transformed into the following incorrect code after implicit null check pass:
mov %rax, %rcx
%rax = ....
faulting_load_op("movl (%rax), %r10", throw_npe)
mov(%rcx), %r9
For implicit null check pass, if the register that is checked for null value (ie, the register used in the 'test' instruction) is written into before the condition jump, we should avoid doing the optimization.
Patch by Surya Kumari Jangala!
Differential Revision: https://reviews.llvm.org/D48627
Reviewed By: skatkov
llvm-svn: 336241
Summary:
Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
iteration order deterministic. This is done as the order the blocks are
removed may affect whether or not PHI nodes in successor blocks are
removed.
For example, consider the following case where %bb1 and %bb2 are
removed:
bb1:
br i1 undef, label %bb3, label %bb4
bb2:
br i1 undef, label %bb4, label %bb3
bb3:
pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
[ pv1, %bb3 ], [ v0, %other ]
If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
The pv1 node will thus be optimized out (to v0) at the time %bb1 is
removed as a predecessor to %bb4, leaving the blocks as following when
the incoming value from %bb1 has been removed:
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
The pv2 PHI node will be optimized away by removePredecessor() as all
incoming values are identical.
In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
following when the incoming value from %bb2 to pv2 has been removed:
bb3:
pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]
The pv2 PHI node will thus not be removed in this case, ultimately
leading to the following output
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
I have not looked into changing DeleteDeadBlock() so that the redundant
PHI nodes are removed.
I have not added a test case, as I was not able to create a particularly
small and (not messy) reproducer. This is likely due to SmallPtrSet
behaving deterministically when in small mode.
Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle
Reviewed By: fhahn
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D48369
llvm-svn: 336109
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.
Thanks to @zvi for the original patch.
Differential Revision: https://reviews.llvm.org/D45806
llvm-svn: 336048
This adds functionality to the outliner that allows targets to
specify certain functions that should be outlined from by default.
If a target supports default outlining, then it specifies that in
its TargetOptions. In the case that it does, and the user hasn't
specified that they *never* want to outline, the outliner will
be added to the pass pipeline and will run on those default functions.
This is a preliminary patch for turning the outliner on by default
under -Oz for AArch64.
https://reviews.llvm.org/D48776
llvm-svn: 336040
This is a recommit of r335887, which was erroneously committed earlier.
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.
This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target default
outlining behaviour.
https://reviews.llvm.org/D48682
llvm-svn: 335986
Summary:
.debug_loc section is not supported for NVPTX target. If there is an
object whose location can change during its lifetime, we do not generate
debug location info for this variable.
Reviewers: echristo
Subscribers: jholewinski, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48730
llvm-svn: 335976
This is a recommit of r335879.
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.
This also removes -O0 from the outliner DWARF test.
llvm-svn: 335930
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.
Differential Revision: https://reviews.llvm.org/D48567
llvm-svn: 335918
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.
After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.
https://reviews.llvm.org/D48683
llvm-svn: 335887
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.
llvm-svn: 335886
This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091.
It relies on r335872 since that introduces the machine outliner
flags test. I meant to commit D48683 in that commit, but got mixed
up and committed D48682 instead. So, I'm reverting this and
r335872, since D48682 hasn't made it through review yet.
llvm-svn: 335882
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.
This also updates machine-outliner-flags to reflect the change
and improves the comment describing what that test does.
llvm-svn: 335879
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.
Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.
rdar://41530228
DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.
This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target-default
outlining behaviour.
llvm-svn: 335872
Remove unused ByteStreamer argument from function emitDebugLocValue.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D48590
llvm-svn: 335811
=== Generating the CG Profile ===
The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.
After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}
!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```
Differential Revision: https://reviews.llvm.org/D48105
llvm-svn: 335794
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.
Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.
llvm-svn: 335767
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc
can produce -0.0 where the original code does not:
#include <stdio.h>
int main(int argc) {
float x;
x = -0.8 * argc;
printf("%f\n", (float)((int)x));
return 0;
}
$ clang -O0 -mavx fp.c ; ./a.out
0.000000
$ clang -O1 -mavx fp.c ; ./a.out
-0.000000
Ideally, we'd use IR/node flags to predicate the transform, but the IR parser
doesn't currently allow fast-math-flags on the cast instructions. So for now,
just use the function attribute that corresponds to clang's "-fno-signed-zeros"
option.
Differential Revision: https://reviews.llvm.org/D48085
llvm-svn: 335761
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.
This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.
https://bugs.llvm.org/show_bug.cgi?id=37573https://reviews.llvm.org/D47655
llvm-svn: 335758
It is legal for a PHI node not to have a live value in a predecessor
as long as the end of the predecessor is jointly dominated by an undef
value.
llvm-svn: 335607
This removes debug locations from ConstantSDNode and ConstantSDFPNode.
When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html
I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.
Differential Revision: https://reviews.llvm.org/D48468
llvm-svn: 335497
I thought I fixed this in r308673, but that fix was
very broken. The assumption that any frame index can be used
in place of another was more widespread than I realized.
Even when stack slot sharing was disabled, this was still
replacing frame index uses with a different ID with a different
stack slot.
Really fix this by doing the coloring per-stack ID, so all of
the coloring logically done in a separate namespace. This is a lot
simpler than trying to figure out how to change the color if
the stack ID is different.
llvm-svn: 335488
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
%c.0282 = and i32 %c.0282.in, 268435455
%a16 = lshr i64 32508, %x
%a17 = and i64 %a16, 1
%tobool = icmp eq i64 %a17, 0
%. = select i1 %tobool, i32 1, i32 2
%.286 = select i1 %tobool, i32 27, i32 26
%shr97 = lshr i32 %c.0282, %.
%shl98 = shl i32 %c.0282.in, %.286
%or99 = or i32 %shr97, %shl98
%shr100 = lshr i32 %d.0280, %.
%shl101 = shl i32 %d.0280, %.286
%or102 = or i32 %shr100, %shl101
store i32 %or99, i32* %ptr0
store i32 %or102, i32* %ptr1
ret void
}
...but I'm trying to kill the setcc bool math sooner rather than later.
By matching a larger pattern that includes both the low-bit mask and the trailing add/sub,
we can create a universally good fold because we always eliminate the condition code
intermediate value.
Here are Alive proofs for these (currently instcombine folds the 'add' variants, but
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp
Name: sub of zext cmp mask
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%z = zext i1 %c to i32
%r = sub i32 C1, %z
=>
%optional_cast = zext i8 %a to i32
%r = add i32 %optional_cast, C1-1
Name: add of zext cmp mask
%a = and i32 %x, 1
%c = icmp eq i32 %a, 0
%z = zext i1 %c to i8
%r = add i8 %z, C1
=>
%optional_cast = trunc i32 %a to i8
%r = sub i8 C1+1, %optional_cast
All of the tests look like improvements or neutral to me. But it is possible that x86
test+set+bitop is better than what we now show here. I suspect we could do better by
adding another fold for the 'sub' variants.
We start with select-of-constant in IR in the larger motivating test, so that's why I
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1
Name: true const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C2, i64 C1
=>
%z = zext i8 %a to i64
%r = sub i64 C2, %z
Name: false const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C1, i64 C2
=>
%z = zext i8 %a to i64
%r = add i64 C1, %z
Differential Revision: https://reviews.llvm.org/D48466
llvm-svn: 335433
With compilation fix.
Original commit message:
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.
This change does following two things on top:
1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs.
The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
With that linker will be able to do -gc-sections on dead stack sizes sections.
Differential revision: https://reviews.llvm.org/D46874
llvm-svn: 335336
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.
This change does following two things on top:
1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs.
The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
With that linker will be able to do -gc-sections on dead stack sizes sections.
Differential revision: https://reviews.llvm.org/D46874
llvm-svn: 335332
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.
llvm-svn: 335320
This patch adds support for generating a call graph profile from Branch Frequency Info.
The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.
After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
!llvm.module.flags = !{!0}
!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
Differential Revision: https://reviews.llvm.org/D48105
llvm-svn: 335306
Summary:
GCC and the binutils COFF linker do comdats differently from MSVC.
If we want to be ABI compatible, we have to do what they do, which is to
emit unique section names like ".text$_Z3foov" instead of short section
names like ".text". Otherwise, the binutils linker gets confused and
reports multiple definition errors when two object files from GCC and
Clang containing the same inline function are linked together.
The best description of the issue is probably at
https://github.com/Alexpux/MINGW-packages/issues/1677, we don't seem to
have a good one in our tracker.
I fixed up the .pdata and .xdata sections needed everywhere other than
32-bit x86. GCC doesn't use associative comdats for those, it appears to
rely on the section name.
Reviewers: smeenai, compnerd, mstorsjo, martell, mati865
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D48402
llvm-svn: 335286
Summary:
The logic for handling the sinking of COPY instructions was generating
different code when building with debug flags.
The original code did not take into consideration debug instructions. This
resulted in the registers in the DBG_VALUE instructions being treated as used,
and prevented the COPY from being sunk. This patch avoids analyzing debug
instructions when trying to sink COPY instructions.
This patch also creates a routine from the code in MachineSinking::SinkInstruction to
perform the logic of sinking an instruction along with its debug instructions.
This functionality is used in multiple places, including the code for sinking COPY instrs.
Reviewers: junbuml, javed.absar, MatzeB, bjope
Reviewed By: bjope
Subscribers: aprantl, probinson, thegameg, jonpa, bjope, vsk, kristof.beyls, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D45637
llvm-svn: 335264
Allowed folding for "and/or" binops with non-constant operand if
arguments of select are 0/-1 values.
Normally this code with "and" opcode does not get to a DAG combiner
and simplified yet in the InstCombine. However AMDGPU produces it
during lowering and InstCombine has no chance to optimize it out.
In turn the same pattern with "or" opcode can reach DAG.
Differential Revision: https://reviews.llvm.org/D48301
llvm-svn: 335250
Summary:
In some cases, these operands lacked the IsDebug property, which is meant to signal that
they should not affect codegen. This patch adds a check for this property in the
MachineVerifier and adds it where it was missing.
This includes refactorings to use MachineInstrBuilder construction functions instead of
manually setting up the intrinsic everywhere.
Patch by: JesperAntonsson
Reviewers: aprantl, rnk, echristo, javed.absar
Reviewed By: aprantl
Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D48319
llvm-svn: 335214
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.
This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.
Differential Revision: https://reviews.llvm.org/D48029
llvm-svn: 335210
Summary:
Fixes PR36579.
For cases where we had e.g.
DBG_VALUE 42
[...]
DBG_VALUE undef
LiveDebugVariables would discard all undef DBG_VALUEs and then it would
look like the variable had the value 42 throughout the rest of the
function, which is incorrect.
With this patch we don't remove all undef DBG_VALUEs in LiveDebugVariables
so they will be kept after register allocation just like other DBG_VALUEs
which will yield more correct debug information.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: bjope, Ka-Ka, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48277
llvm-svn: 335205
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)
2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.
2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
- Since Pred is not deleted (BB is), the entry block does not need updating.
- The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
- isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
- eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
- adding some test owner as subscribers for the interesting tests modified:
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/AMDGPU/nested-loop-conditions.ll
test/CodeGen/AMDGPU/si-annotate-cf.ll
test/CodeGen/X86/hoist-spill.ll
test/CodeGen/X86/2006-11-17-IllegalMove.ll
3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.
Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D48202
llvm-svn: 335183
Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.
Differential Revision: https://reviews.llvm.org/D48223
llvm-svn: 335167
Summary:
Found some regressions (infinite loop in DAGTypeLegalizer::RemapId)
after r334880. This patch makes sure that we do map a TableId to
itself.
Reviewers: niravd
Reviewed By: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48364
llvm-svn: 335141
Summary:
If we get an error building the SelectionDAG for inline assembly we try to continue and still build the DAG.
But if the return type for the inline assembly is a struct we end up crashing because we try to create an UNDEF node with a struct type which isn't valid.
Instead we need to create an UNDEF for each element of the struct and join them with merge_values.
This patch relies on single operand merge_values being handled gracefully by getMergeValues. If the return type is void there will be no VTs returned by ComputeValueVTs and now we just return instead of calling setValue. Hopefully that's ok, I assumed nothing would need to look up the mapped value for void node.
Fixes PR37359
Reviewers: rengolin, rovka, echristo, efriedma, bogner
Reviewed By: efriedma
Subscribers: craig.topper, llvm-commits
Differential Revision: https://reviews.llvm.org/D46560
llvm-svn: 335093
This patch covers up a fairly fundemental issue around remat and register allocation which shows up with psuedo instructions with more vreg uses than there are physical registers. This patch essentially just disables remat for STATEPOINTs which are the only case we've seen so far, but long term we need a better fix.
For STATEPOINTs specifically, this is a strict improvement. It unblocks progress towards enabling a currently off-by-default mode which integrates deopt bundle operand lowering with register allocator spilling so that we end up with smaller stack sizes and more optimally placed spills. Assming no other issues turn up during my next round of integration testing - which based on experience so far, is admittedly unlikely - we might finally be able to enable something I've been working towards in small bits and pieces for years now. :)
For psuedo ops in general, there are a couple of ideas for a "proper fix" discussed on the bug, but I'm far enough outside my knowledge area to not be able to see any of them through to a successful conclusion. If anyone wants to help out here, please do.
Differential Revision: https://reviews.llvm.org/D41098
llvm-svn: 335077
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.
This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.
llvm-svn: 335076
Summary:
Patch r323922 changed the sigil for physical registers to '$', instead of '%'.
An error message was missed during this change, and reports the wrong sigil.
This patch corrects that diagnostic and the tests that check that error string.
Reviewers: zer0, bjope
Reviewed By: bjope
Subscribers: bjope, thegameg, plotfi, llvm-commits
Differential Revision: https://reviews.llvm.org/D48086
llvm-svn: 335066
Summary:
Add WasmEHFuncInfo and routines to calculate and fill in this struct to
keep track of unwind destination information. This will be used in
other EH related passes.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D48263
llvm-svn: 335005
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47909
llvm-svn: 334996
Summary: Refactoring for all constant cases which require AllowNewConst and some staging for future fmf usage.
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nhaehnle
Differential Revision: https://reviews.llvm.org/D48289
llvm-svn: 334984
Relanding after fixing expensive check from modifying tables.
To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced that entry would be wrong. Previously we avoided this by
when potentially violating this property, walking every table and
updating all node pointers. This is very expensive but hopefully rare
occurance.
This patch assigns each instance of a SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.
In some cases this is a 1000x speedup to compilation.
Reviewers: jyknight, echristo, bogner, tra
Reviewed By: bogner
Subscribers: dberris, grandinj, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47959
llvm-svn: 334880
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai
Reviewed By: rampitec, nhaehnle
Subscribers: tpr, nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47918
llvm-svn: 334876
Modify ExpandStrictFPOp(...) to handle nodes that have scalar
operands.
Also, add a Strict FMA test and do some other light cleanup in the
Strict FP code.
Differential Revision: https://reviews.llvm.org/D48149
llvm-svn: 334863
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47954
llvm-svn: 334862
When coalescing a small register into a subregister of a larger register,
if the larger register is rematerialized, the function updateRegDefUses
can add an <undef> flag to the rematerialized definition (since it's
treating it as only definining the coalesced subregister). While with that
assumption doing so is not incorrect, make sure to remove the flag later
on after the call to updateRegDefUses.
llvm-svn: 334845
Summary:
Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: efriedma, wdng, tpr
Differential Revision: https://reviews.llvm.org/D48057
llvm-svn: 334769
This is r334750 (which was reverted in r334754) with a fix for an
uninitialized variable that was caught by msan.
Original commit message:
> If a copy bundle happens to involve overlapping registers, we can end
> up with emitting the copies in an order that ends up clobbering some
> of the subregisters. Since instructions in the copy bundle
> semantically happen at the same time, this is incorrect and we need to
> make sure we order the copies such that this doesn't happen.
llvm-svn: 334756
Summary: A FMF constraint is added to FADD with unsafe still available as the fallback
Reviewers: spatel, wristow, arsenm, hfinkel
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D48180
llvm-svn: 334753
WebAssembly doesn't support more than one function per section
and we rely on function sections being unique. This change ignores
the section provided by the function to avoid two functions being
in the same section.
Without this change the object writer produces the following
error for this test:
LLVM ERROR: section already has a defining function: baz
Differential Revision: https://reviews.llvm.org/D48178
llvm-svn: 334752
If a copy bundle happens to involve overlapping registers, we can end
up with emitting the copies in an order that ends up clobbering some
of the subregisters. Since instructions in the copy bundle
semantically happen at the same time, this is incorrect and we need to
make sure we order the copies such that this doesn't happen.
Differential Revision: https://reviews.llvm.org/D48154
llvm-svn: 334750
To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced that entry would be wrong. Previously we avoided this by
when potentially violating this property, walking every table and
updating all node pointers. This is very expensive but hopefully rare
occurance.
This patch assigns each instance of a SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.
In some cases this is a 1000x speedup to compilation.
Reviewers: jyknight, echristo, bogner, tra
Reviewed By: bogner
Subscribers: dberris, grandinj, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47959
llvm-svn: 334729
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.
The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is
used by MachineCombiner, so I'll file a bug report to follow-up.
llvm-svn: 334608
Add a helper function to expand constrained FP operations as needed.
Note that the Strict POWI operation is not handled in this patch since
the format is slightly different from the others.
Differential Revision: https://reviews.llvm.org/D47491
llvm-svn: 334603
All COFF targets should use @IMGREL32 relocations for symbol differences
against __ImageBase. Do the same for getSectionForConstant, so that
immediates lowered to globals get merged across TUs.
Patch by Chris January
Differential Revision: https://reviews.llvm.org/D47783
llvm-svn: 334523
Apparently, MachineInstr class definition as well as pretty much all of
the machine passes assume that the only kind of MachineInstr's operands
that is variadic for variadic opcodes is explicit non-definitions.
In particular, this assumption is made by MachineInstr::defs(), uses(),
and explicit_uses() methods, as well as by MachineCSE pass.
The assumption is incorrect judging from at least TableGen backend
implementation, that recognizes variable_ops in OutOperandList, and the
very existence of G_UNMERGE_VALUES generic opcode, or ARM load multiple
instructions, all of which have variadic defs.
In particular, MachineCSE pass breaks MIR with CSE'able G_UNMERGE_VALUES
instructions in it.
This commit implements MachineInstr::getNumExplicitDefs() similar to
pre-existing MachineInstr::getNumExplicitOperands(), fixes
MachineInstr::defs(), uses(), and explicit_uses(), and fixes MachineCSE
pass.
As the issue addressed seems to affect only machine passes that could be
ran mid-GlobalISel pipeline at the moment, the other passes aren't fixed
by this commit, like MachineLICM: that could be done on per-pass basis
when (if ever) they get adopted for GlobalISel.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D45640
llvm-svn: 334520
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: nhaehnle, wdng
Differential Revision: https://reviews.llvm.org/D47911
llvm-svn: 334514
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.
Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.
Differential Revision: https://reviews.llvm.org/D47725
llvm-svn: 334497
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.
Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.
Test update is just the one test that this triggers
on in tree now. It wasn't checking anything before.
The result is completely changed since the selects
are eliminated. Not sure if it's considered better
or not.
llvm-svn: 334440
Codeview references to unnamed structs and unions are expected to refer to the
complete type definition instead of a forward reference so Visual Studio can
resolve the type properly.
Differential Revision: https://reviews.llvm.org/D32498
llvm-svn: 334382
This patch started off much more general and ambitious, but it's been a nightmare
seeing all the ways x86 vector codegen can go wrong.
So the code is still structured to allow extending easily, but it's currently
limited in several ways:
1. Only handle cases with an extending load.
2. Only handle cases with a zero constant compare.
3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.
The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary
intermediate cast.
There's a clear regression in the last test (sgt_zero_fp_select) because we longer
recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present
in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit
setcc from a sign-extended operand and remove it.
Differential Revision: https://reviews.llvm.org/D47330
llvm-svn: 334378
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.
These places were found by hiding the begin/end methods in the SmallSet forwarding
llvm-svn: 334343
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D47910
llvm-svn: 334306
While trying to propagate AND masks back to loads, we currently allow
one non-load node to be included as a leaf in chain. This fix now
limits that node to produce only a single data value.
Differential Revision: https://reviews.llvm.org/D47878
llvm-svn: 334268
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.
Reviewers: spatel, arsenm, hfinkel, javed.absar
Reviewed By: spatel
Subscribers: nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47388
llvm-svn: 334242
This avoids regressions in a future AMDGPU change
to make v4i16/v4f16 legal. For these types, build_vector
is implemented as bitcasted operations on v2i32. This
combine was creating v4i16s out of what would have been
already been a v2i32 build_vector, creating a mess
of nodes that never get cleaned up.
I'm not sure this is the right condition to check.
I initially tried just checking for the legality of the
new build_vector. This works for my case, but breaks dozens
of x86 tests. A Mips test seems to show some improvement
or at least a neutral change. I don't want to think
about how long it would take to analyze the set of
different x86 vector operations impacted.
Test included in future commit.
llvm-svn: 334218
Summary:
When the branch folder hoist code into a predecessor it adjust live-in's
in the blocks it hoist code from. However it fail to handle hoisted code
that contain a defed register that originally is live-in in the block
through a super register.
This is fixed by replacing the live-in handling code with calls to
utility functions in LivePhysRegs.
Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47529
llvm-svn: 334163
Summary:
This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.
Reviewers: spatel, hfinkel, arsenm
Reviewed By: spatel
Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai
Differential Revision: https://reviews.llvm.org/D47749
llvm-svn: 334113
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.
Differential Revision: https://reviews.llvm.org/D47633
llvm-svn: 334099
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678
We may not have throughput info because it's not specified in the model
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.
Differential Revision: https://reviews.llvm.org/D47723
llvm-svn: 334055
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.
Differential Revision: https://reviews.llvm.org/D45537
This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem.
llvm-svn: 334049
Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
Reviewers: spatel, hfinkel
Reviewed By: spatel
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D47389
llvm-svn: 334037
When legalizing illegal FP load results, this was
for some reason dropping the invariant and dereferencable
memory flags. There doesn't seem to be any reason for this,
and the equivalent isn't done for integer loads.
Fixes an issue in a future AMDGPU commit where some identical
loads fail to merge because one of the loads ends up
dropping the flags.
llvm-svn: 334020
RegAlloc keeps a insertion-time ordered map of evictee information,
but we only use membership. Replace MapVector with contextually
equivalent DenseMap which is smaller and faster.
llvm-svn: 333981
Start by emitting remarks for very basic unsupported cases such as
irreducible CFGs and EHFunclets. The end goal is to be able to cover all
the cases where we give up with an explanation.
llvm-svn: 333972
We already output true and false in the printer, but the parser isn't able to
read it.
Differential Revision: https://reviews.llvm.org/D47424
llvm-svn: 333970
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)
llvm-svn: 333954
This is setting up to fix bug 37573 cleanly.
This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.
Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.
Also, as a side-effect, it makes the outliner a bit cleaner.
https://bugs.llvm.org/show_bug.cgi?id=37573
llvm-svn: 333952
Summary: This include variant for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantic, but we chose to do the transform anyway in that case as to push the transform down the carry chain.
Reviewers: efriedma, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46505
llvm-svn: 333943
Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in tree backend.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47685
llvm-svn: 333939
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.
Target that uses these opcodes are changed in order to ensure their behavior doesn't change.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D47422
llvm-svn: 333748
Before we were relying on the any extend of the s1 to s32, but
for AAPCS we need to zero-extend it to at least s8.
Fixes PR36719
Differential Revision: https://reviews.llvm.org/D47425
llvm-svn: 333747
Summary:
`getEHScopeMembership()` function is used not only for funclet-based
EHs; they apply to all EH schemes that use the scoped IR
(catchpad/cleanuppad/...). D47005 (rL333045) changed some of the uses of
the term 'funclet' to 'EH scopes' in case they apply to all scoped EH,
and this fixes more of them. For `FuncletLayout` pass, I left it as is
because the pass is only used for funclet-based EH.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47611
llvm-svn: 333711
Summary:
This lowers exception catching-related instructions:
1. Lowers `wasm.catch` intrinsic to `catch` instruction
2. Removes `catchpad` and `cleanuppad` instructions; they are not
necessary after isel phase. (`MachineBasicBlock::isEHFuncletEntry()` or
`MachineBasicBlock::isEHPad()` can be used instead.)
3. Lowers `catchret` and `cleanupret` instructions to pseudo `catchret`
and `cleanupret` instructions in isel, which will be replaced with other
instructions in `WebAssemblyExceptionPrepare` pass.
4. Adds 'WebAssemblyExceptionPrepare` pass, which is for running various
transformation for EH. Currently this pass only replaces `catchret` and
`cleanupret` instructions into appropriate wasm instructions to make
this patch successfully run until the end.
Currently this does not handle lowering of intrinsics related to LSDA
info generation (`wasm.landingpad.index` and `wasm.lsda`), because they
cannot be tested without implementing `EHStreamer`'s wasm-specific
handlers. They are marked as TODO, which is needed to make isel pass.
Also this does not generate `try` and `end_try` markers yet, which will
be handled in later patches.
This patch is based on the first wasm EH proposal.
(https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md)
Reviewers: dschuff, majnemer
Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D44090
llvm-svn: 333705
As noted by Adrian on llvm-commits, PrintHTMLEscaped and PrintEscaped in
StringExtras did not conform to the LLVM coding guidelines. This commit
rectifies that.
llvm-svn: 333669
This patch extends the MCSchedModel API with new methods that can be used to
obtain the latency and reciprocal througput information for an MCInst.
Scheduling models have recently gained the ability to resolve variant scheduling
classes associated with MCInst objects. Before, models were only able to resolve
a variant scheduling class from a MachineInstr object.
This patch is mainly required by D47374 to avoid regressing a pair of x86
specific -print-schedule tests for btver2. Patch D47374 introduces a new variant
class to teach the btver scheduling model (x86 target) how to correctly compute
the latency profile for some zero-idioms using the new scheduling predicates.
The new methods added by this patch would be mainly used by llc when flag
-print-schedule is specified. In particular, tests that contain inline assembly
require that code is parsed at code emission stage into a sequence of MCInst.
That forces the print-schedule functionality to query the latency/rthroughput
information for MCInst instructions too. If we don't expose this new API, then
we lose "-print-schedule" test coverage as soon as variant scheduling classes
are added to the x86 models.
The tablegen SubtargetEmitter changes teaches how to query latency profile
information using a object that derives from TargetSubtargetInfo. Note that this
should really have been part of r333286. To avoid code duplication, the logic
that "resolves" variant scheduling classes for MCInst, has been moved to a
common place in MC. That logic is used by the "resolveVariantSchedClass" methods
redefined in override by the tablegen'd GenSubtargetInfo classes.
Differential Revision: https://reviews.llvm.org/D47536
llvm-svn: 333650
This commit adds a simple verifier that tracks type indices being
touched by legalization rules' builders.
Every target will now have an opportunity to call
LegalizerInfo::verify(...) at the end of its derived LegalizerInfo's
constructor and check there are no obvious mistakes like checking only
first type for an opcode that has more than one type index and therefore
implicitly declaring any type for the second (and higher) type index
legal.
The check is only ran in assert builds and should have very minor
performance impact in assert builds and none in release builds.
This commit does not add LegalizerInfo::verify(...) calls to
target-specific legalizers, look for separate commits for that.
This commit also doesn't make the verification errors fatal, only
produces an error message, look for a later commit that does.
Reviewers: aemerson, qcolombet
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46338
llvm-svn: 333576
There seems to be no real reason to have these separate copies.
The existing implementations just copy each other for x86.
For Mips there is a subtle difference, which is just a bug
since it changes based on the context where which one was called.
Dropping this version, all tests pass. If I try to merge them
to match the removed version, a test fails.
llvm-svn: 333440
Summary:
Avoid assert/crash during liveness calculation in situations
where the incoming machine function has statically unreachable BBs.
Second attempt at submitting; this version of the change includes
a revised testcase.
Fixes PR37130.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47372
llvm-svn: 333416
Previously, this pass would look at the (static) set returned by
getCallPreservedMask() and add those back as preserved in the case when
isSafeForNoCSROpt() returns false.
A problem is that a target may have to save some registers even when NoCSROpt
takes place. For instance, on SystemZ, the return register is needed upon
return from a function.
Furthermore, getCallPreservedMask() only includes the registers that the
target actually wishes to emit save/restore instructions for. This means that
subregs and (fully saved) superregs are missing.
This patch instead takes the (dynamic) set returned by target for the
function from determineCalleeSaves() and then adds sub/super regs to build
the set to be used when building the RegMask for the function.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D46315
llvm-svn: 333261
When a GEP with all zero indices is converted to bitcast, its DI wasn't
copied over to the newly created instruction. This patch fixes that bug.
Patch by Kareem Ergawy!
Differential Revision: https://reviews.llvm.org/D47347
llvm-svn: 333235
In addPhysRegDeps, subregister entries of the defined register were previously
not removed from Uses or Defs, which resulted in extra redundant edges for
subregs around the register definition.
This is principally NFC (in very rare cases some node got a different height).
This makes the DAG more readable and efficient in some cases.
Review: Andy Trick
https://reviews.llvm.org/D46838
llvm-svn: 333165
When a bitcast is being sunk in -codegenprepare pass, its DI wasn't
copied over to the newly created instruction. This patch fixes that
bug.
Patch by Kareem Ergawy!
Differential Revision: https://reviews.llvm.org/D47282
llvm-svn: 333133
by replacing DenseMap with IndexedMap for LLTs within MRI, as
benchmarked by cross-compiling sqlite3 amalgamation for AArch64
on x86 machine.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D46809
llvm-svn: 333125
Summary:
There are functions using the term 'funclet' to refer to both
1. an EH scopes, the structure of BBs that starts with
catchpad/cleanuppad and ends with catchret/cleanupret, and
2. a small function that gets outlined in AsmPrinter, which is the
original meaning of 'funclet'.
So far the two have been the same thing; EH scopes are always outlined
in AsmPrinter as funclets at the end of the compilation pipeline. But
now wasm also uses scope-based EH but does not outline those, so we now
need to correctly distinguish those two use cases in functions.
This patch splits `MachineBasicBlock::isFuncletEntry` into
`isFuncletEntry` and `isEHScopeEntry`, and
`MachineFunction::hasFunclets` into `hasFunclets` and `hasEHScopes`, in
order to distinguish the two different use cases. And this also changes
some uses of the term 'funclet' to 'scope' in `getFuncletMembership` and
change the function name to `getEHScopeMembership` because this function
is not about outlined funclets but about EH scope memberships.
This change is in the same vein as D45559.
Reviewers: majnemer, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D47005
llvm-svn: 333045
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.
In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.
To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.
Differential Revision: https://reviews.llvm.org/D47173
llvm-svn: 333015
In DWARF v5, the DWO ID is in the (split/skeleton) CU header, not an
attribute on the CU DIE.
This changes the size of those headers, so use the parsed size whenever
we have one, for simplicitly.
Differential Revision: https://reviews.llvm.org/D47158
llvm-svn: 333004
This is the FP sibling of D43141 with the corresponding IR change in rL327212.
We can't propagate undef here because if a variable operand is a NaN, these
binops must propagate NaN. Neither global nor node-level fast-math makes a
difference. If we have 'nnan', I think later folds can turn the NaN into undef.
The tests in X86/fp-undef.ll are meant to be the definitive verification for
these folds - everything reduces identically now.
The other test changes are collateral damage. They may need to be altered to
preserve their intent.
Differential Revision: https://reviews.llvm.org/D47026
llvm-svn: 332920
Summary:
As pointed out in D46528, we errneously transform cases like `xor X, -1`,
even though we use said function.
It's because the `-1` is actually a bitcast there.
So i think we can just look through it in the function.
Differential Revision: https://reviews.llvm.org/D47156
llvm-svn: 332905
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated.
Differential Revision: https://reviews.llvm.org/D46528
llvm-svn: 332904
SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates.
This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work.
As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this.
I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it.
Differential Revision: https://reviews.llvm.org/D47116
llvm-svn: 332895
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47050
llvm-svn: 332749
This is a revert of the changes from https://reviews.llvm.org/D46265;
the new test introduced (test/CodeGen/X86/PR37310.mir) causes buildbot
failures.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47061
llvm-svn: 332742
Summary:
Avoid assert/crash during liveness calculation in situations where the
incoming machine function has statically unreachable BBs.
Fixes PR37130.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46265
llvm-svn: 332707
This reapplies commits: r330271, r330592, r330779.
[DEBUG] Initial adaptation of NVPTX target for debug info emission.
Summary:
Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.
llvm-svn: 332689
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.
Differential Revision: https://reviews.llvm.org/D46921
llvm-svn: 332685
Previously we emitted 20-byte SHA1 hashes. This is overkill
for identifying debug info records, and has the negative side
effect of making object files bigger and links slower. By
using only the last 8 bytes of a SHA1, we get smaller object
files and ~10% faster links.
This modifies the format of the .debug$H section by adding a new
value for the hash algorithm field, so that the linker will still
work when its object files have an old format.
Differential Revision: https://reviews.llvm.org/D46855
llvm-svn: 332669
Summary:
- Add wasm personality function
- Re-categorize the existing `isFuncletEHPersonality()` function into
two different functions: `isFuncletEHPersonality()` and
`isScopedEHPersonality(). This becomes necessary as wasm EH uses scoped
EH instructions (catchswitch, catchpad/ret, and cleanuppad/ret) but not
outlined funclets.
- Changed some callsites of `isFuncletEHPersonality()` to
`isScopedEHPersonality()` if they are related to scoped EH IR-level
stuff.
Reviewers: majnemer, dschuff, rnk
Subscribers: jfb, sbc100, jgravelle-google, eraman, JDevlieghere, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45559
llvm-svn: 332667
r332057 introduced distance() for ranges. Based on post-commit feedback,
this renames distance() to size(). The new size() is also only enabled
when the operation is O(1).
Differential Revision: https://reviews.llvm.org/D46976
llvm-svn: 332551
As part of merging stores we check that fusing the nodes does not
cause a cycle due to one candidate store being indirectly dependent on
another store (this may happen via chained memory copies). This is
done by searching if a store is a predecessor to another store's
value.
Prune the search at the candidate search's root node which is a
predecessor to all candidate stores. This reduces the
size of the subgraph searched in large basic blocks.
Reviewers: jyknight
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46955
llvm-svn: 332490