This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
llvm-svn: 261070
This apparently comes up when the register allocator decides that a
variable will become undef along a certain path.
Also improve the error message we emit when we can't map from LLVM
register number to CV register number.
llvm-svn: 261016
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260998
This is an updated version which fixes a bug that happened with
uses tied to an earlyclobber operand which end at an unusual slotindex.
If two definitions write to independent subregisters then they can be
put in any order. LiveIntervalAnalysis::handleMove() did not support
this previously because it looks like moving a definition of a vreg past
another one.
This is a modified version of a patch proposed (two years ago) by
Vincent Lejeune! This version does not touch the read-undef flags and is
extended for the case of moving a subregister def behind all uses - this
can happen for subregister defs that are completely unused.
Differential Revision: http://reviews.llvm.org/D9067
llvm-svn: 260906
This function was basically useless, since volatile memacesses or MIs with
unmodelled sideffects become global memory objects, and the other little
checks are also done elsewhere.
Reviewed by Andy Trick
http://reviews.llvm.org/D16881
llvm-svn: 260899
This requirement was a huge hack to keep LiveVariables alive because it
was optionally used by TwoAddressInstructionPass and PHIElimination.
However we have AnalysisUsage::addUsedIfAvailable() which we can use in
those passes.
llvm-svn: 260806
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.
fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2. This prevents selection of
native f16 conversion instructions from f32 or f64. Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.
Reviewers: ab
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D17221
llvm-svn: 260769
The code change is simple enough: instead of attaching an anonymous SDLoc to splatted
vector constants, use the scalar constant's existing SDLoc since that is what is passed
into getConstant() as a param. But this changes instruction scheduling, so I'll explain
why that happens.
The motivation for this patch starts near:
http://reviews.llvm.org/rL258833
...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'.
But when I made that change locally, several x86 codegen tests wiggled.
It turns out that the lack of SDLoc consistency in getConstant() changes the way
ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG
scheduler algorithms use IROrder for tie-breaking.
Differential Revision: http://reviews.llvm.org/D16972
llvm-svn: 260582
Rather than storing type units in a vector and emitting them at the end
of code generation, emit them immediately and destroy them, reclaiming the
memory we were using for their DIEs.
In one benchmark carried out against Chromium's 50 largest (by bitcode
file size) translation units, total peak memory consumption with type units
decreased by median 17%, or by 7% when compared against disabling type units.
Tested using check-{llvm,clang}, the GDB 7.5 test suite (with
'-fdebug-types-section') and by eyeballing llvm-dwarfdump output on those
Chromium translation units with split DWARF both disabled and enabled, and
verifying that the only changes were to addresses and abbreviation ordering.
Differential Revision: http://reviews.llvm.org/D17118
llvm-svn: 260578
If there were wrapper functions with no instructions of their own in the
inlining tree, we would fail to emit InlineSite records for them.
llvm-svn: 260571
The rational for this change is that LLVMBuild cannot express conditional
dependencies. Therefore, when we start optionally using GlobalISel library for
say AArch64, without that change, all the tools that use the AArch64 library
would need to explicitly link with GlobalISel when we ask for it.
This does not scale.
Instead, we will set the dependencies between the target and GlobalISel and if
we did not ask to build GlobalISel, the library will just be empty.
Thanks to Chris Bieneman and Mehdi Animi for the idea.
llvm-svn: 260566
If two definitions write to independent subregisters then they can be
put in any order. LiveIntervalAnalysis::handleMove() did not support
this previously because it looks like moving a definition of a vreg past
another one.
This is a modified version of a patch proposed (two years ago) by
Vincent Lejeune! This version does not touch the read-undef flags and is
extended for the case of moving a subregister def behind all uses - this
can happen for subregister defs that are completely unused.
Differential Revision: http://reviews.llvm.org/D9067
llvm-svn: 260565
We actually need that information only for generic instructions, therefore it
would be nice not to have to pay the extra memory consumption for all
instructions. Especially because a typed non-generic instruction does not make
sense.
The question is then, is it possible to have that information in a union or
something?
My initial thought was that we could have a derived class GenericMachineInstr
with additional information, but in practice it makes little to no sense since
generic MachineInstrs are likely turned into non-generic ones by just switching
the opcode. In other words, we don't want to go through the process of creating
a new, non-generic MachineInstr, object each time we do this switch. The memory
benefit probably is not worth the extra compile time.
Another option would be to keep the type of the MachineInstr in a side table.
This would induce an extra indirection though.
Anyway, I will file a PR to discuss about it and remember we need to come back
to it at some point.
llvm-svn: 260558
If a class has hidden visibility all derived classes and all classes
that have it as a member must have hidden visibility too. That may
be fixable here but requires changes to quite a lot of debug info
classes.
This is also one of the things that GCC enforces aggressively while
clang ignores it, making testing more annoying than necessary.
llvm-svn: 260529
For now, generic virtual registers will not have a register class. We may want
to change that. For instance, if we want to use all the methods from
TargetRegisterInfo with generic virtual registers, we need to either have some
sort of generic register classes that do what we want, or teach those methods
how to deal with nullptr register class.
Although the latter seems easy enough to do, we may still want to differenciate
generic register classes from nullptr to catch cases where nullptr gets
introduced by a bug of some sort.
Anyway, I will file a PR to keep track of that.
llvm-svn: 260474
Summary:
Refactor common value, scope, and label tracking logic out of DwarfDebug
into a common base class called DebugHandlerBase.
Update an old LLVM IR test case to avoid an assertion in LexicalScopes.
Reviewers: dblaikie, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16931
llvm-svn: 260432
I reinvented this functionality in http://reviews.llvm.org/D16828 because it was
hidden away as a static function. The changes in x86 are not based on a complete
audit. I suspect there are other possible uses there, and there are almost certainly
more potential users in other targets.
llvm-svn: 260295
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.
This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.
llvm-svn: 260109
LiveRangeEdit::eliminateDeadDef is used to remove dead define instructions
after rematerialization. To remove a VNI for a vreg from its LiveInterval,
LiveIntervals::removeVRegDefAt is used. However, after non-PHI VNIs are all
removed, PHI VNI are still left in the LiveInterval. Such unused vregs will
be kept in RegsToSpill[] at the end of InlineSpiller::reMaterializeAll and
spiller will allocate stackslot for them.
The fix is to get rid of unused reg by checking whether it has non-dbg
reference instead of whether it has non-empty interval.
llvm-svn: 259895
Only single and double FP immediates are correctly printed by
MachineInstr::print() during debug output. Half float type goes to
APFloat::convertToDouble() and hits assertion it is not a double
semantics. This diff prints half machine operands correctly.
This cannot currently be hit by any in-tree target.
Patch by Stanislav Mekhanoshin
llvm-svn: 259857
This patch corresponds to review:
http://reviews.llvm.org/D16847
There are some files in glibc that use the output operand modifier even though
it was deprecated in GCC. This patch just adds support for it to prevent issues
with such files.
llvm-svn: 259798
This patch implements softening of long double type (ppcf128) on ppc32
architecture and enables operations for this type for soft float.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D15811
llvm-svn: 259791
This is mostly about having shorter lines and standardizing on one
interface, but it also avoids some needless indirection.
No functional change.
llvm-svn: 259697
Summary:
In r257979, I added code to ensure that we wouldn't merge DebugLocEntries if
the pieces they describe overlap. Unfortunately, I failed to cover the case,
where there may have multiple active Expressions in the entry, in which case we
need to make sure that no two values overlap before we can perform the merge.
This fixed PR26148.
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D16742
llvm-svn: 259696
This patch consists of two parts: a performance fix in DAGCombiner.cpp
and a correctness fix in SelectionDAG.cpp.
The test case tests the bug that's uncovered by the performance fix, and
fixed by the correctness fix.
The performance fix keeps the containers required by the
hasPredecessorHelper (which is a lazy DFS) and reuse them. Since
hasPredecessorHelper is called in a loop, the overall efficiency reduced
from O(n^2) to O(n), where n is the number of SDNodes.
The correctness fix keeps iterating the neighbor list even if it's time
to early return. It will return after finishing adding all neighbors to
Worklist, so that no neighbors are discarded due to the original early
return.
llvm-svn: 259691
Recommited, after some fixing with test cases.
Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll
Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259673
The register coalescer can rematerialize constants that define
more of a register than the copy it is going to replace was going
to do.
This is valid in the case the register was undef before the
copy happened.
This patch makes sure that all the subranges defined by the new
rematerialization instructions have at least a dead def.
Review: http://reviews.llvm.org/D16693
llvm-svn: 259614
CodeView requires us to accurately describe the extent of the inlined
code. We did this by grabbing the next debug location in source order
and using *that* to denote where we stopped inlining. However, this is
not sufficient or correct in instances where there is no next debug
location or the next debug location belongs to the start of another
function.
To get this correct, use the end symbol of the function to denote the
last possible place the inlining could have stopped at.
llvm-svn: 259548
This directive emits the binary annotations that describe line and code
deltas in inlined call sites. Single-stepping through inlined frames in
windbg now works.
llvm-svn: 259535
When rematerializing a computation by replacing the copy, use the copy's
location. The location of the copy is more representative of the
original program.
This partially fixes PR10003.
llvm-svn: 259469
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB
Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16291
llvm-svn: 259387
Changed emitting offset of macinfo entry into compiler unit DIE to use "addSectionLabel" method rather than explicitly calculating size/offset of macro entry.
Differential Revision: http://reviews.llvm.org/D16292
llvm-svn: 259358
Those commits created an artificial edge from a cleanup to a synthesized
catchswitch in order to get the MSVC personality routine to execute
cleanups which don't cleanupret and are not wrapped by a catchswitch.
This worked well enough but is not a complete solution in situations
where there the cleanup infinite loops.
However, the real deal breaker behind this approach comes about from a
degenerate case where the cleanup is post-dominated by unreachable *and*
throws an exception. This ends poorly because the catchswitch will
inadvertently catch the exception.
Because of this we should go back to our previous behavior of not
executing certain cleanups (identical behavior with the Itanium ABI
implementation in clang, GCC and ICC).
N.B. I think this could be salvaged by making the catchpad rethrow the
exception and properly transforming throwing calls in the cleanup into
invokes.
llvm-svn: 259338
Summary:
There are three parts to inlined call frames:
1. The inlinee line subsection
2. The inline site symbol record
3. The function ids referenced by both
This change starts by emitting function ids (3) for all subprograms and
emitting the base inline site symbol record (2). The actual line numbers
in (2) use an encoded format that will come next, along with the inlinee
line subsection.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16333
llvm-svn: 259217
The buildSchedGraph() was in need of reworking as the AA features had been
added on top of earlier code. It was very difficult to understand, and buggy.
There had been found cases where scheduling dependencies had actually been
missed (see r228686).
AliasChain, RejectMemNodes, adjustChainDeps() and iterateChainSucc() have
been removed. There are instead now just the four maps from Value to SUs, which
have been renamed to Stores, Loads, NonAliasStores and NonAliasLoads.
An unknown store used to become the AliasChain, but now becomes a store mapped
to 'unknownValue' (in Stores). What used to be PendingLoads is instead the
list of SUs mapped to 'unknownValue' in Loads.
RejectMemNodes and adjustChainDeps() used to be a safety-net for everything.
The SU maps were sometimes cleared and SUs were put in RejectMemNodes, where
adjustChainDeps() would look. Instead of this, a more straight forward approach
is used in maintaining the SU maps without clearing them and simply letting
them grow over time. Instead of the cutt-off in adjustChainDeps() search, a
reduction of maps will be done if needed (see below).
Each SUnit either becomes the BarrierChain, or is put into one of the maps. For
each SUnit encountered, all the information about previous ones are still
available until a new BarrierChain is set, at which point the maps are cleared.
For huge regions, the algorithm becomes slow, therefore the maps will get
reduced at a threshold (current default is 1000 nodes), by a fraction (default 1/2).
These values can be tuned by use of CL options in case some test case shows that
they need to be changed (-dag-maps-huge-region and -dag-maps-reduction-size).
There has not been any considerable change observed in output quality or compile
time. There may now be more DAG edges inserted than before (i.e. if A->B->C,
then A->C is not needed). However, in a comparison run there were fewer total
calls to AA, and a somewhat improved compile time, which means this seems to
be not a problem.
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259201
This reverts commit r259117.
The LineInfo constructor is defined in the codeview library and we have
to link against it now. Doing that isn't trivial, so reverting for now.
llvm-svn: 259126
Adds a new family of .cv_* directives to LLVM's variant of GAS syntax:
- .cv_file: Similar to DWARF .file directives
- .cv_loc: Similar to the DWARF .loc directive, but starts with a
function id. CodeView line tables are emitted by function instead of
by compilation unit, so we needed an extra field to communicate this.
Rather than overloading the .loc direction further, we decided it was
better to have our own directive.
- .cv_stringtable: Emits the codeview string table at the current
position. Currently this just contains the filenames as
null-terminated strings.
- .cv_filechecksums: Emits the file checksum table for all files used
with .cv_file so far. There is currently no support for emitting
actual checksums, just filenames.
This moves the line table emission code down into the assembler. This
is in preparation for implementing the inlined call site line table
format. The inline line table format encoding algorithm requires knowing
the absolute code offsets, so it must run after the assembler has laid
out the code.
David Majnemer collaborated on this patch.
llvm-svn: 259117
While legalizing a 64-bit shift left by 1, the following occurs:
We split the shift operand in half: a high half and a low half.
We then create an ADDC with the low half and a ADDE with the high half +
the carry bit from the ADDC.
This is problematic if X is any_ext'd because the high half computation
is now undef + undef + carry bit and there is no way to ensure that the
two undef values had the same bitwise representation. This results in
the lowest bit in the high half turning into garbage.
Instead, do not try to turn shifts into arithmetic during type
legalization.
This fixes PR26350.
llvm-svn: 259065
Re-commit of r258951 after fixing layering violation.
The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.
In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 259035
Summary:
findBetterNeighborChains does not handle volatile or indexed stores.
However, it did not check when adding stores to ChainedStores.
Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D16463
llvm-svn: 259024
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 258951
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861