This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
PR32754.
llvm-svn: 301111
Summary:
D30400 has enabled tADC and tSBC instructions to be unglued, thereby allowing CPSR to remain live between Thumb1 scheduling units.
Most Thumb1 instructions have an OptionalDef for CPSR; but the scheduler ignored the OptionalDefs, and could unwittingly insert a flag-setting instruction in between an ADDS and the corresponding ADC.
Reviewers: javed.absar, atrick, MatzeB, t.p.northover, jmolloy, rengolin
Reviewed By: javed.absar
Subscribers: rogfer01, efriedma, aemerson, rengolin, llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D31081
llvm-svn: 301106
getRawData exposes the internal type of the APInt class directly to its users. Ideally we wouldn't expose such an implementation detail.
This patch fixes a few of the easy cases by using truncate, extract, or a rotate.
llvm-svn: 301105
Summary:
Some targets need to be able to do more complex rendering than just adding an
operand or two to an instruction. For example, it may need to insert an
instruction to extract a subreg first, or it may need to perform an operation
on the operand.
In SelectionDAG, targets would create SDNode's to achieve the desired effect
during the complex pattern predicate. This worked because SelectionDAG had a
form of garbage collection that would take care of SDNode's that were created
but not used due to a later predicate rejecting a match. This doesn't translate
well to GlobalISel and the churn was wasteful.
The API changes in this patch enable GlobalISel to accomplish the same thing
without the waste. The API is now:
InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const;
where Root is the root of the match. The return value can be omitted to
indicate that the predicate failed to match, or a function with the signature
ComplexRendererFn can be returned. For example:
return OptionalComplexRendererFn(
[=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); });
adds two immediate operands to the rendered instruction. Immed and ShVal are
captured from the predicate function.
As an added bonus, this also reduces the amount of information we need to
provide to GIComplexOperandMatcher.
Depends on D31418
Reviewers: aditya_nandakumar, t.p.northover, qcolombet, rovka, ab, javed.absar
Reviewed By: ab
Subscribers: dberris, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31761
llvm-svn: 301079
In dwo files the fixed offset can be used - if the dwos are linked into
a dwp, the dwo consumer must use the dwp tables to find out where the
original range of the debug_info was and resolve the "section relative"
value relative to that original range - effectively
avoiding/reimplementing the relocation handling.
llvm-svn: 301072
Since Split DWARF needs to name the actual .dwo file that is generated,
it can't be known at the time the llvm::Module is produced as it may be
merged with other Modules before the object is generated and that object
may be generated with any name.
By passing the Split DWARF file name when LLVM is producing object code
the .dwo file name in the object file can match correctly.
The support for Split DWARF for implicit modules remains the same -
using metadata to store the dwo name and dwo id so that potentially
multiple skeleton CUs referring to different dwo files can be generated
from one llvm::Module.
llvm-svn: 301062
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows. This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.
llvm-svn: 301047
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.
We had a terrific bug due to this in Chromium.
It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.
Differential Revision: https://reviews.llvm.org/D32330
llvm-svn: 301040
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.
This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 301019
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:
MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c
llvm-svn: 300940
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 300930
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 300913
Associate the version-when-defined with definitions of standard DWARF
constants. Identify the "vendor" for DWARF extensions.
Use this information to verify FORMs in .debug_abbrev are defined as
of the DWARF version specified in the associated unit.
Removed two tests that had specified DWARF v1 (which essentially does
not exist).
Differential Revision: http://reviews.llvm.org/D30785
llvm-svn: 300875
This enables use after free and uninit memory checking for memory
returned by a recycler. SelectionDAG currently relies on the opcode of a
free'd node being ISD::DELETED_NODE, so poke a hole in the asan poison
for SDNode opcodes. This means that we won't find some issues, but only
in SDag.
llvm-svn: 300868
This will become asan errors once the patch lands that poisons the
memory after free. The x86 change is a hack, but I don't see how to
solve this properly at the moment.
llvm-svn: 300867
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.
However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.
This patch fixes that.
Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.
AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.
Differential Revision: https://reviews.llvm.org/D32021
llvm-svn: 300864
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
llvm-svn: 300856
Adds MVT::ElementCount to represent the length of a
vector which may be scalable, then adds helper functions
that work with it.
Patch by Graham Hunter.
Differential Revision: https://reviews.llvm.org/D32019
llvm-svn: 300842
This patch adds a few helper functions to obtain new vector
value types based on existing ones without needing to care
about whether they are scalable or not.
I've confined their use to a few common locations right now,
and targets that don't have scalable vectors should never
need to care about these.
Patch by Graham Hunter.
Differential Revision: https://reviews.llvm.org/D32017
llvm-svn: 300838
- introduced in r300522 and found via the Swift LLDB testsuite.
The fix is to set the location kind to memory whenever an FrameIndex
location is emitted.
rdar://problem/31707602
llvm-svn: 300793
- introduced in r300522 and found via the Swift LLDB testsuite.
The fix is to set the location kind to memory whenever an FrameIndex
location is emitted.
rdar://problem/31707602
llvm-svn: 300790
I've changed one of the tests to not fold away, but we didn't and still don't do the transform
that the comment claims we do (and I don't know why we'd want to do that).
Follow-up to:
https://reviews.llvm.org/rL300725https://reviews.llvm.org/rL300763
llvm-svn: 300772
This allows forming more 'not' ops, so we get improvements for ISAs that have and-not.
Follow-up to:
https://reviews.llvm.org/rL300725
llvm-svn: 300763
This is preparation for a clang change to improve the [[nodiscard]] warning to not be ignored on methods that return a class marked [[nodiscard]] that are defined in the class itself. See D32207.
We should consider adding wrapper methods to APInt that return the overflow flag directly and discard the APInt result. This would eliminate the void casts and the need to create a bool before the call to pass to the out param.
llvm-svn: 300758
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in
SimplifyDemandedBits().
Some notes for reference:
1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions.
Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since
they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards
improving the vector codegen in that patch without writing any actual new code.
Differential Revision: https://reviews.llvm.org/D32230
llvm-svn: 300725
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
llvm-svn: 300664
This showed up in r300535/r300537, which were reverted in r300538 due to
some of the introduced tests in there failing on some bots, due to the
non-determinism fixed in this commit.
Re-committing r300535/r300537 will add 2 tests for the change in this
commit.
llvm-svn: 300663
Android x86_64 target uses f128 type and stores f128 values in %xmm* registers.
SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value
from f128 to i128.
Differential Revision: http://reviews.llvm.org/D32102
llvm-svn: 300583
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
Differential Revision: https://reviews.llvm.org/D32155
llvm-svn: 300566
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.
Reviewers: jyknight, bogner, javed.absar, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32086
llvm-svn: 300561
This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produces slightly different code between LLVM versions being built with different compilers.
E.g., dependent on the compiler LLVM is built with, either one of the following
can be produced:
remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)
Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.
llvm-svn: 300538
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
llvm-svn: 300535
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
- describe a variable in a register
- consist of only a DW_OP_reg
2. Memory location descriptions
- describe the address of a variable
3. Implicit location descriptions
- describe the value of a variable
- end with DW_OP_stack_value & friends
The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident. This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.
This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.
There are two major changes in this patch:
I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.
When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
DBG_VALUE, [RBP-8] --> DW_OP_fbreg -8
DBG_VALUE, RAX --> DW_OP_reg0 +0
DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0
All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.
https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>
Differential Revision: https://reviews.llvm.org/D31439
llvm-svn: 300522
The splitIndirectCriticalEdges function generates and invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.
This occurs when there is an edge from block D back to D. Since D is split in
to D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.
Differential Revision: https://reviews.llvm.org/D32126
llvm-svn: 300480
This is a version of D32090 that unifies all of the
`getInstrProf*SectionName` helper functions. (Note: the build failures
which D32090 would have addressed were fixed with r300352.)
We should unify these helper functions because they are hard to use in
their current form. E.g we recently introduced more helpers to fix
section naming for COFF files. This scheme doesn't totally succeed at
hiding low-level details about section naming, so we should switch to an
API that is easier to maintain.
This is not an NFC commit because it fixes llvm-cov's testing support
for COFF files (this falls out of the API change naturally). This is an
area where we lack tests -- I will see about adding one as a follow up.
Testing: check-clang, check-profile, check-llvm.
Differential Revision: https://reviews.llvm.org/D32097
llvm-svn: 300381
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.
llvm-svn: 300367
Instructions CALLSEQ_START..CALLSEQ_END and their target dependent
counterparts keep data like frame size, stack adjustment etc. These
data are accessed by getOperand using hard coded indices. It is
error prone way. This change implements the access by special methods,
which improve readability and allow changing data representation without
massive changes of index values.
Differential Revision: https://reviews.llvm.org/D31953
llvm-svn: 300196
The use of a DenseMap in precomputeTriangleChains does not cause
non-determinism, even though it is iterated over, as the only thing the
iteration does is to insert entries into a new DenseMap, which is not iterated.
Comment only change.
llvm-svn: 300088
The current heuristic is triggered on `InFlightCount > BufferLimit`
which isn't really helpful on in-order cores where BufferLimit is zero.
Note that we already get latency hiding effects for in order cores
by instructions staying in the pending queue on stalls; The additional
latency scheduling heuristics only have minimal effects after that while
occasionally increasing register pressure too much resulting in extra
spills.
My motivation here is additional spills/reloads ending up in a loop in
464.h264ref / BlockMotionSearch function resulting in a 4% overal
regression on an in order core. rdar://30264380
llvm-svn: 300083
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.
All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.
Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.
Differential Revision: https://reviews.llvm.org/D31548
llvm-svn: 300032
Not clearing was causing non-deterministic compiles for large files. Addresses
for MachineBasicBlocks would end up colliding and we would lay out a block that
we assumed had been pre-computed when it had not been.
llvm-svn: 300022
If you run llc -stop-after=codegenprepare and feed the resulting MIR
to llc -start-after=codegenprepare, you'll have an empty machine
function since we haven't run any isel yet. Of course, this only works
if the MIRParser believes you that this is okay.
This is essentially a revert of r241862 with a fix for the problem it
was papering over.
llvm-svn: 299975
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299949
Use the same handling in the generic legalizer code as for the other
libcalls (G_FREM, G_FPOW).
Enable it on ARM for float and double so we can test it.
llvm-svn: 299931
Summary: Legalize only if the type is marked as Legal or Custom. If not, return Unsupported as LegalizerHelper is not able to handle non-power-of-2 types right now.
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, kristof.beyls, javed.absar, ab
Reviewed By: kristof.beyls, ab
Subscribers: dberris, rovka, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31711
llvm-svn: 299929
A fix for the bug reported in PR30911.
The issue arises when multiple CALLSEQ_BEGIN nodes are unscheduled as
the last node to be unscheduled will gain access to the CallResource
register. But when a node is being picked, only CALLSEQ_END nodes are
checked against the CallResource and have their chains evaluated.
This then means that other CALLSEQ_BEGIN nodes can be scheduled
before the existing call sequence has been finalised. This patch adds
a check against the FrameSetup nodes in DelayForLiveRegs to prevent
this from happening.
Differential Revision: https://reviews.llvm.org/D31536
llvm-svn: 299926
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
llvm-svn: 299925
The math works out where it can actually be counter-productive. The probability
calculations correctly handle the case where the alternative is 0 probability,
rely on those calculations.
Includes a test case that demonstrates the problem.
llvm-svn: 299892
Qin may be large, and Succ may be more frequent than BB. Take these both into
account when deciding if tail-duplication is profitable.
llvm-svn: 299891
Merging identical blocks when it doesn't reduce fallthrough. It is common for
the blocks created from critical edge splitting to be identical. We would like
to merge these blocks whenever doing so would not reduce fallthrough.
llvm-svn: 299890
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.
The problematic assumptions include:
- That the pointer size used for the stack is the same size as
the code size pointer, which is also the maximum sized pointer.
- That 0 is an invalid, non-dereferencable pointer value.
These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.
llvm-svn: 299888
Summary:
For SETCC we aren't calculating the KnownZero bits at all. I've copied the code from computeKnownZero over for this.
For AssertZExt we were only setting KnownZero for bits that were demanded. But the upper bits are zero whether they were demanded or not.
I'm interested in fixing this because my belief is the first part of the ISD::AND handling code in SimplifyDemandedBits largely exists because of these two bugs. In that code we go to computeKnownBits for the LHS and optimize a RHS constant. Because computeKnownBits handles SETCC and AssertZExt correctly we get better information sometimes than when we call SimplifyDemandedBits on the LHS later. With these two issues fixed in SimplifyDemandedBits I was able to remove that computeKnownBits call and still pass all X86 tests. I'll submit that change in a separate patch.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31715
llvm-svn: 299839
The original instruction might get legalized and erased and expanded
into intermediate instructions and the intermediate instructions might
fail legalization. This end up in reporting GISelFailure on the erased
instruction.
Instead report GISelFailure on the intermediate instruction which failed
legalization.
Reviewed by: ab
llvm-svn: 299802
This reverts commit r299766. This change appears to have broken the MIPS
buildbots. Reverting while I investigate.
Revert "[mips] Remove usage of debug only variable (NFC)"
This reverts commit r299769. Follow up commit.
llvm-svn: 299788
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.
The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.
Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.
By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.
Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".
This patch enables the MIPS backend to take either form for vector types.
Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur
Differential Revision: https://reviews.llvm.org/D27845
llvm-svn: 299766
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.
Differential Revision: https://reviews.llvm.org/D30596
llvm-svn: 299723
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299699
Since the BUILD_VECTOR has already been checked by
isBuildVectorOfConstantSDNodes() in SelectionDAG::getNode() for a
SIGN_EXTEND_INREG, it can be assumed that Op is always either undef or a
ConstantSDNode, and Ops.size() will always equal VT.getVectorNumElements().
llvm-svn: 299647
This is a follow-on to r299096 which added support for fmadd.
Subtract does not have the case where with two multiply operands we commute in
order to fuse with the multiply with the fewer uses.
llvm-svn: 299572
Summary:
Use an explicit work queue instead, to avoid accidentally
causing stack overflows for input with very large CFGs.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D31681
llvm-svn: 299569
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401
It's likely that other targets will want to enable this hook for scalar transforms,
and there are probably other patterns that can use bitwise logic to reduce comparisons.
Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).
Differential Revision: https://reviews.llvm.org/D31483
llvm-svn: 299542
When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with
constant operands, a new BUILD_VECTOR node will be created transformed
constants.
Llvm-stress found a case where the new BUILD_VECTOR had constant operands
of an illegal type, because the (legal) element type is in fact not a legal
scalar type.
This patch changes this so that the new BUILD_VECTOR has the same operand
type as the old one.
Review: Eli Friedman, Nirav Dave
https://bugs.llvm.org//show_bug.cgi?id=32422
llvm-svn: 299540
Decouple this setting from EnableIRPA.
To support function calls on AMDGPU, it is necessary to
report the global register usage throughout the kernel's
call graph, so callees need to be handled first.
llvm-svn: 299487
If an instruction has a true dependency, it makes sense for to use that
register for any undef read operands in the same instruction (we'll have
to wait for that register to become available anyway). This logic
was already implemented. However, the code would then still try to
revisit that instruction and break the dependency (and always fail,
since by definition a true dependency has to be live before the
instruction). Avoid revisiting such instructions as a performance
optimization. No functional change.
Differential Revision: https://reviews.llvm.org/D30173
llvm-svn: 299467
Summary:
Lift the restrictions that prevented the tree walking introduced in the
previous change and add support for patterns like:
(G_ADD (G_MUL (G_SEXT $src1), (G_SEXT $src2)), $src3) -> SMADDWrrr $dst, $src1, $src2, $src3
Also adds support for G_SEXT and G_ZEXT to support these cases.
One particular aspect of this that I should draw attention to is that I've
tried to be overly conservative in determining the safety of matches that
involve non-adjacent instructions and multiple basic blocks. This is intended
to be used as a cheap initial check and we may add a more expensive check in
the future. The current rules are:
* Reject if any instruction may load/store (we'd need to check for intervening
memory operations.
* Reject if any instruction has implicit operands.
* Reject if any instruction has unmodelled side-effects.
See isObviouslySafeToFold().
Reviewers: t.p.northover, javed.absar, qcolombet, aditya_nandakumar, ab, rovka
Reviewed By: ab
Subscribers: igorb, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30539
llvm-svn: 299430
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.
Reviewers: jmolloy, mcrosier, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, aemerson, rengolin, mcrosier
Differential Revision: https://reviews.llvm.org/D28680
llvm-svn: 299379
This reverts commit r299047 which is incorrect because the
simplification may result in incorrect propogation of undefs to users of
the folded shuffle.
Thanks to Andrea Di Biagio for pointing this out.
llvm-svn: 299368
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.
Differential Revision: https://reviews.llvm.org/D31565
llvm-svn: 299362
The code already allowed vector types in via "isInteger" (which might want
a more specific name), so use splat-friendly constant predicates to match
those types.
llvm-svn: 299304
This can only happen when we have a mix of zero and undef elements and the two vectors have a different arrangement of zeros/undefs. The shuffle should eventually be constant folded to all zeros.
Fixes PR32484.
llvm-svn: 299291
REG_SEQUENCE falls into the same category as COPY for operands mapping:
- They don't have MCInstrDesc with register constraints
- The input variable could use whatever register classes
- It is possible to have register class already assigned to the operands
In particular, given REG_SEQUENCE are always target specific because of
the subreg indices. Those indices must apply to the register class of
the definition of the REG_SEQUENCE and therefore, the target must set a
register class to that definition. As a result, the generic code can
always use that register class to derive a valid mapping for a
REG_SEQUENCE.
llvm-svn: 299285
(and (setlt X, 0), (setlt Y, 0)) --> (setlt (and X, Y), 0)
We have 7 similar folds, but this one got away. The fact that the
x86 test with a branch didn't change is probably a separate bug. We
may also be missing this and the related folds in instcombine.
llvm-svn: 299252
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.
Followup to D25691.
Differential Revision: https://reviews.llvm.org/D31311
llvm-svn: 299219
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.
Differential Revision: https://reviews.llvm.org/D31249
llvm-svn: 299201
Now alternatively to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD
can also use the per operation FMF to allow fusion.
The idea here is not to port everything to the new scheme (e.g. fused
multiply-and-sub will be ported later) but that this work all the way from
clang.
The transformation is conditionalized on *both* the FADD and the FMUL having
the FMF contract flag.
Differential Revision: https://reviews.llvm.org/D31169
llvm-svn: 299096
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.
Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.
SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.
Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).
Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed. For now, use
'isEnabled' as a crude replacement.
This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'. Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.
This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.
Differential Revision: https://reviews.llvm.org/D31405
llvm-svn: 299093
Turns out integerPartWidth only explicitly defines the width of the tc functions in the APInt class. Functions that aren't used by APInt implementation itself. Many places in the code base already assume APInt is made up of 64-bit pieces. Explicitly assuming 64-bit here doesn't make that situation much worse. A full audit would need to be done if it ever changes.
llvm-svn: 299059
Instantiation of the MachineVerifierPass through
PassInfo::getNormalCtor would yield a segfault since the default
constructor of the MachineVerifierPass takes a reference to nullptr.
Patch by Simone Pellegrini.
Differential Revision: https://reviews.llvm.org/D31387
llvm-svn: 298987
Properly propagate the FMF from the LLVM IR to this flag.
This is toward moving fp-contraction=fast from an LLVM TargetOption to a
FastMathFlag in order to fix PR25721.
Differential Revision: https://reviews.llvm.org/D31165
llvm-svn: 298961
Deal with case that initial node is deleted during dag-combine leading
to an assertional failure in promoteIntShiftOp.
Fixes PR32420.
Reviewers: spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31403
llvm-svn: 298931
Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from
being used.
Fixes PR32340 and PR32345.
Reviewers: hfinkel, dbabokin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31148
llvm-svn: 298923
This patch enables schedulers to specify instructions that
cannot be issued with any other instructions.
It also fixes BeginGroup/EndGroup.
Reviewed by: Andrew Trick
Differential Revision: https://reviews.llvm.org/D30744
llvm-svn: 298885
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality,
we can replace memcmp calls with inline code that is both smaller and faster.
Differential Revision: https://reviews.llvm.org/D31290
llvm-svn: 298775
If we have an array of a user-defined aggregates for which there was an
ODR violation, then the array size will not necessarily match the number
of elements times the size of the element.
Fixes PR32383
llvm-svn: 298750
Summary:
For the following CFG:
A->B
B->C
A->C
If there is another edge B->D, then ABC should not be considered as triangle.
Reviewers: davidxl, iteratee
Reviewed By: iteratee
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D31310
llvm-svn: 298661
Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31225
llvm-svn: 298656
The old candidate collection method in the outliner caused some very large
regressions in compile time on large tests. For MultiSource/Benchmarks/7zip it
caused a 284.07 s or 1156% increase in compile time. On average, using the
SingleSource/MultiSource tests, it caused an average increase of 8 seconds in
compile time (something like 1000%).
This commit replaces that candidate collection method with a new one which
only visits each node in the tree once. This reduces the worst compile time
increase (still 7zip) to a 0.542 s overhead (22%) and the average compile time
increase on SingleSource and MultiSource to 0.018 s (4%).
llvm-svn: 298648
It is not guaranteed that the memory used for MachineBasicBlocks in
the previous MachineFunction hasn't been freed, so holding on to a
pointer to the last function's isn't correct. Particularly I have
observed the sret.ll testcase failing because the first BasicBlock in
the new function happened to be allocated to the exact same memory as
the previously saved and (deleted) PrevInstBB.
llvm-svn: 298642
Also add an assertion for the case that there are multiple FI
expressions with a DW_OP_LLVM_fragment; which should violate internal
constraints in DbgVariable.
llvm-svn: 298518
This patch changes the behavior of IRTranslating intrinsics where we
now create VREG + G_CONSTANT for ConstantInt values. We already do this
for FloatingPoint values. This makes it easier for the backends to
select code and it won't have to de-duplicate creation+selection of
constants.
Reviewed by: ab
llvm-svn: 298473
If a register location can only be described by a complex expression
(i.e., multiple subregisters) it doesn't safely compose with another
complex expression. For example, it is not possible to apply a
DW_OP_deref operation to multiple DW_OP_pieces.
llvm-svn: 298472
until the rest of the expression is known.
This is still an NFC refactoring in preparation of a subsequent bugfix.
This reapplies r298388 with a bugfix for non-physical frame registers.
llvm-svn: 298471
Quentin points out that r298358 would cause us to emit different code
with debug info. That's a big no-no; also erase the instructions that
only live thanks to DBG_VALUE users.
Adrian explained how this is an existing problem and an OK thing to do:
clang has allocas for all variables so shouldn't be affected at -O0, but
swift uses a bit of inlineasm to explicitly keep values live for the
purpose of debug info quality. I'm not sure there is a better scheme.
llvm-svn: 298460
MI can represent fallthrough to layout successor blocks, and our
post-isel representation uses that extensively.
We might as well use it too, to avoid translating and carrying along
unnecessary branches.
llvm-svn: 298459
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases leading to crashs
when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
block.
This fixes http://llvm.org/PR32353
llvm-svn: 298448
A bool is represented by a single byte, which the ARM ABI requires to be either
0 or 1. So we cannot use G_ANYEXT when legalizing the type.
llvm-svn: 298439
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
If a register location can only be described by a complex expression
(i.e., multiple subregisters) it doesn't safely compose with another
complex expression. For example, it is not possible to apply a
DW_OP_deref operation to multiple DW_OP_pieces.
llvm-svn: 298389
Previously, PromoteIntRes_TRUNCATE() did not handle the case where
the operand needs widening, which resulted in llvm_unreachable().
This patch adds the needed handling, along with a test case.
Review: Eli Friedman, Simon Pilgrim.
https://reviews.llvm.org/D31077
llvm-svn: 298357
In doing so, clean up the MD5 interface a little. Most
existing users only care about the lower 8 bytes of an MD5,
but for some users that care about the upper and lower,
there wasn't a good interface. Furthermore, consumers
of the MD5 checksum were required to handle endianness
details on their own, so it seems reasonable to abstract
this into a nicer interface that just gives you the right
value.
Differential Revision: https://reviews.llvm.org/D31105
llvm-svn: 298322
and mark the methods as protected.
Besides reducing the surface area of DwarfExpression, this is in
preparation for an upcoming bugfix in the DwarfExpression
implementation, for which it will be necessary to defer emitting
register operations until the rest of the expression is known.
NFC
llvm-svn: 298309
Move the check for "MF->hasWinCFI()" up into the calculation of the
shouldEmitMoves boolean, rather than putting it in the early returning
if. This ensures that endFunction doesn't try to emit .seh_* directives
for leaf functions.
llvm-svn: 298276
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.
Differential Revision: https://reviews.llvm.org/D31039
llvm-svn: 298254
We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result.
Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes.
This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases.
Differential Revision: https://reviews.llvm.org/D31074
llvm-svn: 298226
Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).
Preserve existing tests by adding COPYs.
In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.
llvm-svn: 298224
The MIR printer dumps a string that describe the register mask of a function.
A static predefined list of register masks matches a static list of strings.
However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails.
This patch adds support to custom register mask printing and dumping.
Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic.
As such this data needs to be dumped and parsed back to the Machine Register Info.
Differential Revision: https://reviews.llvm.org/D30971
llvm-svn: 298207
Let targets specialize the pass with the register class so we can get a
parameterless default constructor and can put the pass into the pass
registry to enable testing with -run-pass=.
llvm-svn: 298184
This is an ELF-specific thing that adds SHF_LINK_ORDER to the global's section
pointing to the metadata argument's section. The effect of that is a reverse dependency
between sections for the linker GC.
!associated does not change the behavior of global-dce. The global
may also need to be added to llvm.compiler.used.
Since SHF_LINK_ORDER is per-section, !associated effectively enables
fdata-sections for the affected globals, the same as comdats do.
Differential Revision: https://reviews.llvm.org/D29104
llvm-svn: 298157
Handle TokenFactors more aggressively in
SDValue::reachesChainWithoutSideEffects. This isn't really a
very effective change anymore because of other changes to
chain handling, but it's a cheap check, and the expanded
comments are still useful.
It might be possible to loosen the hasOneUse() requirement with a
deeper analysis, but a naive implementation of that check would be
expensive.
Differential Revision: https://reviews.llvm.org/D29845
llvm-svn: 298156
Summary:
Instead of just looking for a load which is mergable with Ext to form ExtLoad, trying to promote Exts as long as the cost is acceptable. This change is not a NFC as it continue promoting Exts even after finding a load during promotions; the change in arm64-codegen-prepare-extload.ll described in 2.b might show the case.
This change was motivated from D26524. Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP.
Reviewers: jmolloy, qcolombet, mcrosier, javed.absar
Reviewed By: qcolombet
Subscribers: rengolin, llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D27853
llvm-svn: 298114
- This fixes a bug where subregister incompatible with the vregs register
class where used.
- Implement the case where multiple copies are necessary to cover a
given lanemask.
Differential Revision: https://reviews.llvm.org/D30438
llvm-svn: 298025
This fixes two problems when VirtRegMap encounters bundles:
- When substituting a vreg subregister def with an actual register the
internal read flag must be cleared.
- Removing an identity COPY from a bundle needs to use
removeFromBundle() and a newly introduced function to update
SlotIndexes.
No testcase here, because none of the in-tree targets trigger this,
however an upcoming commit of mine will need this and the testcase there
will trigger this.
Differential Revision: https://reviews.llvm.org/D30925
llvm-svn: 298024
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.
In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D31052
llvm-svn: 298010
Citing http://bugs.llvm.org/show_bug.cgi?id=32288
The DWARF generated by LLVM includes this location:
0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
to assume the DWARF consumer knows which part of a register
logically holds the value (low bytes, high bytes, how many bytes,
etc) for a primitive value like an integer.
This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.
(This reapplies r297960 with two additional testcase updates).
rdar://problem/31069390
https://reviews.llvm.org/D31010
llvm-svn: 297965
Citing http://bugs.llvm.org/show_bug.cgi?id=32288
The DWARF generated by LLVM includes this location:
0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
to assume the DWARF consumer knows which part of a register
logically holds the value (low bytes, high bytes, how many bytes,
etc) for a primitive value like an integer.
This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.
rdar://problem/31069390
https://reviews.llvm.org/D31010
llvm-svn: 297960
Don't scalarize VSELECT->SETCC when operands/results needs to be widened,
or when the type of the SETCC operands are different from those of the VSELECT.
(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.
The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.
Updated tests:
test/CodeGen/ARM/vuzp.ll
test/CodeGen/NVPTX/f16x2-instructions.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll
test/CodeGen/X86/vselect-pcmp.ll
Review: Eli Friedman, Simon Pilgrim
https://reviews.llvm.org/D29489
llvm-svn: 297930
This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%
llvm-svn: 297925
Currently, we create a G_CONSTANT for every "synthetic" integer
constant operand (for instance, for the G_GEP offset).
Instead, share the G_CONSTANTs we might have created by going through
the ValueToVReg machinery.
When we're emitting synthetic constants, we do need to get Constants from
the context. One could argue that we shouldn't modify the context at
all (for instance, this means that we're going to use a tad more memory
if the constant wasn't used elsewhere), but constants are mostly
harmless. We currently do this for extractvalue and all.
For constant fcmp, this does mean we'll emit an extra COPY, which is not
necessarily more optimal than an extra materialized constant.
But that preserves the current intended design of uniqued G_CONSTANTs,
and the rematerialization problem exists elsewhere and should be
resolved with a single coherent solution.
llvm-svn: 297875
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.
Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.
llvm-svn: 297871
Now that we preserve the IR layout, we would end up with all the newly
synthesized switch comparison blocks at the end of the function.
Instead, use a hopefully more reasonable layout, with the comparison
blocks immediately following the switch comparison blocks.
llvm-svn: 297869
It makes the output function layout more predictable; the layout has
an effect on performance, we don't want it to be at the mercy of the
translator's visitation order and such.
The predictable output is also easier to digest.
getOrCreateBB isn't appropriately named anymore, as it never needs to
create anything. Rename it and extract the MBB creation logic out of it.
A couple tests were sensitive to the order. Update them.
llvm-svn: 297868
This patch replaces ORs with getHighBits/getLowBits etc. with setLowBits/setHighBits/setBitsFrom.
In a few of the places we weren't ORing, but the KnownZero/KnownOne vectors were already initialized to zero. We exploit this in most places already there were just some that were inconsistent.
Differential Revision: https://reviews.llvm.org/D30965
llvm-svn: 297860
Using the module ID here is wrong for a couple of reasons:
1) The module ID is not persisted, so we can end up with different
object file contents given the same input file (for example if the same
file is accessed via different paths).
2) With ThinLTO the module ID field may contain the path to a bitcode file,
which is incorrect, as the .file argument is supposed to contain the path to
a source file.
Differential Revision: https://reviews.llvm.org/D30584
llvm-svn: 297853
Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical insturctions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assign it to the instruction in the common tail, so that the debug locations are maintained if they are same across identical instructions.
Reviewers: aprantl, probinson, MatzeB, rob.lougher
Reviewed By: aprantl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D30226
llvm-svn: 297805
On MachO platforms that use subsections-via-symbols dead code stripping will
drop prefix data. Unfortunately there is no great way to convey the relationship
between a function and its prefix data to the linker. We are forced to use a bit
of a hack: we give the prefix data it’s own symbol, and mark the actual function
entry an .alt_entry.
Patch by Moritz Angermann!
Differential Revision: https://reviews.llvm.org/D30770
llvm-svn: 297804
Summary:
<1 x Ty> is not a legal vector type in LLT, we shouldn’t build G_MERGE_VALUES
instruction for them.
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar
Reviewed By: qcolombet
Subscribers: dberris, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D30948
llvm-svn: 297792
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.
Depends on D30046, D29712
Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D30089
llvm-svn: 297782
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.
ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.
At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.
Differential Revision: https://reviews.llvm.org/D29639
llvm-svn: 297780
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.
Differential Revision: https://reviews.llvm.org/D30708
llvm-svn: 297716
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC.
Differential Revision: https://reviews.llvm.org/D28566
llvm-svn: 297715
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.
llvm-svn: 297698
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
This reverts commit r242302. External type refs of this form were
never used by any LLVM frontend so this is effectively dead code.
(They were introduced to support clang module debug info, but in the
end we came up with a better design that doesn't use this feature at
all.)
rdar://problem/25897929
Differential Revision: https://reviews.llvm.org/D30917
llvm-svn: 297684
The previous algorithm for RegUsageInfoCollector had pretty bad
performance on architectures with a lot of registers that alias
a lot one another, because we potentially iterate for every register
over all the aliasing registers. This costs even more if the function
is small and doesn't define a lot of registers.
This patch changes the algorithm to one that while iterating over
all the registers it will iterate over the aliasing registers only
if the register itself is defined.
This should be faster based on the assumption that only a subset
of the whole LLVM registers set is actually defined in the function.
Differential Revision: https://reviews.llvm.org/D30880
llvm-svn: 297673
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.
Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.
llvm-svn: 297653
We don't need to check whether the fallback path is enabled to return
false. Just do that all the time on error cases, the caller knows (or
at least should know!) how to handle the failing case.
llvm-svn: 297535
The problem can occur in presence of subregs. If we are swapping two
instructions defining different subregs of the same register we will
get a new liveout from a block. We need to preserve value number for
block's liveout for successor block's livein to match.
Differential Revision: https://reviews.llvm.org/D30558
llvm-svn: 297534
Summary: No test case as none of the in-tree targets with GlobalISel support has this condition.
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab
Reviewed By: qcolombet
Subscribers: dberris, rovka, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D30786
llvm-svn: 297512
Summary:
We don’t actually use LegalizerInfo in Legalizer pass, it’s just passed
as an argument.
In order to check if an instruction is legal or not, we need to get LegalizerInfo
by calling `MI.getParent()->getParent()->getSubtarget().getLegalizerInfo()`.
Instead, make LegalizerInfo accessible in LegalizerHelper.
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, kristof.beyls
Reviewed By: qcolombet
Subscribers: dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D30838
llvm-svn: 297491
Summary:
Depends on D30379
This improves the state of things for the sub class of operation.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30436
llvm-svn: 297482
Summary: As per title. This is extracted from D29872 and I threw SADDO in.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30379
llvm-svn: 297479
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.
Differential Revision: https://reviews.llvm.org/D30780
llvm-svn: 297458
Summary: This essentially does the same transform as for ADC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30417
llvm-svn: 297416
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.
The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken. And it still gets it wrong if the variable is at SP
directly.
llvm-svn: 297410
Summary: This essentially does the same transform as for SUBC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30437
llvm-svn: 297404
As discussed in the review thread for rL297026, this is actually 2 changes that
would independently fix all of the test cases in the patch:
1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before
foldBinopIntoSelect() as a matter of efficiency.
I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().
I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).
Differential Revision:
https://reviews.llvm.org/D30741
llvm-svn: 297384
This commit changes the BumpPtrAllocator for suffix tree nodes to a SpecificBumpPtrAllocator.
Before, node construction was leaking memory because of the DenseMap in SuffixTreeNodes.
Changing this to a SpecificBumpPtrAllocator allows this memory to properly be released.
llvm-svn: 297319
This helps in cases involving bitfields where an AND is exposed by
legalization.
Differential Revision: https://reviews.llvm.org/D30472
llvm-svn: 297249
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 297241
We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".
llvm-svn: 297226
Some intrinsics take metadata parameters. These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.
llvm-svn: 297209
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.
Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.
The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.
llvm-svn: 297208
For vector operands, the `select` instruction supports both vector and
non-vector conditions. The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).
Make it possible to express the full range of G_SELECTs.
llvm-svn: 297207
When computing the mapping for non-generic instructions, we skipped
%noreg operands, because we can't always reason about their banks.
Also skip them when applying the mapping. Otherwise, we could end
up with mappings that we can't apply.
While there, duplicate an assert to distinguish between the two
error conditions.
llvm-svn: 297201
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do. Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.
llvm-svn: 297200
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.
swifterror arguments have restrictions on their uses.
rdar://30839288
llvm-svn: 297197
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.
Somehow, this change causes the build to need Attributes.gen before it's been
generated.
llvm-svn: 297188
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 297177
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.
llvm-svn: 297100
This is known incomplete and not called in the right order relative to
other folds, but that's the current behavior. I'm just trying to clean
this up before making actual functional changes to make the patch smaller.
The logic here should mimic the IR equivalents that are in InstSimplify's
simplifyDivRem().
llvm-svn: 297086
Some late additions to DWARF v5 were not in Dwarf.def; also one form
was redefined. Add the new cases to relevant switches in different
parts of LLVM. Replace DW_FORM_ref_sup with DW_FORM_ref_sup[4,8].
I did not add support for DW_FORM_strx3/addrx3 other that defining the
constants. We don't have any infrastructure to support these.
Differential Revision: http://reviews.llvm.org/D30664
llvm-svn: 297085
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.
llvm-svn: 297081
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.
Differential Revision: https://reviews.llvm.org/D30556
llvm-svn: 297054
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.
So now we detect these degenerate cases and emit real casts instead.
llvm-svn: 297051
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.
llvm-svn: 297042
Refactoring of duplicated code and more fixes to follow.
This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html
Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.
The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.
llvm-svn: 297026
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.
For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.
Reviewers: dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29702
llvm-svn: 296998
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.
We're missing a couple of shuffle combines that will be added in a future patch for review.
Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.
Differential Revision: https://reviews.llvm.org/D30549
llvm-svn: 296985
This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub.
llvm-svn: 296981
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.
This is part of the ongoing process to obsolete D24480. The motivation is to
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
PowerPC is an obvious winner for this kind of transform because it has fast and complete
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).
x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86
changes just shows that that code is a mess of special-case holes. We may be able to remove
some of that logic now.
My guess is that other targets will want to enable this hook for most cases. The likely
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to
math/logic, but the general transform for any 2 constants needs one more instruction
(multiply or 'and').
ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.
Differential Revision: https://reviews.llvm.org/D30537
llvm-svn: 296977
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).
When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).
This fixes the fp128 test failures with expensive checks for X86.
I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.
Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon
Reviewed By: chh
Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits
Differential Revision: https://reviews.llvm.org/D29265
llvm-svn: 296964
We can now end up in situations where we initiate LiveIntervalUnion
queries with different SubRanges against the same register unit, so the
assert() no longer holds in all cases. Just recalculate now when we know
the cache is out of date.
llvm-svn: 296928
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirity of the larger register. This makes reasoning about them much easer
since you only have to look at the first register being merged and the result
to know what the instruction is doing.
I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.
llvm-svn: 296921
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.
llvm-svn: 296862
A call should never modify the stack pointer, but some backends are
not so sure about this and never list SP in the regmask. For the
purposes of LiveDebugValues we assume a call never clobbers SP. We
already have a similar workaround in DbgValueHistoryCalculator (which
we hopefully can retire soon).
This fixes the availabilty of local ASANified variables on AArch64.
rdar://problem/27757381
llvm-svn: 296847
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:
1) The post-dominators marked 50% are actually taken 56% (This shrinks with
longer chains)
2) The chains are statically correlated. Branch probabilities have a very
U-shaped distribution.
[http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
If the branches in a chain are likely to be from the same side of the
distribution as their predecessor, but are independent at runtime, this
transformation is profitable. (Because the cost of being wrong is a small
fixed cost, unlike the standard triangle layout where the cost of being
wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
branch was taken positively influences whether the next branch will be
taken
We believe that 2 and 3 are common enough to justify the small margin in 1.
The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.
llvm-svn: 296845
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below
```
extern int bar();
extern int baz();
int foo(int x, int y) {
if (x != y)
return bar();
else
return baz();
}
```
, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:
```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
%cmp = icmp ne i32 %x, %y, !dbg !14
br i1 %cmp, label %if.then, label %if.else, !dbg !16
if.then: ; preds = %entry
%call = tail call i32 (...) @bar() #3, !dbg !17
br label %return, !dbg !18
if.else: ; preds = %entry
%call1 = tail call i32 (...) @baz() #3, !dbg !19
br label %return, !dbg !20
return: ; preds = %if.else, %if.then
%retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
ret i32 %retval.0, !dbg !21
}
!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)
```
As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.
Reviewers: atrick, bogner, andreadb, craig.topper, aprantl
Reviewed By: andreadb
Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29813
llvm-svn: 296825
After several smaller patches to get most of the core improvements
finished up, this patch is a straight move and header fixup of
the source.
Differential Revision: https://reviews.llvm.org/D30266
llvm-svn: 296810
This bug was introduced with:
https://reviews.llvm.org/rL296699
There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.
The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().
llvm-svn: 296768
If dominator tree is not calculated or is invalidated, set corresponding
pointer in the pass state to nullptr. Such pointer value will indicate
that operations with dominator tree are not allowed. In particular, it
allows to skip verification for such pass state. The dominator tree is
not calculated if the machine dominator pass was skipped, it occures in
the case of entities with linkage available_externally.
The change fixes some test fails observed when expensive checks
are enabled.
Differential Revision: https://reviews.llvm.org/D29280
llvm-svn: 296742
Surprisingly, one of the three interference checks in LiveRegMatrix was
using the main live range instead of the apropriate subregister range
resulting in unnecessarily conservative results.
llvm-svn: 296722
Summary:
This can be used to optimize large multiplications after legalization.
Depends on D29565
Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29587
llvm-svn: 296711
Until now, we've had to use -global-isel to enable GISel. But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.
This first step adds an override for the target to explicitly define its
level of support. For AArch64, do that using
a new command-line option (I know..):
-aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.
Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!
While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is
such a cold path that in practice that shouldn't matter at all.
llvm-svn: 296710
This is part of the ongoing attempt to improve select codegen for all targets and select
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().
I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).
The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105
Differential Revision: https://reviews.llvm.org/D30502
llvm-svn: 296699
Take DW_FORM_implicit_const attribute value into account when profiling
DIEAbbrevData.
Currently if we have two similar types with implicit_const attributes and
different values we end up with only one abbrev in .debug_abbrev section.
For example consider two structures: S1 with implicit_const attribute ATTR
and value VAL1 and S2 with implicit_const ATTR and value VAL2.
The .debug_abbrev section will contain only 1 related record:
[N] DW_TAG_structure_type DW_CHILDREN_yes
DW_AT_ATTR DW_FORM_implicit_const VAL1
// ....
This is incorrect as struct S2 (with VAL2) will use abbrev record with VAL1.
With this patch we will have two different abbreviations here:
[N] DW_TAG_structure_type DW_CHILDREN_yes
DW_AT_ATTR DW_FORM_implicit_const VAL1
// ....
[M] DW_TAG_structure_type DW_CHILDREN_yes
DW_AT_ATTR DW_FORM_implicit_const VAL2
// ....
llvm-svn: 296691
- We only need the information from the base class, not the additional
details in the LiveInterval class.
- Spread more `const`
- Some code cleanup
llvm-svn: 296684
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped. This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.
This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.
Supersedes D28388
Fixes PR26328
Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29668
llvm-svn: 296683
This patch adds a MachineSSA pass that coalesces blocks that branch
on the same condition.
Committing on behalf of Lei Huang.
Differential Revision: https://reviews.llvm.org/D28249
llvm-svn: 296670
Add check that deleted nodes do not get added to worklist. This can
occur when a node's operand is simplified to an existing node.
This fixes PR32108.
Reviewers: jyknight, hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30506
llvm-svn: 296668
DWARF may define a default lower-bound for arrays in languages defined
in a particular DWARF version. But the logic to suppress an
unnecessary lower-bound attribute was looking at the hard-coded
default DWARF version, not the version that had been requested.
Also updated the list with all languages defined in DWARF v5.
Differential Revision: http://reviews.llvm.org/D30484
llvm-svn: 296652
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 296651
When SDAGISel (top-down) selects a tail-call, it skips the remainder
of the block.
If, before that, FastISel (bottom-up) selected some of the (no-op) next
few instructions, we can end up with dead instructions following the
terminator (selected by SDAGISel).
We need to erase them, as we know they aren't necessary (in addition to
being incorrect).
We already do this when FastISel falls back on the tail-call itself.
Also remove the FastISel-emitted code if we fallback on the
instructions between the tail-call and the return.
llvm-svn: 296552
Iterating on the use-list we're modifying doesn't work: after the first
iteration, the use-list iterator will point to a MachineOperand
referencing the new register. This caused us to skip the other uses to
replace.
Instead, use MRI.replaceRegWith(), which accounts for this behavior.
llvm-svn: 296551
Requesting DWARF v5 will now get you the new compile-unit and
type-unit headers. llvm-dwarfdump will also recognize them.
Differential Revision: http://reviews.llvm.org/D30206
llvm-svn: 296514
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.
llvm-svn: 296486
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which functions have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.
This change adds a remark that is reported by the stack protection code when an instruction or attribute is encountered that causes SSP to be applied.
Patch by: James Henderson
Differential Revision: https://reviews.llvm.org/D29023
llvm-svn: 296483
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296476
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 296474
Summary:
With this change ImplicitNullCheck optimization uses alias analysis
and can use load/store memory access for implicit null check if there
are other load/store before but memory accesses do not alias.
Patch by Serguei Katkov!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30331
llvm-svn: 296440
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html
The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.
This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.
The outliner is run like so:
clang -mno-red-zone -mllvm -enable-machine-outliner file.c
Patch by Jessica Paquette<jpaquette@apple.com>!
rdar://29166825
Differential Revision: https://reviews.llvm.org/D26872
llvm-svn: 296418
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296416
Before the endianness was specified on each call to read
or write of the StreamReader / StreamWriter, but in practice
it's extremely rare for streams to have data encoded in
multiple different endiannesses, so we should optimize for the
99% use case.
This makes the code cleaner and more general, but otherwise
has NFC.
llvm-svn: 296415
This was reverted because it was breaking some builds, and
because of incorrect error code usage. Since the CL was
large and contained many different things, I'm resubmitting
it in pieces.
This portion is NFC, and consists of:
1) Renaming classes to follow a consistent naming convention.
2) Fixing the const-ness of the interface methods.
3) Adding detailed doxygen comments.
4) Fixing a few instances of passing `const BinaryStream& X`. These
are now passed as `BinaryStreamRef X`.
llvm-svn: 296394
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.
This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.
Differential Revision: https://reviews.llvm.org/D30176
llvm-svn: 296381
Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions.
Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie
Reviewed By: dblaikie
Subscribers: dblaikie, llvm-commits
Differential Revision: https://reviews.llvm.org/D30026
llvm-svn: 296371
This pattern is essentially a i16 load from p+1 address:
%p1.i16 = bitcast i8* %p to i16*
%p2.i8 = getelementptr i8, i8* %p, i64 2
%v1 = load i16, i16* %p1.i16
%v2.i8 = load i8, i8* %p2.i8
%v2 = zext i8 %v2.i8 to i16
%v1.shl = shl i16 %v1, 8
%res = or i16 %v1.shl, %v2
Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining.
llvm-svn: 296336
Summary:
While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object.
To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler.
The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation.
Fixes PR30284.
Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30242
llvm-svn: 296260
r296215, "[PDB] General improvements to Stream library."
r296217, "Disable BinaryStreamTest.StreamReaderObject temporarily."
r296220, "Re-enable BinaryStreamTest.StreamReaderObject."
r296244, "[PDB] Disable some tests that are breaking bots."
r296249, "Add static_cast to silence -Wc++11-narrowing."
std::errc::no_buffer_space should be used for OS-oriented errors for socket transmission.
(Seek discussions around llvm/xray.)
I could substitute s/no_buffer_space/others/g, but I revert whole them ATM.
Could we define and use LLVM errors there?
llvm-svn: 296258
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296252
This adds various new functionality and cleanup surrounding the
use of the Stream library. Major changes include:
* Renaming of all classes for more consistency / meaningfulness
* Addition of some new methods for reading multiple values at once.
* Full suite of unit tests for reader / writer functionality.
* Full set of doxygen comments for all classes.
* Streams now store their own endianness.
* Fixed some bugs in a few of the classes that were discovered
by the unit tests.
llvm-svn: 296215
This is part of a larger effort to get the Stream code moved
up to Support. I don't want to do it in one large patch, in
part because the changes are so big that it will treat everything
as file deletions and add, losing history in the process.
Aside from that though, it's just a good idea in general to
make small changes.
So this change only changes the names of the Stream related
source files, and applies necessary source fix ups.
llvm-svn: 296211
With the "wasm32-unknown-unknown-wasm" triple, this allows writing out
simple wasm object files, and is another step in a larger series toward
migrating from ELF to general wasm object support. Note that this code
and the binary format itself is still experimental.
llvm-svn: 296190
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.
llvm-svn: 296182
All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm
(i.e. a ConstantInt), so that's what the selector needs to look for.
llvm-svn: 296176
When we construct addressing modes, we use isNoopAddrSpaceCast to ignore
addrspacecast instructions. Make sure we insert the correct addrspacecast
when we reconstruct the addressing mode.
Differential Revision: https://reviews.llvm.org/D30114
llvm-svn: 296167
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296149
The motivation for filling out these select-of-constants cases goes back to D24480,
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531https://reviews.llvm.org/rL159230
The idea is that we should always canonicalize patterns like this to a select-of-constants
in IR because that's the smallest IR and the best for value tracking. Note that we currently
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151
As this patch shows, most targets generate better machine code for simple ext/add/not ops
rather than a select of constants. So the follow-up steps to make this less of a patchwork
of special-case folds and missing IR canonicalization:
1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2 Have InstCombine canonicalize in the other direction (create more selects).
Differential Revision: https://reviews.llvm.org/D30180
llvm-svn: 296137
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.
The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.
The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
%1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
%1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.
Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.
Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.
Depends on D29711
Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar
Reviewed By: rovka
Subscribers: aemerson, dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D29712
llvm-svn: 296131
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296060
We were stopping the translation of the parent block when the
translation of an instruction failed, but we were still trying to
translate the other blocks of the parent function.
Don't do that.
llvm-svn: 296047
Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of
pre-computing edges.
Differential Revision: https://reviews.llvm.org/D30308
llvm-svn: 296018
Having more fine-grained information on the specific construct that
caused us to fallback is valuable for large-scale data collection.
We still have the fallback warning, that's also used for FastISel.
We still need to remove the fallback warning, and teach FastISel to also
emit remarks (it currently has a combination of the warning, stats, and
debug prints: the remarks could unify all three).
The abort-on-fallback path could also be better handled using remarks:
one could imagine a "-Rpass-error", analoguous to "-Werror", which would
promote missed/failed remarks to errors. It's not clear whether that
would be useful for other remarks though, so we're not there yet.
llvm-svn: 296013
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.
Differential Revision: https://reviews.llvm.org/D29835
llvm-svn: 296009
Since LoopInfo is not available in machine passes as universally as in IR
passes, using the same approach for OptimizationRemarkEmitter as we did for IR
will run LoopInfo and DominatorTree unnecessarily. (LoopInfo is not used
lazily by ORE.)
To fix this, I am modifying the approach I took in D29836. LazyMachineBFI now
uses its client passes including MachineBFI itself that are available or
otherwise compute them on the fly.
So for example GreedyRegAlloc, since it's already using MBFI, will reuse that
instance. On the other hand, AsmPrinter in Justin's patch will generate DT,
LI and finally BFI on the fly.
(I am of course wondering now if the simplicity of this approach is even
preferable in IR. I will do some experiments.)
Testing is provided by an updated version of D29837 which requires Justin's
patch to bring ORE to the AsmPrinter.
Differential Revision: https://reviews.llvm.org/D30128
llvm-svn: 295996
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.
Differential Revision: https://reviews.llvm.org/D26722
llvm-svn: 295803
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64. The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.
The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used. One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.
This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).
Reviewers: t.p.northover, qcolombet, MatzeB
Subscribers: aemerson, mcrosier, sebpop, llvm-commits
Differential Revision: https://reviews.llvm.org/D28813
llvm-svn: 295746
- Fix doxygen comments (do not repeat documented name, remove definition
comment if there is already one at the declaration, add \p, ...)
- Add some const modifiers
- Use range based for
llvm-svn: 295688
Summary:
Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of
```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```
, but this information is gone when then block is lowered because BranchFolder misses it. This patch is a fix for this issue.
Reviewers: qcolombet, aprantl, craig.topper, MatzeB
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29902
llvm-svn: 295684
- Adapt MachineBasicBlock::getName() to have the same behavior as the IR
BasicBlock (Value::getName()).
- Add it to lib/CodeGen/CodeGen.cpp::initializeCodeGen so that it is linked in
the CodeGen library.
- MachineRegionInfoPass's name conflicts with RegionInfoPass's name ("region").
- MachineRegionInfo should depend on MachineDominatorTree,
MachinePostDominatorTree and MachineDominanceFrontier instead of their
respective IR versions.
- Since there were no tests for this, add a X86 MIR test.
Patch by Francis Visoiu Mistrih<fvisoiumistrih@apple.com>
llvm-svn: 295518
This fixes PR31381, which caused an assertion and/or invalid debug info.
This affects debug variables that have multiple fragments in the MMI
side (i.e.: in the stack frame) table.
rdar://problem/30571676
llvm-svn: 295486
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).
This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.
The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).
Differential Revision: https://reviews.llvm.org/D29454
llvm-svn: 295451
Summary:
This is an issue both with regular and Thin LTO. When we link together
a DICompileUnit that is marked NoDebug (e.g when compiling with -g0
but applying an AutoFDO profile, which requires location tracking
in the compiler) and a DICompileUnit with debug emission enabled,
we can have failures during dwarf debug generation. Specifically,
when we have inlined from the NoDebug compile unit into the debug
compile unit, we can fail during construction of the abstract and
inlined scope DIEs. This is because the SPMap does not include NoDebug
CUs (they are skipped in the debug_compile_units_iterator).
This patch fixes the failures by skipping locations from NoDebug CUs
when extracting lexical scopes.
Reviewers: dblaikie, aprantl
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D29765
llvm-svn: 295384
Resubmit -r295314 with PowerPC and AMDGPU tests updated.
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295336
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295314
For the hard float calling convention, we just use the D registers.
For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.
For pure soft float targets, we still bail out because we don't support the
libcalls yet.
llvm-svn: 295295
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.
Differential Revision: https://reviews.llvm.org/D29856
llvm-svn: 295262
Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.
llvm-svn: 295255
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:
A B
|\ /|
| \ / |
| X |
| / \ |
|/ \|
C D
would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.
because of this we can tail duplicate to extend existing trellises.
As an example consider the following CFG:
B D F H
/ \ / \ / \ / \
A---C---E---G---Ret
Where A,C,E,G are all small (Currently 2 instructions).
The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.
The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret
define void @straight_test(i32 %tag) {
entry:
br label %test1
test1: ; A
%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
call void @a()
br label %test2
test2: ; C
%tagbit2 = and i32 %tag, 2
%tagbit2eq0 = icmp eq i32 %tagbit2, 0
br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
call void @b()
br label %test3
test3: ; E
%tagbit3 = and i32 %tag, 4
%tagbit3eq0 = icmp eq i32 %tagbit3, 0
br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
call void @c()
br label %test4
test4: ; G
%tagbit4 = and i32 %tag, 8
%tagbit4eq0 = icmp eq i32 %tagbit4, 0
br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
call void @d()
br label %exit
exit:
ret void
}
here is the layout after D27742:
straight_test: # @straight_test
; ... Prologue elided
; BB#0: # %entry ; A (merged with test1)
; ... More prologue elided
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_2
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_3
b .LBB0_4
.LBB0_2: # %optional1 ; B (copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_4
.LBB0_3: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_5
b .LBB0_6
.LBB0_4: # %optional2 ; D (copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_5: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
b .LBB0_7
.LBB0_6: # %optional3 ; F (copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit ; Ret
ld 30, 96(1) # 8-byte Folded Reload
addi 1, 1, 112
ld 0, 16(1)
mtlr 0
blr
The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.
This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.
Here is the resulting concrete layout:
straight_test: # @straight_test
; BB#0: # %entry ; A (merged with test1)
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_4
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_5
.LBB0_2: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_3: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
bne 0, .LBB0_7
b .LBB0_8
.LBB0_4: # %optional1 ; B (Copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_2
.LBB0_5: # %optional2 ; D (Copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_3
.LBB0_6: # %optional3 ; F (Copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit
Differential Revision: https://reviews.llvm.org/D28522
llvm-svn: 295223
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.
This fixes PR311956.
Differential Revision: https://reviews.llvm.org/D29961
llvm-svn: 295213
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in a extract and creates a properly aligned start index for the extract.
This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each indice we can't use an extract.
Reviewers: zvi, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29926
llvm-svn: 295152
Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.
Reviewers: hans, iteratee, haicheng
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29153
llvm-svn: 295105
This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.
llvm-svn: 295103
Store instructions can have more than one memory operand as a result
of optimizations that fold different stores into one.
When we identify spill instructions to generate DBG_VALUE instructions
to record the spilling of a variable, we disregard stores with
multiple memory operands for now. We may miss some relevant spills but
the handling is a bit more complex, so we'll do it in a different patch.
This fixes PR31935.
llvm-svn: 295093
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV
llvm-svn: 295081
And use it in MachineOptimizationRemarkEmitter. A test will follow on top of
Justin's changes to enable MachineORE in AsmPrinter.
The approach is similar to the IR-level pass. It's a bit simpler because BPI
is immutable at the Machine level so we don't need to make that lazy.
Because of this, a new function mapping is introduced (BPIPassTrait::getBPI).
This function extracts BPI from the pass. In case of the lazy pass, this is
when the calculation of the BFI occurs. For Machine-level, this is the
identity function.
Differential Revision: https://reviews.llvm.org/D29836
llvm-svn: 295072
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.
rdar://30495920
llvm-svn: 294982
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.
llvm-svn: 294970
Summary:
Keep a vector of LocInfos around; one for each call to EmitInlineAsm.
Since each call to EmitInlineAsm creates a new buffer in the inline asm
SourceMgr, we can use the buffer number to map to the right LocInfo.
Reviewers: rengolin, grosbach, rnk, echristo
Reviewed By: rnk
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D29769
llvm-svn: 294947
Before this patch compile time was about 21s (see below). After this patch
we have less than 2s (see bellow).
Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
DAGCombiner - trunk
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.685s
DAGCombiner + Speed patch
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.655s
MachineCombiner w/o Speed patch
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m21.614s
MachineCombiner + Speed patch
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.593s
The test spill_fdiv.ll is attached to D29627
D29627 should be closed.
llvm-svn: 294936
The bug was introduced with:
https://reviews.llvm.org/rL294863
...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should
still be able to fold this setcc/compare.
llvm-svn: 294924
I don't know if anything other than x86 vectors is affected by this change, but this may allow
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.
The original motivation starts with a bug for DSE of masked stores that seems completely unrelated,
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210
Differential Revision: https://reviews.llvm.org/D29687
llvm-svn: 294863
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.
Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.
Reviewers: dberris, echristo, iteratee, kbarton, hfinkel
Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D29742
llvm-svn: 294781
The patch comes in 2 parts:
1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.
2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.
Fix for PR30760
Differential Revision: https://reviews.llvm.org/D29568
llvm-svn: 294749
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.
I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.
Having said that though, I think we should probably fix legalize vector ops to log what its doing.
Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc
Reviewed By: RKSimon
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29554
llvm-svn: 294711
LLVM defines `PTHREAD_LIB` which is used by AddLLVM.cmake and various projects
to correctly link the threading library when needed. Unfortunately
`PTHREAD_LIB` is defined by LLVM's `config-ix.cmake` file which isn't installed
and therefore can't be used when configuring out-of-tree builds. This causes
such builds to fail since `pthread` isn't being correctly linked.
This patch attempts to fix that problem by renaming and exporting
`LLVM_PTHREAD_LIB` as part of`LLVMConfig.cmake`. I renamed `PTHREAD_LIB`
because It seemed likely to cause collisions with downstream users of
`LLVMConfig.cmake`.
llvm-svn: 294690
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen. Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.
Reviewers: bogner, MatzeB, qcolombet
Subscribers: mcrosier, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D29680
llvm-svn: 294605
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which function have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.
This change adds an SSP-specific DiagnosticInfo class and uses of it to the Stack Protection code. A subsequent change to clang will cause the remarks to be emitted when enabled.
Patch by: James Henderson
Differential Revision: https://reviews.llvm.org/D29023
llvm-svn: 294590
It'll usually be immediately legalized back to a libcall, but occasionally
something can be done with it so we'd just as well enable that flexibility from
the start.
llvm-svn: 294530
AArch64 has specific instructions to multiply two numbers at double the width
and produce the high part of the result. These can be used to implement LLVM's
mul.with.overflow instructions fairly simply. Helps with C++ operator new[].
llvm-svn: 294519
Fixed test.
Summary:
Enables source location in diagnostic messages from the backend. This
is after parsing, during finalization. This requires the SourceMgr, the
inline assembly string buffer, and DiagInfo to still be alive after
EmitInlineAsm returns.
This patch creates a single SourceMgr for inline assembly inside the
AsmPrinter. MCContext gets a pointer to this SourceMgr. Using one
SourceMgr per call to EmitInlineAsm would make it difficult for
MCContext to figure out in which SourceMgr the SMLoc is located, while a
single SourceMgr can figure it out if it has multiple buffers.
The Str argument to EmitInlineAsm is copied into a buffer and owned by
the inline asm SourceMgr. This ensures that DiagHandlers won't print
garbage. (Clang emits a "note: instantiated into assembly here", which
refers to this string.)
The AsmParser gets destroyed before finalization, which means that the
DiagHandlers the AsmParser installs into the SourceMgr will be stale.
Restore the saved DiagHandlers.
Since now we're using just one SourceMgr for multiple inline asm
strings, we need to tell the AsmParser which buffer it needs to parse
currently. Hand a buffer id -- returned from SourceMgr::
AddNewSourceBuffer -- to the AsmParser.
Reviewers: rnk, grosbach, compnerd, rengolin, rovka, anemet
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29441
llvm-svn: 294458
It caused undefined behavior in VarLoc. As far as I investigated,
- VarLoc::VarLoc() treats negative offset value as InvalidKind.
Consider the case that (int64_t)MI.getOperand(1).getImm() is negative and whether it satisfies ((uint64_t)Offset < (1ULL << 32)).
- Comparison operators in VarLoc behave undefined since VarLoc::Loc.Hash is uninitialized in case of InvalidKind.
I guess Offset (in VarLoc) could be made aware of signed, but I am not sure.
So I have reverted it for now.
llvm-svn: 294447
Summary:
Enables source location in diagnostic messages from the backend. This
is after parsing, during finalization. This requires the SourceMgr, the
inline assembly string buffer, and DiagInfo to still be alive after
EmitInlineAsm returns.
This patch creates a single SourceMgr for inline assembly inside the
AsmPrinter. MCContext gets a pointer to this SourceMgr. Using one
SourceMgr per call to EmitInlineAsm would make it difficult for
MCContext to figure out in which SourceMgr the SMLoc is located, while a
single SourceMgr can figure it out if it has multiple buffers.
The Str argument to EmitInlineAsm is copied into a buffer and owned by
the inline asm SourceMgr. This ensures that DiagHandlers won't print
garbage. (Clang emits a "note: instantiated into assembly here", which
refers to this string.)
The AsmParser gets destroyed before finalization, which means that the
DiagHandlers the AsmParser installs into the SourceMgr will be stale.
Restore the saved DiagHandlers.
Since now we're using just one SourceMgr for multiple inline asm
strings, we need to tell the AsmParser which buffer it needs to parse
currently. Hand a buffer id -- returned from SourceMgr::
AddNewSourceBuffer -- to the AsmParser.
Reviewers: rnk, grosbach, compnerd, rengolin, rovka, anemet
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29441
llvm-svn: 294433