Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.
The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.
Depends on D68426
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68438
llvm-svn: 375067
Teach the combiner helper how to flatten concat_vectors of build_vectors
into a build_vector.
Add this combine as part of AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69071
llvm-svn: 375066
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
llvm-svn: 375057
This adds the initial plumbing to support optimisation remarks in
the IR hardware-loop pass.
I have left a todo in a comment where we can improve the reporting,
and will iterate on that now that we have this initial support in.
Differential Revision: https://reviews.llvm.org/D68579
llvm-svn: 374980
In LiveDebugVariables.cpp:
Prior to this patch, UserValues were grouped into linked list chains. Each
chain was the union of two sets: { A: Matching Source variable } or
{ B: Matching virtual register }. A ptr to the heads (or 'leaders')
of each of these chains were kept in a map with the { Source variable } used
as the key (set A predicate) and another with { Virtual register } as key
(set B predicate).
There was a search through the chains in the function getUserValue looking for
UserValues with matching { Source variable, Complex expression, Inlined-at
location }. Essentially searching for a subset of A through two interleaved
linked lists of set A and B. Importantly, by design, the subset will only
contain one or zero elements here. That is to say a UserValue can be uniquely
identified by the tuple { Source variable, Complex expression, Inlined-at
location } if it exists.
This patch removes the linked list and instead uses a DenseMap to map
the tuple { Source variable, Complex expression, Inlined-at location }
to UserValue ptrs so that the getUserValue search predicate is this map key.
The virtual register map now maps a vreg to a SmallVector<UserVal *> so that
set B is still available for quick searches.
Reviewers: aprantl, probinson, vsk, dblaikie
Reviewed By: aprantl
Subscribers: russell.gallop, gbedwell, bjope, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D68816
llvm-svn: 374979
Similar to r374970, but I don't have a test for this.
PromoteTargetBoolean is intended to be use for legalizing an
operand that needs to be promoted. It picks its type based on
the return from getSetccResultType and is intended to be used
when we have freedom to pick the new type. But the return type
we need for WidenVecOp_SETCC is completely determined by the
type of the input node.
llvm-svn: 374972
PromoteTargetBoolean calls getSetccResultType to get the return
type. But we were passing it the setcc result type rather than the
setcc input type. This causes an issue on X86 with avx512vl where
the setcc result type for vXf16 vectors is vXi16 while the
result type for vXi16 vectors is vXi1.
There's really no guarantee that getSetccResultType is the type
we need here. So now we just grab the extend type from
getExtendForContent and extend to the original result VT of the
node we're splitting.
llvm-svn: 374970
Examples:
i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1
i8 X < 0 ? C1 : 0 --> (X >>s 7) & C1
This is a small generalization of a fold requested in PR43650:
https://bugs.llvm.org/show_bug.cgi?id=43650
The sign-bit of the condition operand can be used as a mask for the true operand:
https://rise4fun.com/Alive/paT
Note that we already handle some of the patterns (isNegative + scalar) because
there's an over-specialized, yet over-reaching fold for that in foldSelectCCToShiftAnd().
It doesn't use any TLI hooks, so I can't easily rip out that code even though we're
duplicating part of it here. This fold is guarded by TLI.convertSelectOfConstantsToMath(),
so it should not cause problems for targets that prefer select over shift.
Also worth noting: I thought we could generalize this further to include the case where
the true operand of the select is not constant, but Alive says that may allow poison to
pass through where it does not in the original select form of the code.
Differential Revision: https://reviews.llvm.org/D68949
llvm-svn: 374902
Summary:
Internally in LLVM's metadata we use DW_OP_entry_value operations with
the same semantics as DWARF; that is, its operand specifies the number
of bytes that the entry value covers.
At the time of emitting entry values we don't know the emitted size of
the DWARF expression that the entry value will cover. Currently the size
is hardcoded to 1 in DIExpression, and other values causes the verifier
to fail. As the size is 1, that effectively means that we can only have
valid entry values for registers that can be encoded in one byte, which
are the registers with DWARF numbers 0 to 31 (as they can be encoded as
single-byte DW_OP_reg0..DW_OP_reg31 rather than a multi-byte
DW_OP_regx). It is a bit confusing, but it seems like llvm-dwarfdump
will print an operation "correctly", even if the byte size is less than
that, which may make it seem that we emit correct DWARF for registers
with DWARF numbers > 31. If you instead use readelf for such cases, it
will interpret the number of specified bytes as a DWARF expression. This
seems like a limitation in llvm-dwarfdump.
As suggested in D66746, a way forward would be to add an internal
variant of DW_OP_entry_value, DW_OP_LLVM_entry_value, whose operand
instead specifies the number of operations that the entry value covers,
and we then translate that into the byte size at the time of emission.
In this patch that internal operation is added. This patch keeps the
limitation that a entry value can only be applied to simple register
locations, but it will fix the issue with the size operand being
incorrect for DWARF numbers > 31.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: aprantl
Subscribers: jyknight, fedor.sergeev, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67492
llvm-svn: 374881
Summary:
DWARF's DW_OP_entry_value operation has two operands; the first is a
ULEB128 operand that specifies the size of the second operand, which is
a DWARF block. This means that we need to be able to pre-calculate and
emit the size of DWARF expressions before emitting them. There is
currently no interface for doing this in DwarfExpression, so this patch
introduces that.
When implementing this I initially thought about running through
DwarfExpression's emission two times; first with a temporary buffer to
emit the expression, in order to being able to calculate the size of
that emitted data. However, DwarfExpression is a quite complex state
machine, so I decided against that, as it seemed like the two runs could
get out of sync, resulting in incorrect size operands. Therefore I have
implemented this in a way that we only have to run DwarfExpression once.
The idea is to emit DWARF to a temporary buffer, for which it is
possible to query the size. The data in the temporary buffer can then be
emitted to DwarfExpression's main output.
In the case of DIEDwarfExpression, a temporary DIE is used. The values
are all allocated using the same BumpPtrAllocator as for all other DIEs,
and the values are then transferred to the real value list. In the case
of DebugLocDwarfExpression, the temporary buffer is implemented using a
BufferByteStreamer which emits to a buffer in the DwarfExpression
object.
Reviewers: aprantl, vsk, NikolaPrica, djtodoro
Reviewed By: aprantl
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67768
llvm-svn: 374879
This patch kills off a significant user of the "IsIndirect" field of
DBG_VALUE machine insts. Brought up in in PR41675, IsIndirect is
techncally redundant as it can be expressed by the DIExpression of a
DBG_VALUE inst, and it isn't helpful to have two ways of expressing
things.
Rather than setting IsIndirect, have DBG_VALUE creators add an extra deref
to the insts DIExpression. There should now be no appearences of
IsIndirect=True from isel down to LiveDebugVariables / VirtRegRewriter,
which is ensured by an assertion in LDVImpl::handleDebugValue. This means
we also get to delete the IsIndirect handling in LiveDebugVariables. Tests
can be upgraded by for example swapping the following IsIndirect=True
DBG_VALUE:
DBG_VALUE $somereg, 0, !123, !DIExpression(DW_OP_foo)
With one where the indirection is in the DIExpression, by _appending_
a deref:
DBG_VALUE $somereg, $noreg, !123, !DIExpression(DW_OP_foo, DW_OP_deref)
Which both mean the same thing.
Most of the test changes in this patch are updates of that form; also some
changes in how the textual assembly printer handles these insts.
Differential Revision: https://reviews.llvm.org/D68945
llvm-svn: 374877
This changes the 32-element SmallVector to a std::vector. When building
a RelWithDebInfo clang-8 binary, the average size of the vector was
~10000, so it does not seem very beneficial or practical to use a small
vector for that.
The DWARFBytes SmallVector grows in the same way as Comments, so perhaps
that also should be changed to a purely dynamically allocated structure,
but that requires some more code changes, so I let that remain as a
SmallVector for now.
llvm-svn: 374871
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
Summary:
This addresses a bug in collectCallSiteParameters() where call site
immediates would be truncated from int64_t to unsigned.
This fixes PR43525.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: aprantl
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D68869
llvm-svn: 374770
Add an extra parameter so the backend can take the alignment into
consideration.
Differential Revision: https://reviews.llvm.org/D68400
llvm-svn: 374763
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
The CmpInst::getType() calls can be replaced by just using User::getType() that it was dyn_cast from, and we then need to assert that any default predicate cases came from the CmpInst.
llvm-svn: 374716
Unify the range and loc emission (for both DWARFv4 and DWARFv5 style lists) and take advantage of that unification to use strategic base addresses for loclists.
Differential Revision: https://reviews.llvm.org/D68620
llvm-svn: 374600
The exciting code is actually already enough to handle the splitting
of vector arguments but we were lacking a test case.
This commit adds a test case for vector argument lowering involving
splitting and enable the related support in call lowering.
llvm-svn: 374589
Teach buildMerge how to deal with scalar to vector kind of requests.
Prior to this patch, buildMerge would issue either a G_MERGE_VALUES
when all the vregs are scalars or a G_CONCAT_VECTORS when the destination
vreg is a vector.
G_CONCAT_VECTORS was actually not the proper instruction when the source
vregs were scalars and the compiler would assert that the sources must
be vectors. Instead we want is to issue a G_BUILD_VECTOR when we are
in this situation.
This patch fixes that.
llvm-svn: 374588
The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.
llvm-svn: 374555
In GISel we have both G_CONSTANT and G_FCONSTANT, but because
in GISel we don't really have a concept of Float vs Int value
the only difference between the two is where the data originates
from.
What both G_CONSTANT and G_FCONSTANT return is just a bag of bits
with the constant representation in it.
By making getConstantVRegVal() return G_FCONSTANTs bit representation
as well we allow ConstantFold and other things to operate with
G_FCONSTANT.
Adding tests that show ConstantFolding to work on mixed G_CONSTANT
and G_FCONSTANT sources.
Differential Revision: https://reviews.llvm.org/D68739
llvm-svn: 374458
This reverses the scalar canonicalization proposed in D63382.
Pre: isPowerOf2(C1)
%r = select i1 %cond, i32 C1, i32 0
=>
%z = zext i1 %cond to i32
%r = shl i32 %z, log2(C1)
https://rise4fun.com/Alive/Z50
x86 already tries to fold this pattern, but it isn't done
uniformly, so we still see a diff. AArch64 probably should
enable the TLI hook to benefit too, but that's a follow-on.
llvm-svn: 374397
The default promotion for the add_sat/sub_sat nodes currently does:
1. ANY_EXTEND iN to iM
2. SHL by M-N
3. [US][ADD|SUB]SAT
4. L/ASHR by M-N
If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
Differential Revision: https://reviews.llvm.org/D68643
llvm-svn: 374373
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68250
llvm-svn: 374340
Currently, the heuristics the if-conversion pass uses for diamond if-conversion
are based on execution time, with no consideration for code size. This adds a
new set of heuristics to be used when optimising for code size.
This is mostly target-independent, because the if-conversion pass can
see the code size of the instructions which it is removing. For thumb,
there are a few passes (insertion of IT instructions, selection of
narrow branches, and selection of CBZ instructions) which are run after
if conversion and affect these heuristics, so I've added target hooks to
better predict the code-size effect of a proposed if-conversion.
Differential revision: https://reviews.llvm.org/D67350
llvm-svn: 374301
Summary:
Visual Studio doesn't like it while stepping. It kicks you out of the
source view of the file being stepped through and tries to fall back to
the disassembly view.
Fixes PR43530
The fix is incomplete, because it's possible to have a basic block with
no source locations at all. In this case, we don't emit a .cv_loc, but
that will result in wrong stepping behavior in the debugger if the
layout predecessor of the location-less BB has an unrelated source
location. We could try harder to find a valid location that dominates or
post-dominates the current BB, but in general it's a dataflow problem,
and one still might not exist. I left a FIXME about this.
As an alternative, we might want to consider having the middle-end check
if its emitting codeview and get it to stop using line zero.
Reviewers: akhuang
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68747
llvm-svn: 374267
As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86.
As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them.
This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions.
Differential Revision: https://reviews.llvm.org/D68419
llvm-svn: 374261
Add own version of the mathematical constants from the upcoming C++20 `std::numbers`.
Differential revision: https://reviews.llvm.org/D68257
llvm-svn: 374207
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 374085
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.
Reviewers: aprantl, vsk, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D66955
llvm-svn: 374068
* Adds a TypeSize struct to represent the known minimum size of a type
along with a flag to indicate that the runtime size is a integer multiple
of that size
* Converts existing size query functions from Type.h and DataLayout.h to
return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
to uint64_t) so that most existing code 'just works' as if the return
values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
supported instructions used with scalable vectors can be constructed
in IR.
Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen
Reviewed By: rovka, sdesmalen
Differential Revision: https://reviews.llvm.org/D53137
llvm-svn: 374042
Summary:
When getValueInMiddleOfBlock happens to be called for a basic block
that has no incoming value at all, an IMPLICIT_DEF is inserted in that
block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at
the top of its basic block or it will likely not reach the use that
the caller intends to insert.
Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204
Reviewers: arsenm, rampitec
Subscribers: jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68183
llvm-svn: 374040
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.
Patch by Dwight Guth <dwight.guth@runtimeverification.com>!
Reviewed By: lebedev.ri, paquette, rnk
Differential Revision: https://reviews.llvm.org/D67855
llvm-svn: 373976
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.
https://reviews.llvm.org/D68439
llvm-svn: 373935
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
llvm-svn: 373900
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.
This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.
Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.
Fixes PR43217
Differential Revision: https://reviews.llvm.org/D68544
llvm-svn: 373871
Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare.
Reviewers: RKSimon, spatel, efriedma
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68359
llvm-svn: 373863
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68250
llvm-svn: 373850
This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.
Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well.
Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant".
llvm-svn: 373814
This reverts r371177 (git commit f879c68755)
It caused PR43566 by removing empty, address-taken MachineBasicBlocks.
Such blocks may have references from blockaddress or other operands, and
need more consideration to be removed.
See the PR for a test case to use when relanding.
llvm-svn: 373805
Outlining from noreturn functions doesn't do the correct thing right now. The
outliner should respect that the caller is marked noreturn. In the event that
we have a noreturn function, and the outlined code is in tail position, the
outliner will not see that the outlined function should be tail called. As a
result, you end up with a regular call containing a return.
Fixing this requires that we check that all candidates live inside noreturn
functions. So, for the sake of correctness, don't outline from noreturn
functions right now.
Add machine-outliner-noreturn.mir to test this.
llvm-svn: 373791
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node. Make sure this assumption actually holds.
Fixes a "Node emitted out of order - early" assertion on the testcase.
This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.
Differential Revision: https://reviews.llvm.org/D68068
llvm-svn: 373782
This is a trivial point fix. Terminator instructions aren't scheduled, so
we shouldn't expect to be able to remap them.
This doesn't affect Hexagon and PPC because their terminators are always
hardware loop backbranches that have no register operands.
llvm-svn: 373762
Rather than having a mixture of location-state shared between DBG_VALUEs
and VarLoc objects in LiveDebugValues, this patch makes VarLoc the
master record of variable locations. The refactoring means that the
transfer of locations from one place to another is always a performed by
an operation on an existing VarLoc, that produces another transferred
VarLoc. DBG_VALUEs are only created at the end of LiveDebugValues, once
all locations are known. As a plus, there is now only one method where
DBG_VALUEs can be created.
The test case added covers a circumstance that is now impossible to
express in LiveDebugValues: if an already-indirect DBG_VALUE is spilt,
previously it would have been restored-from-spill as a direct DBG_VALUE.
We now don't lose this information along the way, as VarLocs always
refer back to the "original" non-transfer DBG_VALUE, and we can always
work out whether a location was "originally" indirect.
Differential Revision: https://reviews.llvm.org/D67398
llvm-svn: 373727
When transfering variable locations from one place to another,
LiveDebugValues immediately creates a DBG_VALUE representing that
transfer. This causes trouble if the variable location should
subsequently be invalidated by a loop back-edge, such as in the added
test case: the transfer DBG_VALUE from a now-invalid location is used
as proof that the variable location is correct. This is effectively a
self-fulfilling prophesy.
To avoid this, defer the insertion of transfer DBG_VALUEs until after
analysis has completed. Some of those transfers are still sketchy, but
we don't propagate them into other blocks now.
Differential Revision: https://reviews.llvm.org/D67393
llvm-svn: 373720
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.
In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.
Differential Revision: https://reviews.llvm.org/D68397
llvm-svn: 373666
The Hexagon code assumes there's no existing terminator when inserting its
trip count condition check.
This causes swp-stages5.ll to break. The generated code looks good to me,
it is likely a permutation. I have disabled the new codegen path to keep
everything green and will investigate along with the other 3-4 tests
that have different codegen.
Fixes expensive-checks build.
llvm-svn: 373629
Brings this struct in line with the RangeSpan class so they might
eventually be used by common template code for generating range/loc
lists with less duplicate code.
llvm-svn: 373540
This is an effort to make RangeSpan and DebugLocStream::Entry more
similar to share code for their emission (to reuse the more complicated
code for using (& choosing when to use) base address selection entries,
etc).
It didn't seem like this struct was worth the complexity of
encapsulation - when the members could be initialized by the ctor to any
value (no validation) and the type is assignable (so there's no
mutability or other constraint being implemented by its interface).
llvm-svn: 373533
As noted on PR41772, the static analyzer reports that the MachineMemOperand::print partial wrappers set a number of args to null pointers that were then dereferenced in the actual implementation.
It turns out that these wrappers are not being used at all (hence why we're not seeing any crashes), so I'd like to propose we just get rid of them.
Differential Revision: https://reviews.llvm.org/D68208
llvm-svn: 373484
This was reverted in r373454 due to breaking the expensive-checks bot.
This version addresses that by omitting the addSuccessorWithProb() call
when omitting the range check.
> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131
llvm-svn: 373477
Summary:
This extends the PeelingModuloScheduleExpander to generate prolog and epilog code,
and correctly stitch uses through the prolog, kernel, epilog DAG.
The key concept in this patch is to ensure that all transforms are *local*; only a
function of a block and its immediate predecessor and successor. By defining the problem in this way
we can inductively rewrite the entire DAG using only local knowledge that is easy to
reason about.
For example, we assume that all prologs and epilogs are near-perfect clones of the
steady-state kernel. This means that if a block has an instruction that is predicated out,
we can redirect all users of that instruction to that equivalent instruction in our
immediate predecessor. As all blocks are clones, every instruction must have an equivalent in
every other block.
Similarly we can make the assumption by construction that if a value defined in a block is used
outside that block, the only possible user is its immediate successors. We maintain this
even for values that are used outside the loop by creating a limited form of LCSSA.
This code isn't small, but it isn't complex.
Enabled a bunch of testing from Hexagon. There are a couple of tests not enabled yet;
I'm about 80% sure there isn't buggy codegen but the tests are checking for patterns
that we don't produce. Those still need a bit more investigation. In the meantime we
(Google) are happy with the code produced by this on our downstream SMS implementation,
and believe it generates correct code.
Subscribers: mgorny, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68205
llvm-svn: 373462
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Function> directly and if not assert will fire for us.
llvm-svn: 373449
This is modeled after the same functionality for jump tables, which was
added in r357067.
Differential revision: https://reviews.llvm.org/D68131
llvm-svn: 373431
Summary:
PHIElimination modifies CFG and marks MachineDominatorTree as preserved. Therefore, it the CFG changes it should also update the MDT, when available. This patch teaches PHIElimination to recalculate MDT when necessary.
This fixes the `tailmerging_in_mbp.ll` test failure discovered after switching to generic DomTree verification algorithm in MachineDominators in D67976.
Reviewers: arsenm, hliao, alex-t, rampitec, vpykhtin, grosser
Reviewed By: rampitec
Subscribers: MatzeB, wdng, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68154
llvm-svn: 373377
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks.
The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants).
Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.).
Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch.
Differential Revision: https://reviews.llvm.org/D67557
llvm-svn: 373343
Summary:
This patch implements Machine PostDominator Tree verification and ensures that the verification doesn't fail the in-tree tests.
MPDT verification can be enabled using `verify-machine-dom-info` -- the same flag used by Machine Dominator Tree verification.
Flipping the flag revealed that MachineSink falsely claimed to preserve CFG and MDT/MPDT. This patch fixes that.
Reviewers: arsenm, hliao, rampitec, vpykhtin, grosser
Reviewed By: hliao
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68235
llvm-svn: 373341
SelectionDAG has a bunch of machinery to defer this to selection time
for some reason. Just directly emit a copy during IRTranslator. The
x86 usage does somewhat questionably check hasFP, which could depend
on the whole function being at minimum translated.
This does lose the convergent bit if the callsite had it, which may be
a problem. We also lose that in general for intrinsics, which may also
be a problem.
llvm-svn: 373294
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.
llvm-svn: 373292
Summary:
It seems we missed that the target hook can't query the known-bits for the
inputs to a target instruction. Fix that oversight
Reviewers: aditya_nandakumar
Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67380
llvm-svn: 373264
Existing clients are converted to use MachineModuleInfoWrapperPass. The
new interface is for defining a new pass manager API in CodeGen.
Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm
Reviewed By: arsenm, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D64183
llvm-svn: 373240
This adds support for lowering variadic musttail calls. To do this, we have
to...
- Detect a musttail call in a variadic function before attempting to lower the
call's formal arguments. This is done in the IRTranslator.
- Compute forwarded registers in `lowerFormalArguments`, and add copies for
those registers.
- Restore the forwarded registers in `lowerTailCall`.
Because there doesn't seem to be any nice way to wrap these up into the outgoing
argument handler, the restore code in `lowerTailCall` is done separately.
Also, irritatingly, you have to make sure that the registers don't overlap with
any passed parameters. Otherwise, the scheduler doesn't know what to do with the
extra copies and asserts.
Add call-translator-variadic-musttail.ll to test this. This is pretty much the
same as the X86 musttail-varargs.ll test. We didn't have as nice of a test to
base this off of, but the idea is the same.
Differential Revision: https://reviews.llvm.org/D68043
llvm-svn: 373226
trigger stack protectors. Fixes PR42238.
Add test coverage for llvm.memset, as proxy for all llvm.mem*
intrinsics. There are two issues here: (1) they could be lowered to a
libc call, which could be intercepted, and do Bad Stuff; (2) with a
non-constant size, they could overwrite the current stack frame.
The test was mostly written by Matt Arsenault in r363169, which was
later reverted; I tweaked what he had and added the llvm.memset part.
Differential Revision: https://reviews.llvm.org/D67845
llvm-svn: 373220
"Captured" and "relevant to Stack Protector" are not the same thing.
This reverts commit f29366b1f5.
aka r363169.
Differential Revision: https://reviews.llvm.org/D67842
llvm-svn: 373216
Summary:
Previously IntrinsicInfo::size was an unsigned what can't represent the
64 bit value used by MemoryLocation::UnknownSize.
Reviewers: jmolloy
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68219
llvm-svn: 373214
ISD::SADDO uses the suggested sequence described in the section §2.4 of
the RISCV Spec v2.2. ISD::SSUBO uses the dual approach but checking for
(non-zero) positive.
Differential Revision: https://reviews.llvm.org/D47927
llvm-svn: 373187
We need to propagate this information from the IR in order to be able to safely
do tail call optimizations on the intrinsics during legalization. Assuming
it's safe to do tail call opt without checking for the marker isn't safe because
the mem libcall may use allocas from the caller.
This adds an extra immediate operand to the end of the intrinsics and fixes the
legalizer to handle it.
Differential Revision: https://reviews.llvm.org/D68151
llvm-svn: 373140
Summary: This is a cleanup patch for MachineDominatorTree. It would be an NFC, except for replacing custom DomTree verification with the generic one.
Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao
Reviewed By: arsenm
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67976
llvm-svn: 373101
Abandon describing of loaded values due to safety concerns. Loaded
values are described as derefed memory location at caller point.
At callee we can unintentionally change that memory location which
would lead to different entry being printed value before and after
the memory location clobbering. This problem is described in
llvm.org/PR43343.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D67717
llvm-svn: 373089
Summary:
An erroneously negated if-statement by an earlier (March 2019) bugfix left phi replacement/simplification under optimizeMemoryInst() in CodeGenPrepare largely inactivated. The error was found when csmith found that the same assert as in the original bug report could still be triggered in a different way. This patch fixes the bugfix. The original bug was:
https://bugs.llvm.org/show_bug.cgi?id=41052
... and the previous fix was D59358.
Reviewers: aprantl, skatkov
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67838
llvm-svn: 373084
This caused severe compile-time regressions, see PR43455.
> Modern processors predict the targets of an indirect branch regardless of
> the size of any jump table used to glean its target address. Moreover,
> branch predictors typically use resources limited by the number of actual
> targets that occur at run time.
>
> This patch changes the semantics of the option `-max-jump-table-size` to limit
> the number of different targets instead of the number of entries in a jump
> table. Thus, it is now renamed to `-max-jump-table-targets`.
>
> Before, when `-max-jump-table-size` was specified, it could happen that
> cluster jump tables could have targets used repeatedly, but each one was
> counted and typically resulted in tables with the same number of entries.
> With this patch, when specifying `-max-jump-table-targets`, tables may have
> different lengths, since the number of unique targets is counted towards the
> limit, but the number of unique targets in tables is the same, but for the
> last one containing the balance of targets.
>
> Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 373060
This patch emits the function descriptor csect for functions with definitions
under both 32-bit/64-bit mode on AIX.
Differential Revision: https://reviews.llvm.org/D66724
llvm-svn: 373009
Summary:
Previously the case
EBB
| \_
| |
| TBB
| /
FBB
was treated as a valid triangle also when TBB and FBB was the same basic
block. This could then lead to an invalid CFG when we removed the edge
from EBB to TBB, since that meant we would also remove the edge from EBB
to FBB.
Since TBB == FBB is quite a degenerated case of a triangle, we now
don't treat it as a valid triangle anymore, and thus we will avoid the
trouble with updating the CFG.
Reviewers: efriedma, dmgreen, kparzysz
Reviewed By: efriedma
Subscribers: bjope, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67832
llvm-svn: 372943
Rename old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess call the function related to alignment by default
and can be overridden by target to inform whether the memory access is legal or
not.
Differential Revision: https://reviews.llvm.org/D67121
llvm-svn: 372935
When checking for tail call eligibility, we should use the correct CCAssignFn
for each argument, rather than just checking if the caller/callee is varargs or
not.
This is important for tail call lowering with varargs. If we don't check it,
then basically any varargs callee with parameters cannot be tail called on
Darwin, for one thing. If the parameters are all guaranteed to be in registers,
this should be entirely safe.
On top of that, not checking for this could potentially make it so that we have
the wrong stack offsets when checking for tail call eligibility.
Also refactor some of the stuff for CCAssignFnForCall and pull it out into a
helper function.
Update call-translator-tail-call.ll to show that we can now correctly tail call
on Darwin. Also add two extra tail call checks. The first verifies that we still
respect the caller's stack size, and the second verifies that we still don't
tail call when a varargs function has a memory argument.
Differential Revision: https://reviews.llvm.org/D67939
llvm-svn: 372897
Modern processors predict the targets of an indirect branch regardless of
the size of any jump table used to glean its target address. Moreover,
branch predictors typically use resources limited by the number of actual
targets that occur at run time.
This patch changes the semantics of the option `-max-jump-table-size` to limit
the number of different targets instead of the number of entries in a jump
table. Thus, it is now renamed to `-max-jump-table-targets`.
Before, when `-max-jump-table-size` was specified, it could happen that
cluster jump tables could have targets used repeatedly, but each one was
counted and typically resulted in tables with the same number of entries.
With this patch, when specifying `-max-jump-table-targets`, tables may have
different lengths, since the number of unique targets is counted towards the
limit, but the number of unique targets in tables is the same, but for the
last one containing the balance of targets.
Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 372893
We might be able to do better on the example in the test,
but in general, we should not scalarize a splatted vector
binop if there are other uses of the binop. Otherwise, we
can end up with code as we had - a scalar op that is
redundant with a vector op.
llvm-svn: 372886
Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future.
Committed on behalf of @hvdijk (Harald van Dijk)
Differential Revision: https://reviews.llvm.org/D66138
llvm-svn: 372882
Summary:
This patch fixes a bug that originated from passing a virtual exit block (nullptr) to `MachinePostDominatorTee::findNearestCommonDominator` and resulted in assertion failures inside its callee. It also applies a small cleanup to the class.
The patch introduces a new function in PDT that given a list of `MachineBasicBlock`s finds their NCD. The new overload of `findNearestCommonDominator` handles virtual root correctly.
Note that similar handling of virtual root nodes is not necessary in (forward) `DominatorTree`s, as right now they don't use virtual roots.
Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao
Reviewed By: hliao
Subscribers: hliao, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #amdgpu, #llvm
Differential Revision: https://reviews.llvm.org/D67974
llvm-svn: 372874
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917
As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086
The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535
But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.
The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.
Differential Revision: https://reviews.llvm.org/D67564
llvm-svn: 372866
The static analyzer is warning about a potential null dereference, but we should be able to use cast<CallInst> directly and if not assert will fire for us.
llvm-svn: 372720
Summary:
The functions different in two ways:
- getLLVMRegNum could return both "eh" and "other" dwarf register
numbers, while getLLVMRegNumFromEH only returned the "eh" number.
- getLLVMRegNum asserted if the register was not found, while the second
function returned -1.
The second distinction was pretty important, but it was very hard to
infer that from the function name. Aditionally, for the use case of
dumping dwarf expressions, we needed a function which can work with both
kinds of number, but does not assert.
This patch solves both of these issues by merging the two functions into
one, returning an Optional<unsigned> value. While the same thing could
be achieved by adding an "IsEH" argument to the (renamed)
getLLVMRegNumFromEH function, it seemed better to avoid the confusion of
two functions and put the choice of asserting into the hands of the
caller -- if he checks the Optional value, he can safely process
"untrusted" input, and if he blindly dereferences the Optional, he gets
the assertion.
I've updated all call sites to the new API, choosing between the two
options according to the function they were calling originally, except
that I've updated the usage in DWARFExpression.cpp to use the "safe"
method instead, and added a test case which would have previously
triggered an assertion failure when processing (incorrect?) dwarf
expressions.
Reviewers: dsanders, arsenm, JDevlieghere
Subscribers: wdng, aprantl, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67154
llvm-svn: 372710
We were miscompiling switch value comparisons with the wrong signedness, which
shows up when we have things like switch case values with i1 types, which end up
being legalized incorrectly.
Fixes PR43383
llvm-svn: 372675
This came up in the x86-specific:
https://bugs.llvm.org/show_bug.cgi?id=43239
...but it is a general problem for the BreakFalseDeps pass.
Dependencies may be broken by adding some other instruction,
so that should be avoided if the overall goal is to minimize size.
Differential Revision: https://reviews.llvm.org/D67363
llvm-svn: 372628
This intrinsics should be shift by immediate, but gcc allows any
i32 scalar and clang needs to match that. So we try to detect the
non-constant case and move the data from an integer register to an
MMX register.
Previously this was done by creating a v2i32 build_vector and
bitcast in SelectionDAGBuilder. This had to be done early since
v2i32 isn't a legal type. The bitcast+build_vector would be DAG
combined to X86ISD::MMX_MOVW2D which isel will turn into a
GPR->MMX MOVD.
This commit just moves the whole thing to lowering and emits
the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The
test changes just seem to be due to nodes being linearized in a
different order.
llvm-svn: 372535
Recommit: fix asan errors.
The way MachinePipeliner uses these target hooks is stateful - we reduce trip
count by one per call to reduceLoopCount. It's a little overfit for hardware
loops, where we don't have to worry about stitching a loop induction variable
across prologs and epilogs (the induction variable is implicit).
This patch introduces a new API:
/// Analyze loop L, which must be a single-basic-block loop, and if the
/// conditions can be understood enough produce a PipelinerLoopInfo object.
virtual std::unique_ptr<PipelinerLoopInfo>
analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const;
The return value is expected to be an implementation of the abstract class:
/// Object returned by analyzeLoopForPipelining. Allows software pipelining
/// implementations to query attributes of the loop being pipelined.
class PipelinerLoopInfo {
public:
virtual ~PipelinerLoopInfo();
/// Return true if the given instruction should not be pipelined and should
/// be ignored. An example could be a loop comparison, or induction variable
/// update with no users being pipelined.
virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0;
/// Create a condition to determine if the trip count of the loop is greater
/// than TC.
///
/// If the trip count is statically known to be greater than TC, return
/// true. If the trip count is statically known to be not greater than TC,
/// return false. Otherwise return nullopt and fill out Cond with the test
/// condition.
virtual Optional<bool>
createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
SmallVectorImpl<MachineOperand> &Cond) = 0;
/// Modify the loop such that the trip count is
/// OriginalTC + TripCountAdjust.
virtual void adjustTripCount(int TripCountAdjust) = 0;
/// Called when the loop's preheader has been modified to NewPreheader.
virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0;
/// Called when the loop is being removed.
virtual void disposed() = 0;
};
The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while
allowing the target to hold its own state across all calls. This API, in
particular the disjunction of creating a trip count check condition and
adjusting the loop, improves the code quality in ModuloSchedule.cpp.
llvm-svn: 372463
We currently always set the HasCalls on MFI during translation and legalization if
we're handling a call or legalizing to a libcall. However, if that call is later
optimized to a tail call then we don't need the flag. The flag being set to true
causes frame lowering to always save and restore FP/LR, which adds unnecessary code.
This change does the same thing as SelectionDAG and ports over some code that scans
instructions after selection, using TargetInstrInfo to determine if target opcodes
are known calls.
Code size geomean improvements on CTMark:
-O0 : 0.1%
-Os : 0.3%
Differential Revision: https://reviews.llvm.org/D67868
llvm-svn: 372443
Summary:
After the switch in SimplifyDemandedBits, it tries to create a
constant when possible. If the original node is a TargetConstant
the default in the switch will call computeKnownBits on the
TargetConstant which will succeed. This results in the
TargetConstant becoming a Constant. But TargetConstant exists to
avoid being changed.
I've fixed the two cases that relied on this in tree by explicitly
making the nodes constant instead of target constant. The Sparc
case is an old bug. The Mips case was recently introduced now that
ImmArg on intrinsics gets turned into a TargetConstant when the
SelectionDAG is created. I've removed the ImmArg since it lowers
to generic code.
Reviewers: arsenm, RKSimon, spatel
Subscribers: jyknight, sdardis, wdng, arichardson, hiraditya, fedor.sergeev, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67802
llvm-svn: 372409
The insertion of an unconditional branch during FastISel can differ depending on
building with or without debug information. This happens because FastISel::fastEmitBranch
emits an unconditional branch depending on the size of the current basic block
without distinguishing between debug and non-debug instructions.
This patch fixes this issue by ignoring debug instructions when getting the size
of the basic block.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: ormris, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67703
llvm-svn: 372389
The way MachinePipeliner uses these target hooks is stateful - we reduce trip
count by one per call to reduceLoopCount. It's a little overfit for hardware
loops, where we don't have to worry about stitching a loop induction variable
across prologs and epilogs (the induction variable is implicit).
This patch introduces a new API:
/// Analyze loop L, which must be a single-basic-block loop, and if the
/// conditions can be understood enough produce a PipelinerLoopInfo object.
virtual std::unique_ptr<PipelinerLoopInfo>
analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const;
The return value is expected to be an implementation of the abstract class:
/// Object returned by analyzeLoopForPipelining. Allows software pipelining
/// implementations to query attributes of the loop being pipelined.
class PipelinerLoopInfo {
public:
virtual ~PipelinerLoopInfo();
/// Return true if the given instruction should not be pipelined and should
/// be ignored. An example could be a loop comparison, or induction variable
/// update with no users being pipelined.
virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0;
/// Create a condition to determine if the trip count of the loop is greater
/// than TC.
///
/// If the trip count is statically known to be greater than TC, return
/// true. If the trip count is statically known to be not greater than TC,
/// return false. Otherwise return nullopt and fill out Cond with the test
/// condition.
virtual Optional<bool>
createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
SmallVectorImpl<MachineOperand> &Cond) = 0;
/// Modify the loop such that the trip count is
/// OriginalTC + TripCountAdjust.
virtual void adjustTripCount(int TripCountAdjust) = 0;
/// Called when the loop's preheader has been modified to NewPreheader.
virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0;
/// Called when the loop is being removed.
virtual void disposed() = 0;
};
The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while
allowing the target to hold its own state across all calls. This API, in
particular the disjunction of creating a trip count check condition and
adjusting the loop, improves the code quality in ModuloSchedule.cpp.
llvm-svn: 372376
If an instruction had multiple subregister defs, and one of them was
undef, this would improperly conclude all other lanes are
killed. There could still be other defs of those read-undef lanes in
other operands. This would improperly remove register uses from
CurrentVRegUses, so the visitation of later operands would not find
the necessary register dependency. This would also mean this would
fail or not depending on how different subregister def operands were
ordered.
On an undef subregister def, scan the instruction for other
subregister defs and avoid killing those.
This possibly should be deferring removing anything from
CurrentVRegUses until the entire instruction has been processed
instead.
llvm-svn: 372362
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation.
The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants).
Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.).
I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negatation propagation). We can build on this in future patches.
Differential Revision: https://reviews.llvm.org/D67557
llvm-svn: 372333
As commented on D67557 we have a lot of uses of depth checks all using magic numbers.
This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly.
Differential Revision: https://reviews.llvm.org/D67711
llvm-svn: 372315
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
DIFlagBlockByRefStruct is an unused DIFlag that originally was used by
clang to express (Objective-)C block captures in debug info. For the
last year Clang has been emitting complex DIExpressions to describe
block captures instead, which makes all the code supporting this flag
redundant.
This patch removes the flag and all supporting "dead" code, so we can
reuse the bit for something else in the future.
Since this only affects debug info generated by Clang with the block
extension this mostly affects Apple platforms and I don't have any
bitcode compatibility concerns for removing this. The Verifier will
reject debug info that uses the bit and thus degrade gracefully when
LTO'ing older bitcode with a newer compiler.
rdar://problem/44304813
Differential Revision: https://reviews.llvm.org/D67453
llvm-svn: 372272
Summary:
`DAGCombiner::visitADDLikeCommutative()` already has a sibling fold:
`(add X, Carry) -> (addcarry X, 0, Carry)`
This fold, as suggested by @efriedma, helps recover from //some//
of the regressions of D62266
Reviewers: efriedma, deadalnix
Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62392
llvm-svn: 372259
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
Reviewers: omjavaid, eli.friedman, thegameg, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D66935
llvm-svn: 372204
The filename in the RemarkStreamer should be optional to allow clients
to stream remarks to memory or to existing streams.
This introduces a new overload of `setupOptimizationRemarks`, and avoids
enforcing the presence of a filename at different places.
llvm-svn: 372195
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
r371901 was overeager and widenScalarDst() and the like in the legalizer
attempt to increment the insert point given in order to add new instructions
after the currently legalizing inst. In cases where the insertion point is not
exactly the current instruction, then callers need to de-compensate for the
behaviour by decrementing the insertion iterator before calling them. It's not
a nice state of affairs, for now just undo the problematic parts of the change.
llvm-svn: 372050
For some reason we sometimes insert new instructions one instruction before
the first non-PHI when legalizing. This can result in having non-PHI
instructions before PHIs, which mean that PHI elimination doesn't catch them.
Differential Revision: https://reviews.llvm.org/D67570
llvm-svn: 371901
Because memory intrinsics are handled differently than other calls, we need to
check them for tail call eligiblity in the legalizer. This allows us to still
inline them when it's beneficial to do so, but also tail call when possible.
This adds simple tail calling support for when the intrinsic is followed by a
return.
It ports the attribute checks from `TargetLowering::isInTailCallPosition` into
a similarly-named function in LegalizerHelper.cpp. The target-specific
`isUsedByReturnOnly` hook is not ported here.
Update tailcall-mem-intrinsics.ll to show that GlobalISel can now tail call
memory intrinsics.
Update legalize-memcpy-et-al.mir to have a case where we don't tail call.
Differential Revision: https://reviews.llvm.org/D67566
llvm-svn: 371893
This was added to support fp128 on x86-64, but appears to be
unneeded now. This may be because the FR128 register class
added back then was merged with the VR128 register class later.
llvm-svn: 371815
Unlike SelectionDAG, treat this as a normally legalizable operation.
In SelectionDAG this is supposed to only ever formed if it's legal,
but I've found that to be restricting. For AMDGPU this is contextually
legal depending on whether denormal flushing is allowed in the use
function.
Technically we currently treat the denormal mode as a subtarget
feature, so custom lowering could be avoided. However I consider this
to be a defect, and this should be contextually dependent on the
controllable rounding mode of the parent function.
llvm-svn: 371800
This testcase is invalid, and caught by the verifier. For the verifier
to catch it, the live interval computation needs to complete. Remove
the assert so the verifier catches this, which is less confusing.
In this testcase there is an undefined use of a subregister, and lanes
which aren't used or defined. An equivalent testcase with the
super-register shrunk to have no untouched lanes already hit this
verifier error.
llvm-svn: 371792
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context.
Differential Revision: https://reviews.llvm.org/D66318
llvm-svn: 371786
This adds support for lowering sibling calls with outgoing arguments.
e.g
```
define void @foo(i32 %a)
```
Support is ported from AArch64ISelLowering's `isEligibleForTailCallOptimization`.
The only thing that is missing is a full port of
`TargetLowering::parametersInCSRMatch`. So, if we're using swiftself,
we'll never tail call.
- Rename `analyzeCallResult` to `analyzeArgInfo`, since the function is now used
for both outgoing and incoming arguments
- Teach `OutgoingArgHandler` about tail calls. Tail calls use frame indices for
stack arguments.
- Teach `lowerFormalArguments` to set the bytes in the caller's stack argument
area. This is used later to check if the tail call's parameters will fit on
the caller's stack.
- Add `areCalleeOutgoingArgsTailCallable` to perform the eligibility check on
the callee's outgoing arguments.
For testing:
- Update call-translator-tail-call to verify that we can now tail call with
outgoing arguments, use G_FRAME_INDEX for stack arguments, and respect the
size of the caller's stack
- Remove GISel-specific check lines from speculation-hardening.ll, since GISel
now tail calls like the other selectors
- Add a GISel test line to tailcall-string-rvo.ll since we can tail call in that
test now
- Add a GISel test line to tailcall_misched_graph.ll since we tail call there
now. Add specific check lines for GISel, since the debug output from the
machine-scheduler differs with GlobalISel. The dependency still holds, but
the output comes out in a different order.
Differential Revision: https://reviews.llvm.org/D67471
llvm-svn: 371780
The X86 decision assumes the compare will produce a result in an XMM
register, but that can't happen for an fp128 compare since those
go to a libcall the returns an i32. Pass the VT so X86 can check
the type.
llvm-svn: 371775
This code was changed to accomodate fp128 being softened to itself
during type legalization on x86-64. This was done in order to create
libcalls while having fp128 as a legal type. We're now doing the
libcall creation during LegalizeDAG and the type legalization changes
to enable the old behavior have been removed. So this change to
SelectionDAGBuilder is no longer needed.
llvm-svn: 371771
In MVE, as of rL371218, we are attempting to sink chains of instructions such as:
%l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
%broadcast.splat26 = shufflevector <8 x i8> %l1, <8 x i8> undef, <8 x i32> zeroinitializer
In certain situations though, we can end up breaking the dominance relations of
instructions. This happens when we sink the instruction into a loop, but cannot
remove the originals. The Use is updated, which might in fact be a Use from the
second instruction to the first.
This attempts to fix that by reversing the order of instruction that are sunk,
and ensuring that we update the uses on new instructions if they have already
been sunk, not the old ones.
Differential Revision: https://reviews.llvm.org/D67366
llvm-svn: 371743
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, JDevlieghere, alexshap, rupprecht, jhenderson
Subscribers: sdardis, nemanjai, hiraditya, kbarton, jakehehrlich, jrtc27, MaskRay, atanasyan, jsji, seiya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D67499
llvm-svn: 371742
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
Up to now, we've decided whether to sink address calculations using GEPs or
normal arithmetic based on the useAA hook, but there are other reasons GEPs
might be preferred. So this patch splits the two questions, with a default
implementation falling back to useAA.
llvm-svn: 371721
Current implementation of estimating divisions loses precision since it
estimates reciprocal first and does multiplication. This patch is to re-order
arithmetic operations in the last iteration in DAGCombiner to improve the
accuracy.
Reviewed By: Sanjay Patel, Jinsong Ji
Differential Revision: https://reviews.llvm.org/D66050
llvm-svn: 371713
This was previously used to turn fp128 operations into libcalls
on X86. This is now done through op legalization after r371672.
This restores much of this code to before r254653.
llvm-svn: 371709
First we were asserting that the ValNo of a VA was the wrong value. It doesn't actually
make a difference for us in CallLowering but fix that anyway to silence the assert.
The bigger issue was that after fixing the assert we were generating invalid MIR
because the merging/unmerging of values split across multiple registers wasn't
also implemented for memory locs. This happens when we run out of registers and
have to pass the split types like i128 -> i64 x 2 on the stack. This is do-able, but
for now just fall back.
llvm-svn: 371693
Emit debug entry values using standard DWARF5 opcodes when the debugger
tuning is set to lldb.
Differential Revision: https://reviews.llvm.org/D67410
llvm-svn: 371666
If there are multiple dead defs of the same virtual register, these
are required to be split into multiple virtual registers with separate
live intervals to avoid a verifier error.
llvm-svn: 371640
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67433
llvm-svn: 371608
This fixes a crash in tail call translation caused by assume and lifetime_end
intrinsics.
It's possible to have instructions other than a return after a tail call which
will still have `Analysis::isInTailCallPosition` return true. (Namely,
lifetime_end and assume intrinsics.)
If we emit a tail call, we should stop translating instructions in the block.
Otherwise, we can end up emitting an extra return, or dead instructions in
general. This makes the verifier unhappy, and is generally unfortunate for
codegen.
This also removes the code from AArch64CallLowering that checks if we have a
tail call when lowering a return. This is covered by the new code now.
Also update call-translator-tail-call.ll to show that we now properly tail call
in the presence of lifetime_end and assume.
Differential Revision: https://reviews.llvm.org/D67415
llvm-svn: 371572
Add support for sibcalling calls whose calling convention differs from the
caller's.
- Port over `CCState::resultsCombatible` from CallingConvLower.cpp into
CallLowering. This is used to verify that the way the caller and callee CC
handle incoming arguments matches up.
- Add `CallLowering::analyzeCallResult`. This is basically a port of
`CCState::AnalyzeCallResult`, but using `ArgInfo` rather than `ISD::InputArg`.
- Add `AArch64CallLowering::doCallerAndCalleePassArgsTheSameWay`. This checks
that the calling conventions are compatible, and that the caller and callee
preserve the same registers.
For testing:
- Update call-translator-tail-call.ll to show that we can now handle this.
- Add a GISel line to tailcall-ccmismatch.ll to show that we will not tail call
when the regmasks don't line up.
Differential Revision: https://reviews.llvm.org/D67361
llvm-svn: 371570
This can only happen on X86 when fp128 is a legal type, but we
go through softening to generate libcalls. This causes fp128 to
be softened to fp128 instead of an integer type. This can be
removed if D67128 lands.
llvm-svn: 371493
This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, there handled w/instances of AtomicSDNodes. The result of which is that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding an transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context.
Note that this patch is NFC unless the experimental flag is set.
The basic strategy I plan on taking is:
introduce infrastructure and a flag for testing (this patch)
Audit uses of isVolatile, and apply isAtomic conservatively*
piecemeal conservative* update generic code and x86 backedge code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection
flip the flag at the end (with minimal diffs)
Work through todo list identified in (2) and (3) exposing performance ops
(*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there.
We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL).
Differential Revision: https://reviews.llvm.org/D66309
llvm-svn: 371441
If analyzeBranch fails, on some targets, the out parameters point to
some blocks in the function. But we can't use that information, so make
sure to clear it out. (In some places in IfConversion, we assume that
any block with a TrueBB is analyzable.)
The change to the testcase makes it trigger a bug on builds without this
fix: IfConvertDiamond tries to perform a followup "merge" operation,
which isn't legal, and we somehow end up with a branch to a deleted MBB.
I'm not sure how this doesn't crash the compiler.
Differential Revision: https://reviews.llvm.org/D67306
llvm-svn: 371434
Reapply with fix to reduce resources required by the compiler - use
unsigned[2] instead of std::pair. This causes clang and gcc to compile
the generated file multiple times faster, and hopefully will reduce
the resource requirements on Visual Studio also. This fix is a little
ugly but it's clearly the same issue the previous author of
DFAPacketizer faced (the previous tables use unsigned[2] rather uglily
too).
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
llvm-svn: 371399
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
........
Reverted as this is causing "compiler out of heap space" errors on MSVC 2017/19 NDEBUG builds
llvm-svn: 371393
Loosely based on DAGCombiner version, but this part is slightly simpler in
GlobalIsel because all address calculation is performed by G_GEP. That makes
the inc/dec distinction moot so there's just pre/post to think about.
No targets can handle it yet so testing is via a special flag that overrides
target hooks.
llvm-svn: 371384
Summary:
After tailduplication, we have redundant copies. We can remove these
copies in machine-cp if it's safe to, i.e.
```
$reg0 = OP ...
... <<< No read or clobber of $reg0 and $reg1
$reg1 = COPY $reg0 <<< $reg0 is killed
...
<RET>
```
will be transformed to
```
$reg1 = OP ...
...
<RET>
```
Differential Revision: https://reviews.llvm.org/D65267
llvm-svn: 371359
Summary:
Add zero-materializing XORs to X86's describeLoadedValue() hook in order
to produce call site values.
I have had to change the defs logic in collectCallSiteParameters() a bit
to be able to describe the XORs. The XORs implicitly define $eflags,
which would cause them to never be considered, due to a guard condition
that I->getNumDefs() is one. I have changed that condition so that we
now only consider instructions where a forwarded register overlaps with
the instruction's single explicit define. We still need to collect the implicit
defines of other forwarded registers to remove them from the work list.
I'm not sure how to move towards supporting instructions with multiple
explicit defines, cases where forwarded register are implicitly defined,
and/or cases where an instruction produces values for multiple forwarded
registers. Perhaps the describeLoadedValue() hook should take a register
argument, and we then leave it up to the hook to describe the loaded
value in that register? I have not yet encountered a situation where
that would be necessary though.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: vsk
Subscribers: ychen, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67225
llvm-svn: 371333
Summary:
This changes the ParamLoadedValue pair which the describeLoadedValue()
hook returns so that MachineOperand objects are returned instead of
pointers.
When describing call site values we may need to describe operands which
are not part of the instruction. One such example is zero-materializing
XORs on x86, which I have implemented support for in a child revision.
Instead of having to return a pointer to an operand stored somewhere
outside the instruction, start returning objects directly instead, as
that simplifies the code.
The MachineOperand class only holds POD members, and on x86-64 it is 32
bytes large. That combined with copy elision means that the overhead of
returning a machine operand object from the hook does not become very
large.
I benchmarked this on a 8-thread i7-8650U machine with 32 GB RAM. The
benchmark consisted of building a clang 8.0 binary configured with:
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DLLVM_USE_SANITIZER=Address \
-DCMAKE_CXX_FLAGS="-Xclang -femit-debug-entry-values -stdlib=libc++"
The average wall clock time increased by 4 seconds, from 62:05 to
62:09, which is an 0.1% increase.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: vsk
Subscribers: hiraditya, ychen, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67261
llvm-svn: 371332
Summary:
Normally TargetLowering::expandFixedPointMul would handle
SMULFIXSAT with scale zero by using an SMULO to compute the
product and determine if saturation is needed (if overflow
happened). But if SMULO isn't custom/legal it falls through
and uses the same technique, using MULHS/SMUL_LOHI, as used
for non-zero scales.
Problem was that when checking for overflow (handling saturation)
when not using MULO we did not expect to find a zero scale. So
we ended up in an assertion when doing
APInt::getLowBitsSet(VTSize, Scale - 1)
This patch fixes the problem by adding a new special case for
how saturation is computed when scale is zero.
Reviewers: RKSimon, bevinh, leonardchan, spatel
Reviewed By: RKSimon
Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67071
llvm-svn: 371309
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.
This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.
Patch by: leonardchan, bjope
Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
Reviewed By: leonardchan
Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57836
llvm-svn: 371308
Summary:
The value operand in DW_OP_plus_uconst/DW_OP_constu value can be
large (it uses uint64_t as representation internally in LLVM).
This means that in the uint64_t to int conversions, previously done
by DwarfExpression::addMachineRegExpression, could lose information.
Also, the negation done in "-Offset" was undefined behavior in case
Offset was exactly INT_MIN.
To avoid the above problems, we now avoid transformation like
[Reg, DW_OP_plus_uconst, Offset] --> [DW_OP_breg, Offset]
and
[Reg, DW_OP_constu, Offset, DW_OP_plus] --> [DW_OP_breg, Offset]
when Offset > INT_MAX.
And we avoid to transform
[Reg, DW_OP_constu, Offset, DW_OP_minus] --> [DW_OP_breg,-Offset]
when Offset > INT_MAX+1.
The patch also adjusts DwarfCompileUnit::constructVariableDIEImpl
to make sure that "DW_OP_constu, Offset, DW_OP_minus" is used
instead of "DW_OP_plus_uconst, Offset" when creating DIExpressions
with negative frame index offsets.
Notice that this might just be the tip of the iceberg. There
are lots of fishy handling related to these constants. I think both
DIExpression::appendOffset and DIExpression::extractIfOffset may
trigger undefined behavior for certain values.
Reviewers: sdesmalen, rnk, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: jholewinski, aprantl, hiraditya, ychen, uabelho, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67263
llvm-svn: 371304
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
llvm-svn: 371198
If a stack spill location is overwritten by another spill instruction,
any variable locations pointing at that slot should be terminated. We
cannot rely on spills always being restored to registers or variable
locations being moved by a DBG_VALUE: the register allocator is entitled
to spill a value and then forget about it when it goes out of liveness.
To address this, scan for memory writes to spill locations, even those we
don't consider to be normal "spills". isSpillInstruction and
isLocationSpill distinguish the two now. After identifying spill
overwrites, terminate the open range, and insert a $noreg DBG_VALUE for
that variable.
Differential Revision: https://reviews.llvm.org/D66941
llvm-svn: 371193
Summary:
Fix a bug of not update the jump table and recommit it again.
In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D63972
llvm-svn: 371177
This patch reuses the MIR vreg renamer from the MIRCanonicalizerPass to cleanup
names of vregs in a MIR file for MIR test authors. I found it useful when
writing a regression test for a globalisel failure I encountered recently and
thought it might be useful for other folks as well.
Differential Revision: https://reviews.llvm.org/D67209
llvm-svn: 371121
Now that we look through copies, it's possible to visit registers that
have a register class constraint but not a type constraint. Avoid looking
through copies when this occurs as the SrcReg won't be able to determine
it's bit width or any known bits.
Along the same lines, if the initial query is on a register that doesn't
have a type constraint then the result is a default-constructed KnownBits,
that is, a 1-bit fully-unknown value.
llvm-svn: 371116
Recommit basic sibling call lowering (https://reviews.llvm.org/D67189)
The issue was that if you have a return type other than void, call lowering
will emit COPYs to get the return value after the call.
Disallow sibling calls other than ones that return void for now. Also
proactively disable swifterror tail calls for now, since there's a similar issue
with COPYs there.
Update call-translator-tail-call.ll to include test cases for each of these
things.
llvm-svn: 371114
The code was incorrectly counting the number of identical instructions,
and therefore tried to predicate an instruction which should not have
been predicated. This could have various effects: a compiler crash,
an assembler failure, a miscompile, or just generating an extra,
unnecessary instruction.
Instead of depending on TargetInstrInfo::removeBranch, which only
works on analyzable branches, just remove all branch instructions.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43121 and
https://bugs.llvm.org/show_bug.cgi?id=41121 .
Differential Revision: https://reviews.llvm.org/D67203
llvm-svn: 371111
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)
At this point, it is very restricted. It does not handle
- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.
This patch adds
- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.
Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.
For testing, this patch
- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
as expected.
Differential Revision: https://reviews.llvm.org/D67189
........
This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll
llvm-svn: 371051
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)
At this point, it is very restricted. It does not handle
- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.
This patch adds
- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.
Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.
For testing, this patch
- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
as expected.
Differential Revision: https://reviews.llvm.org/D67189
llvm-svn: 370996
Moving MIRCanonicalizerPass vreg renaming code to MIRVRegNamerUtils so that it
can be reused in another pass (ie planing to write a standalone mir-namer pass).
I'm going to write a mir-namer pass so that next time someone has to author a
test in MIR, they can use it to cleanup the naming and make it more readable by
having the numbered vregs swapped out with named vregs.
Differential Revision: https://reviews.llvm.org/D67114
llvm-svn: 370985
Apologies, due to a git SNAFU this fix (dump doesn't exist and silence unused variables) stayed in my index rather than applying to rL370893.
llvm-svn: 370894
This is the beginnings of a reimplementation of ModuloScheduleExpander. It works
by generating a single-block correct pipelined kernel and then peeling out the
prolog and epilogs.
This patch implements kernel generation as well as a validator that will
confirm the number of phis added is the same as the ModuloScheduleExpander.
Prolog and epilog peeling will come in a different patch.
Differential Revision: https://reviews.llvm.org/D67081
llvm-svn: 370893
When comparing variable locations, LiveDebugValues currently considers only
the machine location, ignoring any DIExpression applied to it. This is a
problem because that DIExpression can do pretty much anything to the machine
location, for example dereferencing it.
This patch adds DIExpressions to that comparison; now variables based on the
same register/memory-location but with different expressions will compare
differently, and be dropped if we attempt to merge them between blocks. This
reduces variable coverage-range a little, but only because we were producing
broken locations.
Differential Revision: https://reviews.llvm.org/D66942
llvm-svn: 370877
On release builds, 'MI' isn't used by anything (it's already inserted into a
block by BuildMI), while on non-release builds it's used by a LLVM_DEBUG
statement. Mark as explicitly used to avoid the warning.
llvm-svn: 370870
Similar to the issue with G_ZEXT that was fixed earlier, this is a quick
to fall back if the source type is not exactly half of the dest type.
Fixes the clang-cmake-aarch64-lld bot build.
llvm-svn: 370847
Now that we have the infrastructure to support s128 types as parameters
we can expand these to libcalls.
Differential Revision: https://reviews.llvm.org/D66185
llvm-svn: 370823
On AArch64, s128 types have to be split into s64 GPRs when passed as arguments.
This change adds the generic support in call lowering for dealing with multiple
registers, for incoming and outgoing args.
Support for splitting for return types not yet implemented.
Differential Revision: https://reviews.llvm.org/D66180
llvm-svn: 370822
Summary:
Simplify the right shift of the intermediate result (given
in four parts) by using funnel shift.
There are some impact on lit tests, but that seems to be
related to register allocation differences due to how FSHR
is expanded on X86 (giving a slightly different operand order
for the OR operations compared to the old code).
Reviewers: leonardchan, RKSimon, spatel, lebedev.ri
Reviewed By: RKSimon
Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, pzheng, bevinh, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67036
llvm-svn: 370813
Emitting a schedule is really hard. There are lots of corner cases to take care of; in fact, of the 60+ SWP-specific testcases in the Hexagon backend most of those are testing codegen rather than the schedule creation itself.
One issue is that to test an emission corner case we must craft an input such that the generated schedule uses that corner case; sometimes this is very hard and convolutes testcases. Other times it is impossible but we want to test it anyway.
This patch adds a simple test pass that will consume a module containing a loop and generate pipelined code from it. We use post-instr-symbols as a way to annotate instructions with the stage and cycle that we want to schedule them at.
We also provide a flag that causes the MachinePipeliner to generate these annotations instead of actually emitting code; this allows us to generate an input testcase with:
llc < %s -stop-after=pipeliner -pipeliner-annotate-for-testing -o test.mir
And run the emission in isolation with:
llc < test.mir -run-pass=modulo-schedule-test
llvm-svn: 370705
The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340https://bugs.llvm.org/show_bug.cgi?id=42697
As discussed there, we could view this as a failure of IR canonicalization,
but then we would need to implement a backend fixup with target overrides
to get this right in all cases. Instead, we can just view this as a codegen
opportunity. It's not even clear for x86 exactly when we should favor
test+set; some CPUs have better theoretical throughput for the ALU ops than
bt/test.
This patch is made more complicated than I expected because there's an early
DAGCombine for 'and' that can change types of the intermediate ops via
trunc+anyext.
Differential Revision: https://reviews.llvm.org/D66687
llvm-svn: 370668
The missing line added by this patch ensures that only spilt variable
locations are candidates for being restored from the stack. Otherwise,
register or constant-value information can be interpreted as a spill
location, through a union.
The added regression test replicates a scenario where this occurs: the
stack load from [rsp] causes the register-location DBG_VALUE to be
"restored" to rsi, when it should be left alone. See PR43058 for details.
Un x-fail a test that was suffering from this from a previous patch.
Differential Revision: https://reviews.llvm.org/D66895
llvm-svn: 370648