Also avoid some pointless use of auto! Because that's friendlier to
readers and avoids several types accidentally resolving to unnecessary
references here (MachineInstr *&, unsigned &).
llvm-svn: 278894
Rather than doing a funny dance that relies on dereferencing end() not
crashing, add some API to MachineInstrBundleIterator to get a non-const
version of the iterator.
llvm-svn: 278870
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.
Submitted wit a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.
llvm-svn: 278866
The current MachineBasicBlock might be the last block, so FallThru may
be past the end(). Use getNextNode(), which will convert to nullptr,
rather than &*++, which is invalid if we reach the end().
llvm-svn: 278858
Do not reorder and move up a loop latch block before a loop header
when optimising for size because this will generate an extra
unconditional branch.
Differential Revision: https://reviews.llvm.org/D22521
llvm-svn: 278840
The pipeliner was generating an invalid Phi name for an operand
in the epilog block, which caused an assert in the live variable
analysis pass. The fix is to the code that generates new Phis
in the epilog block. In this case, there is an existing Phi that
needs to be reused rather than creating a new Phi instruction.
Differential Revision: https://reviews.llvm.org/D23513
llvm-svn: 278805
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.
Differential Revision: https://reviews.llvm.org/D23445
llvm-svn: 278799
in debug info using their stack slots instead of as an indirection of param reg + 0
offset. This is done by detecting FrameIndexSDNodes in SelectionDAG and generating
FrameIndexDbgValues for them. This ultimately generates DBG_VALUEs with stack
location operands.
Differential Revision: http://reviews.llvm.org/D23283
llvm-svn: 278703
This adds two new utility functions findLoopControlBlock and findLoopPreheader
to MachineLoop and MachineLoopInfo. These functions are refactored and taken
from the Hexagon target as they are target independent; thus this is intendend to
be a non-functional change.
Differential Revision: https://reviews.llvm.org/D22959
llvm-svn: 278661
Remove all ilist_iterator to pointer casts. There were two reasons for
casts:
- Checking for an uninitialized (i.e., null) iterator. I added
MachineInstrBundleIterator::isValid() to check for that case.
- Comparing an iterator against the underlying pointer value while
avoiding converting the pointer value to an iterator. This is
occasionally necessary in MachineInstrBundleIterator, since there is
an assertion in the constructors that the underlying MachineInstr is
not bundled (but we don't care about that if we're just checking for
pointer equality).
To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.
- The implicit constructors now use enable_if to exclude
const-iterator => non-const-iterator conversions from overload
resolution (previously it was a compiler error on instantiation, now
it's SFINAE).
- The == and != operators are now global (friends), and are not
templated.
- MachineInstrBundleIterator has overloads to compare against both
const_pointer and const_reference. This avoids the implicit
conversions to MachineInstrBundleIterator that assert, instead just
checking the address (and I added unit tests to confirm this).
Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore. It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.
llvm-svn: 278478
To fix PR28014, this patch restricts tail merging to blocks that belong to the
same loop after MBP.
Differential Revision: https://reviews.llvm.org/D23191
llvm-svn: 278463
It's sharing the integer G_CONSTANT for now since I don't *think* it creates
any ambiguity (even on weird archs). If that turns out wrong we can create a
G_PTRCONSTANT or something.
llvm-svn: 278423
Check MachineInstr::isDebugValue for the same instruction as we're
calling isSchedBoundary, avoiding the possibility of dereferencing
end().
This is a functionality change even when I!=end(). Matthias had a look
and agrees this is the right resolution (as opposed to checking for
end()).
This is triggered by a huge number of tests, but they happen to
magically pass right now. I found this because WIP patches for PR26753
convert them into crashes.
llvm-svn: 278394
Summary: Some backends, like WebAssembly, use virtual registers instead of physical registers. This crashes the DbgValueHistoryCalculator pass, which assumes that all registers are physical. Instead, skip virtual registers when iterating aliases, and assume that they are clobbered.
Reviewers: dexonsmith, dschuff, aprantl
Subscribers: yurydelendik, llvm-commits, jfb, sunfish
Differential Revision: https://reviews.llvm.org/D22590
llvm-svn: 278371
After machine block placement, MBBs may not have terminators, and it is
appropriate to check for the end iterator here. We can fold the check
into the next if, as well. This look is really just looking for BBs that
end in CATCHRET.
llvm-svn: 278350
Check for an end iterator from MachineBasicBlock::getFirstTerminator in
llvm::getFuncletMembership. If this is turned into an assertion, it
fires in 48 X86 testcases (for example,
CodeGen/X86/regalloc-spill-at-ehpad.ll).
Since this is likely a latent bug (shouldn't all basic blocks end with a
terminator?) I've filed PR28938.
llvm-svn: 278344
This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref.
Pseudo example:
loop:
xmm0 = ...
xmm1 = vcvtsi2sdl eax, xmm0<undef>
... = inst xmm0
jmp loop
In this example, selecting xmm0 as the undef register creates false dependency between loop iterations.
This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction.
Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem.
Differential Revision: https://reviews.llvm.org/D22466
llvm-svn: 278321
It's more than just inttoptr, but the others can't be tested until we have
support for non-trivial constants (they currently get unavoidably folded to a
ConstantInt).
llvm-svn: 278303
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.
Submitted wit a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.
llvm-svn: 278288
Summary:
See the new test case for one that was (non-deterministically) crashing
on trunk and deterministically hit the assertion that I added in D23302.
Basically, the machine function contains a sequence
DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
%vreg14:sub1<def> = COPY %vreg14:sub0
and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32
instructions into one before calling repairIntervalsInRange.
Now repairIntervalsInRange wants to repair %vreg14, in particular, and
ends up trying to repair %vreg14:sub1 as well, but that only becomes
active _after_ the range that is to be repaired, hence the crash due
to LR.find(...) == LR.begin() at the start of repairOldRegInRange.
I believe that just skipping those subrange is fine, but again, not too
familiar with that code.
Reviewers: MatzeB, kparzysz, tstellarAMD
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D23303
llvm-svn: 278268
This change makes it possible for tail-duplication and tail-merging to
be disjoint. By being less aggressive when merging during layout, there are no
overlapping cases between tail-duplication and tail-merging, provided the
thresholds are disjoint.
There is a remaining TODO to benchmark the succ_size() test for non-layout tail
merging.
llvm-svn: 278265
If the value produced by the bitcast hasn't been referenced yet, we can simply
reuse the input register avoiding an unnecessary COPY instruction.
llvm-svn: 278245
If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector.
i.e.
INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx )
Differential Revision: https://reviews.llvm.org/D23330
llvm-svn: 278211
For now put them all in the entry block. This should be correct but may give
poor runtime performance. Hopefully MachineSinking combined with
isReMaterializable can solve those issues, but if not the interface is sound
enough to support alternatives.
llvm-svn: 278168
As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths.
This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required.
Differential Revision: https://reviews.llvm.org/D23007
llvm-svn: 278141
This reverts commit r278048. Something changed between the last time I
built this--it takes awhile on my ridiculously slow and ancient
computer--and now that broke this.
llvm-svn: 278053
Summary:
Based on two patches by Michael Mueller.
This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.
This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.
Reviewers: majnemer, sanjoy, rnk
Subscribers: dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D19908
llvm-svn: 278048
ScanInstructions is now 2 functions:
AnalyzeBranches and ScanInstructions. ScanInstructions also now takes a
pair of arguments delimiting the instructions to be scanned. This will
be used for forked diamond support to re-scan only a portion of the
block.
llvm-svn: 277904
Until now, our use case for the visitor has been to take a stream of bytes
representing a type stream, deserialize the records in sequence, and do
something with them, where "something" is determined by how the user
implements a particular set of callbacks on an abstract class.
For actually writing PDBs, however, we want to do the reverse. We have
some kind of description of the list of records in their in-memory format,
and we want to process each one. Perhaps by serializing them to a byte
stream, or perhaps by converting them from one description format (Yaml)
to another (in-memory representation).
This was difficult in the current model because deserialization and
invoking the callbacks were tightly coupled.
With this patch we change this so that TypeDeserializer is itself an
implementation of the particular set of callbacks. This decouples
deserialization from the iteration over a list of records and invocation
of the callbacks. TypeDeserializer is initialized with another
implementation of the callback interface, so that upon deserialization it
can pass the deserialized record through to the next set of callbacks. In
a sense this is like an implementation of the Decorator design pattern,
where the Deserializer is a decorator.
This will be useful for writing Pdbs from yaml, where we have a
description of the type records in Yaml format. In this case, the visitor
implementation would have each visitation callback method implemented in
such a way as to extract the proper set of fields from the Yaml, and it
could maintain state that builds up a list of these records. Finally at
the end we can pass this information through to another set of callbacks
which serializes them into a byte stream.
Reviewed By: majnemer, ruiu, rnk
Differential Revision: https://reviews.llvm.org/D23177
llvm-svn: 277871
This differs from the previous version by being more careful about template
instantiation/specialization in order to prevent errors when building with
clang -Werror. Specifically:
* begin is not defined in the template and is instead instantiated when Head
is. I think the warning when we don't do that is wrong (PR28815) but for now
at least do it this way to avoid the warning.
* Instead of performing template specializations in LLVM_INSTANTIATE_REGISTRY
instead provide a template definition then do explicit instantiation. No
compiler I've tried has problems with doing it the other way, but strictly
speaking it's not permitted by the C++ standard so better safe than sorry.
Original commit message:
Currently the Registry class contains the vestiges of a previous attempt to
allow plugins to be used on Windows without using BUILD_SHARED_LIBS, where a
plugin would have its own copy of a registry and export it to be imported by
the tool that's loading the plugin. This only works if the plugin is entirely
self-contained with the only interface between the plugin and tool being the
registry, and in particular this conflicts with how IR pass plugins work.
This patch changes things so that instead the add_node function of the registry
is exported by the tool and then imported by the plugin, which solves this
problem and also means that instead of every plugin having to export every
registry they use instead LLVM only has to export the add_node functions. This
allows plugins that use a registry to work on Windows if
LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is used.
llvm-svn: 277806
These are the operations that are trivially identical. Division is omitted for
now because you need to use the correct sign/zero extension.
llvm-svn: 277775
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.
The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.
Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
rewriteOperands() always performed liveness queries at the base index
rather than the RegSlot/Base as apropriate for the machine operand. This
could lead to illegal rewriting in some cases.
llvm-svn: 277661
When expanding FP constants, we attempt to shrink doubles to floats and perform an extending load.
However, on SystemZ, and possibly on other targets (I've only confirmed the problem on SystemZ), the FP extending load instruction may convert SNaN into QNaN, or may cause an exception. So in the general case, we would still like to shrink FP constants, but SNaNs should be left as doubles.
Differential Revision: https://reviews.llvm.org/D22685
llvm-svn: 277602
IfConversion used to always add the undef flag when adding a use operand
on a newly predicated instruction. This would be an operand for the register
being conditionally redefined. Due to the undef flag, the liveness of this
register prior to the predicated instruction would get lost.
This patch changes this so that such use operands are added only when the
register is live, without the undef flag.
This was reverted but pushed again now, for details follow link below.
Reviewed by Quentin Colombet.
http://reviews.llvm.org/D209077
llvm-svn: 277571
None of GlobalISel requires the property, but this lets us use the
verifier instead of rolling our own "all instructions selected" check.
llvm-svn: 277484
After instruction selection, there should be no pre-isel generic
instructions remaining, nor should generic virtual registers be
used. Verify that.
llvm-svn: 277483
Selected: the InstructionSelect pass ran and all pre-isel generic
instructions have been eliminated; i.e., all instructions are now
target-specific or non-pre-isel generic instructions (e.g., COPY).
Since only pre-isel generic instructions can have generic virtual register
operands, this also means that all generic virtual registers have been
constrained to virtual registers (assigned to register classes) and that
all sizes attached to them have been eliminated.
This lets us enforce certain invariants across passes.
This property is GlobalISel-specific, but is always available.
llvm-svn: 277482
RegBankSelected: the RegBankSelect pass ran and all generic virtual
registers have been assigned to a register bank.
This lets us enforce certain invariants across passes.
This property is GlobalISel-specific, but is always available.
llvm-svn: 277475
RegBankSelect and InstructionSelect run after the legalizer and
require a Legalized function: check that all instructions are legal.
Note that this should be in the MachineVerifier, but it can't use the
MachineLegalizer as it's currently in the separate GlobalISel library.
Note that the RegBankSelect verifier checks have the same layering
problem, but we only use inline methods so end up not needing to link
against the GlobalISel library.
llvm-svn: 277472
Legalized: The MachineLegalizer ran; all pre-isel generic instructions
have been legalized, i.e., all instructions are now one of:
- generic and always legal (e.g., COPY)
- target-specific
- legal pre-isel generic instructions.
This lets us enforce certain invariants across passes.
This property is GlobalISel-specific, but is always available.
llvm-svn: 277470
This is only used for debug prints, but the previous hardcoded ", "
caused it to be printed unnecessarily when OnlySet, and is annoying
when adding new properties.
llvm-svn: 277465
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277411
We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.
This fixes PR28504.
llvm-svn: 277371
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277313
Summary:
When performing cmp for EQ/NE and the operand is sign extended, we can
avoid the truncaton if the bits to be tested are no less than origianl
bits.
Reviewers: eli.friedman
Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits
Differential Revision: https://reviews.llvm.org/D22933
llvm-svn: 277252
These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS.
We may decide to split the latter up with finer-grained restrictions later, if
necessary.
llvm-svn: 277224
Previously this change was submitted from a Windows machine, so
changes made to the case of filenames and directory names did
not survive the commit, and as a result the CMake source file
names and the on-disk file names did not match on case-sensitive
file systems.
I'm resubmitting this patch from a Linux system, which hopefully
allows the case changes to make it through unfettered.
llvm-svn: 277213
In a previous patch, it was suggested to use all caps instead of
rolling caps for initialisms, so this patch changes everything
to do this.
llvm-svn: 277190
Patch by Sunita Marathe
Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)
llvm-svn: 277189
The following pattern was being layed out poorly:
A
/ \
B C
/ \ / \
D E ? (Doesn't matter)
Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)
The current algorithm gives:
A,B,C,E (D goes on worklist)
It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:
A,B,E, (C and D are in a worklist)
llvm-svn: 277187
For MachineInstrBuilder, having to manually use RegState::Define is ugly and
makes register definitions clunkier than they need to be, so this adds two
convenience functions: addDef and addUse.
For MachineIRBuilder, we want to avoid BuildMI's first-reg-is-def rule because
it's hidden away and causes bugs. So this patch switches buildInstr to
returning a MachineInstrBuilder and adding *all* operands via addDef/addUse.
NFC.
llvm-svn: 277176
Mostly straightforward as we ignore addressing modes and just
use the base + unsigned immediate offset (always 0) variants.
This currently fails to select extloads because we have yet to
agree on a representation.
llvm-svn: 277171
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
llvm-svn: 277169
[DAG] Check debug values for invalidation before transferring and mark
old debug values invalid when transferring to another SDValue.
This fixes PR28613.
Reviewers: jyknight, hans, dblaikie, echristo
Subscribers: yaron.keren, ismail, llvm-commits
Differential Revision: https://reviews.llvm.org/D22858
llvm-svn: 277135
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
This broke some out-of-tree AMDGPU tests that relied on the old behavior
wherein isIntrinsic() would return true for any function that starts
with "llvm.". And in general that change will not play nicely with
out-of-tree backends.
llvm-svn: 277087
LLT() has a particular meaning: it's one invalid type. But we really
want selected instructions to have no type whatsoever.
Also verify that types don't linger after ISel, and enable the verifier
on the AArch64 select test.
llvm-svn: 277001
This version has two fixes compared to the original:
* In Registry.h the template static members are instantiated before they are
used, as clang gives an error if you do it the other way around.
* The use of the Registry template in clang-tidy is updated in the same way as
has been done everywhere else.
Original commit message:
Currently the Registry class contains the vestiges of a previous attempt to
allow plugins to be used on Windows without using BUILD_SHARED_LIBS, where a
plugin would have its own copy of a registry and export it to be imported by
the tool that's loading the plugin. This only works if the plugin is entirely
self-contained with the only interface between the plugin and tool being the
registry, and in particular this conflicts with how IR pass plugins work.
This patch changes things so that instead the add_node function of the registry
is exported by the tool and then imported by the plugin, which solves this
problem and also means that instead of every plugin having to export every
registry they use instead LLVM only has to export the add_node functions. This
allows plugins that use a registry to work on Windows if
LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is used.
llvm-svn: 276973
Summary:
getName() involves a hashtable lookup, so is expensive given how
frequently isIntrinsic() is called. (In particular, many users cast to
IntrinsicInstr or one of its subclasses before calling
getIntrinsicID().)
This has an incidental functional change: Before, isIntrinsic() would
return true for any function whose name started with "llvm.", even if it
wasn't properly an intrinsic. The new behavior seems more correct to
me, because it's strange to say that isIntrinsic() is true, but
getIntrinsicId() returns "not an intrinsic".
Some callers want the old behavior -- they want to know whether the
caller is a recognized intrinsic, or might be one in some other version
of LLVM. For them, we added Function::hasLLVMReservedName(), which
checks whether the name starts with "llvm.".
This change is good for a 1.5% e2e speedup compiling a large Eigen
benchmark.
Reviewers: bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22065
llvm-svn: 276942
Factor out countDuplicatedInstructions to Count duplicated instructions at the
beginning and end of a diamond pattern. This is in prep for adding support for
diamonds that need to be tail-merged.
llvm-svn: 276910
TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.
llvm-svn: 276883
Currently the Registry class contains the vestiges of a previous attempt to
allow plugins to be used on Windows without using BUILD_SHARED_LIBS, where a
plugin would have its own copy of a registry and export it to be imported by
the tool that's loading the plugin. This only works if the plugin is entirely
self-contained with the only interface between the plugin and tool being the
registry, and in particular this conflicts with how IR pass plugins work.
This patch changes things so that instead the add_node function of the registry
is exported by the tool and then imported by the plugin, which solves this
problem and also means that instead of every plugin having to export every
registry they use instead LLVM only has to export the add_node functions. This
allows plugins that use a registry to work on Windows if
LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is used.
Differential Revision: http://reviews.llvm.org/D21385
llvm-svn: 276856
Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match
As mentioned on D22726
llvm-svn: 276855
Change the syntax to use `%0.sub8` to denote a subregister.
This seems like a more natural fit to denote subregisters; I also plan
to introduce a new ":classname" syntax in upcoming patches to denote the
register class of a vreg.
Note that this commit disallows plain identifiers to start with a '.'
character. This shouldn't affect anything as external names/IR
references are all prefixed with '$'/'%', plain identifiers are only
used for instruction names, register mask names and subreg indexes.
Differential Revision: https://reviews.llvm.org/D22390
llvm-svn: 276815
In an instruction like:
CFI_INSTRUCTION .cfi_def_cfa ...
we can drop the '.cfi_' prefix since that should be obvious by the
context:
CFI_INSTRUCTION def_cfa ...
While being a terser and cleaner syntax this also prepares to dropping
support for identifiers starting with a dot character so we can use it
for expressions.
Differential Revision: http://reviews.llvm.org/D22388
llvm-svn: 276785
Instead of an ad-hoc collection of "buildInstr" functions with varying numbers
of registers, this uses variadic templates to provide for as many regs as
needed!
Also make IRtranslator use new "buildBr" function instead of some weird generic
one that no-one else would really use.
llvm-svn: 276762
If we move a last-use register read to a later position we may skip
intermediate segments. This may require us to not only extend the
segment before the NewIdx, but also extend the segment live-in to
OldIdx.
This switches LiveIntervalTest to use AMDGPU so we can test subregister
liveness.
llvm-svn: 276724
This adds LLVM's 3 main cast instructions (inttoptr, ptrtoint, bitcast) to the
IRTranslator. The first two are direct translations (with 2 MachineInstr types
each). Since LLT discards information, a bitcast might become trivial and we
emit a COPY in those cases instead.
llvm-svn: 276690
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.
The corresponding change to clang is in: http://reviews.llvm.org/D16538
Patch by: Joel Jones
Differential Revision: https://reviews.llvm.org/D16213
llvm-svn: 276654
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.
llvm-svn: 276461
This provides a better layering of responsibilities among different
aspects of PDB writing code. Some of the MSF related code was
contained in CodeView, and some was in PDB prior to this. Further,
we were often saying PDB when we meant MSF, and the two are
actually independent of each other since in theory you can have
other types of data besides PDB data in an MSF. So, this patch
separates the MSF specific code into its own library, with no
dependencies on anything else, and DebugInfoCodeView and
DebugInfoPDB take dependencies on DebugInfoMsf.
llvm-svn: 276458
An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size.
After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts.
In doing so we can significantly reduce the number of operations required.
Differential Revision: https://reviews.llvm.org/D21578
llvm-svn: 276432
When we failed to parse a function in the mir parser, we should abort
the whole compilation instead of continuing in a weird state. Indeed,
this was creating strange machine function passes failures that were
hard to understand, until we notice that the function actually did not
get parsed correctly!
llvm-svn: 276348
The clearance calculation did not take into account registers defined as outputs or clobbers in inline assembly machine instructions because these register defs are implicit.
Differential Revision: http://reviews.llvm.org/D22580
llvm-svn: 276266
This patch fixes a very subtle bug in regmask calculation. Thanks to zan
jyu Wong <zyfwong@gmail.com> for bringing this to notice.
For example if CL is only clobbered than CH should not be marked
clobbered but CX, RCX and ECX should be mark clobbered. Previously for
each modified register all of its aliases are marked clobbered by
markRegClobbred() in RegUsageInfoCollector.cpp but that is wrong because
when CL is clobbered then MRI::isPhysRegModified() will return true for
CL, CX, ECX, RCX which is correct behavior but then for CX, EXC, RCX we
mark CH also clobbered as CH is aliased to CX,ECX,RCX so
markRegClobbred() is not required because isPhysRegModified already take
cares of proper aliasing register. A very simple test case has been
added to verify this change.
Please find relevant bug report here :
http://llvm.org/PR28567
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: https://reviews.llvm.org/D22400
llvm-svn: 276235
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).
llvm-svn: 276158
Reverting this commit for now as it seems to be causing failures on
test-suite tests on the clang-ppc64le-linux-lnt bot.
This reverts commit r276044.
llvm-svn: 276068
Add a check that the layout predecessor of a block is an actual CFG
predecssor of the block as well. No current code fails this check, but
upcoming patches can trigger this, and it makes sense to separate it
out.
llvm-svn: 276066
canTailDuplicate accepts two blocks and returns true if the first can be
duplicated into the second successfully. Use this function to
encapsulate the heuristic.
llvm-svn: 276062
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 276044
This adds two pieces:
- RegisterScavenger:::enterBasicBlockEnd() which behaves similar to
enterBasicBlock() but starts tracking at the end of the basic block.
- A RegisterScavenger::backward() method. It is subtly different
from the existing unprocess() method which only considers uses with
the kill flag set: If a value is dead at the end of a basic block with
a last use inside the basic block, unprocess() will fail to mark it as
live. However we cannot change/fix this behaviour because unprocess()
needs to perform the exact reverse operation of forward().
Differential Revision: http://reviews.llvm.org/D21873
llvm-svn: 276043
Also verify that we never try to set the size of a vreg associated
to a register class.
Report an error when we encounter that in MIR. Fix a testcase that
hit that error and had a size for no reason.
llvm-svn: 276012
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).
I added the new sequence (truncate (a >>n) to i1) to the BT pattern.
Differential Revision: https://reviews.llvm.org/D22354
llvm-svn: 275950
Elsewhere (particularly computeKnownBits) we assume that a global will be
aligned to the value returned by Value::getPointerAlignment. This is used to
boost the alignment on memcpy/memset, so any target-specific request can only
increase that value.
llvm-svn: 275866
DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow
SELECT op code for x86_64 fp128 type for MME targets,
so SoftenFloatOperand does not abort on SELECT op code.
Differential Revision: http://reviews.llvm.org/D21758
llvm-svn: 275818
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:
offset + frame index
when the frame is resolved.
By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.
For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.
Reviewers: dsanders, vkalintris
Differential Review: https://reviews.llvm.org/D21615
llvm-svn: 275786
Previously, we would expand:
%BL<def> = COPY %DL<kill>, %EBX<imp-use,kill>, %EBX<imp-def>
Into:
%BL<def> = MOV8rr %DL<kill>, %EBX<imp-def>
Dropping the imp-use on the floor.
That confused CriticalAntiDepBreaker, which (correctly) assumes that if an
instruction defs but doesn't use a register, that register is dead immediately
before the instruction - while in this case, the high lanes of EBX can be very
much alive.
This fixes PR28560.
Differential Revision: https://reviews.llvm.org/D22425
llvm-svn: 275634
Remove unnecessary clutter in assembly output. When using SjLj EH, the CFI is
not actually used for anything. Do not emit the CFI needlessly. The minor test
adjustments are interesting. The prologue test was just overzealous matcching.
The interesting case is the LSDA change. It was originally added to ensure that
various compilations did not mangle the name (it explicitly checked the name!).
However, subsequent cleanups made it more reliant on the CFI to find the name.
Parse the generated code flow to generically find the label still.
llvm-svn: 275614
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
Summary:
Previously we took an unsigned.
Hooray for type-safety.
Reviewers: chandlerc
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D22282
llvm-svn: 275591
For a fully inlined call chain like a -> b -> c -> d, we were emitting
line info for 'd' 3 separate times: once for d's actual InlineSite line
table, and twice for 'b' and 'c'. This is particularly inefficient when
all these functions are in different headers, because now we need to
encode the file change. Windbg was coping with our suboptimal output, so
this should not be noticeable from the debugger.
llvm-svn: 275502
Summary:
Make the target-specific flags in MachineMemOperand::Flags real, bona
fide enum values. This simplifies users, prevents various constants
from going out of sync, and avoids the false sense of security provided
by declaring static members in classes and then forgetting to define
them inside of cpp files.
Reviewers: MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22372
llvm-svn: 275451
Summary:
- Give it a shorter name (because we're going to refer to it often from
SelectionDAG and friends).
- Split the flags and alignment into separate variables.
- Specialize FlagsEnumTraits for it, so we can do bitwise ops on it
without losing type information.
- Make some enum values constants in MachineMemOperand instead.
MOMaxBits should not be a valid Flag.
- Simplify some of the bitwise ops for dealing with Flags.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22281
llvm-svn: 275438
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.
2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.
Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk
Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D19904
llvm-svn: 275367
Avoid exposing a cl::opt in a public header and instead promote this
option in the API.
Alternatively, we could land the cl::opt in CommandFlags.h so that
it is available to every tool, but we would still have to find an
option for clang.
llvm-svn: 275348
IPRA try to optimize caller saved register by propagating register
usage information from callee to caller so it is beneficial to have
caller saved registers compare to callee saved registers when IPRA
is enabled. Please find more detailed explanation here
https://groups.google.com/d/msg/llvm-dev/XRzGhJ9wtZg/tjAJqb0eEgAJ.
This change makes local function do not have any callee preserved
register when IPRA is enabled. A simple test case is also added to
verify this change.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21561
llvm-svn: 275347
Code cleanup: Move references to SlotMapping and SourceMgr into the
PerFunctionMIParsingState to avoid unnecessary passing around in
parameters.
llvm-svn: 275342
Summary:
Previously it would say we had an invariant load if any of the memory
operands were invariant. But the load should be invariant only if *all*
the memory operands are invariant.
No testcase because this has proven to be very difficult to tickle in
practice. As just one example, ARM's ldrd instruction, which loads 64
bits into two 32-bit regs, is theoretically affected by this. But when
it's produced, it loses its memoperands' invariance bits!
Reviewers: jfb
Subscribers: llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D22318
llvm-svn: 275331
Code cleanup: The PerFunctionMIParsingState is per function, moving a
reference into PFS we can avoid passing around the MachineFunction in an
extra parameter most of the time.
Also change most signatures to consistently pass PFS reference first.
llvm-svn: 275329
Currently the MIR framework prints all its outputs (errors and actual
representation) on stderr.
This patch fixes that by printing the regular output in the output
specified with -o.
Differential Revision: http://reviews.llvm.org/D22251
llvm-svn: 275314
We can freeze the registers after the MachineFrameInfo has been configured (by
telling it about calls, inline asm, ...). This doesn't happen at all yet, but
will be part of IR translation.
Fixes -verify-machineinstrs assertion.
llvm-svn: 275221
Use LivePhysRegs with a backwards walking algorithm to update live in
lists, this way the results do not depend on the presence of kill flags
anymore.
This patch also reduces the number of registers added as live-in.
Previously all pristine registers as well as all sub registers of a
super register were added resulting in unnecessarily large live in
lists. This fixed https://llvm.org/PR25263.
Differential Revision: http://reviews.llvm.org/D22027
llvm-svn: 275201
Added support for:
1. Multi dimension array.
2. Array of structure type, which previously was declared incompletely.
3. Dynamic size array.
4. Array where element type is a typedef, volatile or constant (this should resolve PR28311).
Differential Revision: http://reviews.llvm.org/D21526
llvm-svn: 275167
Blocks to be tail-merged may share more than one successor. Correct the
comment to state that they share a specific successor, SuccBB, rather
than a single successor, which is not true.
llvm-svn: 275104
Preserve assembly comments from input in output assembly and flags to
toggle property. This is on by default for inline assembly and off in
llvm-mc.
Parsed comments are emitted immediately before an EOL which generally
places them on the expected line.
Reviewers: rtrieu, dwmw2, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20020
llvm-svn: 275058
Drive-by improvement: We would 1) add CSRs, 2) remove callee saved CSRs
and 3) add all CSRs again for the return block. Just adding CSRs once
obviously gives the same results.
llvm-svn: 274955
An identity COPY like this:
%AL = COPY %AL, %EAX<imp-def>
has no semantic effect, but encodes liveness information: Further users
of %EAX only depend on this instruction even though it does not define
the full register.
Replace the COPY with a KILL instruction in those cases to maintain this
liveness information. (This reverts a small part of r238588 but this
time adds a comment explaining why a KILL instruction is useful).
llvm-svn: 274952
Because isReallyTriviallyReMaterializableGeneric puts many limits on
rematerializable instructions, this fix can prevent instructions with
tied virtual operands and instructions with virtual register uses from
being kept in DeadRemat, so as to workaround the live interval consistency
problem for the dummy instructions kept in DeadRemat.
But we still need to fix the live interval consistency problem. This patch
is just a short time relieve. PR28464 has been filed as a reminder.
Differential Revision: http://reviews.llvm.org/D19486
llvm-svn: 274928
Mostly through preferring MachineInstr&, avoid implicit conversions from
iterator to pointer.
Although this may bitrot (since there are other uses blocking me from
removing the implicit operator), this removes the last of the implicit
conversions from MachineInstrBundleIterator to MachineInstr* in the
LLVMCodeGen build target.
llvm-svn: 274893
The createRegAllocPass reads and writes to a global variable 'Registry'
via calls to getDefault and setDefault. Run this under a call_once to
avoid races.
llvm-svn: 274875
As a result, the urem instruction will not be expanded to a sequence of umull,
lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod.
Differential Revision: http://reviews.llvm.org/D22131
llvm-svn: 274843
findScratchNonCalleeSaveRegister() just needs a simple liveness
analysis, use LivePhysRegs for that as it is simpler and does not depend
on the kill flags.
This commit adds a convenience function available() to LivePhysRegs:
This function returns true if the given register is not reserved and
neither the register nor any of its aliases are alive.
Differential Revision: http://reviews.llvm.org/D21865
llvm-svn: 274685
Now with a corrected test to account for a recently supported properties bit in the debug info of a struct.
Original review: http://reviews.llvm.org/D21939
This reverts commit 970c3fd497a28d25dd69526eb52594a696c37968.
llvm-svn: 274661
This logic was introduced in r157663 and does not make any sense to me.
The motivating example in rdar://11538365 looks like this:
This is the tail:
BB#16: derived from LLVM BB %if.end68
Live Ins: %R0 %R4 %R5
Predecessors according to CFG: BB#15 BB#5
tBLXi pred:14, pred:%noreg, <ga:@CFRelease>, %R0<kill>, <regmask>, %LR<imp-def,dead>, %SP<imp-use>, %SP<imp-def>
t2B <BB#20>, pred:14, pred:%noreg
Successors according to CFG: BB#20
This is the predBB:
BB#5:
Live Ins: %R5
Predecessors according to CFG: BB#4
%R4<def> = t2MOVi 0, pred:14, pred:%noreg, opt:%noreg
t2B <BB#16>, pred:14, pred:%noreg
Successors according to CFG: BB#16
However this is invalid machine code to begin with, if %R0 is live-in to
BB#16 then it must be live-in to BB#5 as well if BB#5 does not define
it. We should not need logic to retroactively fix broken machine code
and in fact the example from r157663 passes cleanly with the code
removed and I do not see any (newly) failing tests with the machine
verifier enabled.
Differential Revision: http://reviews.llvm.org/D22031
llvm-svn: 274655
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.
This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.
Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D21692
llvm-svn: 274644