Summary:
Found when running Valgrind.
This removes two unnecessary assignments when using
AttrBuilder::removeAttribute.
AttrBuilder::removeAttribute returns a reference to the object.
As the LHSes were the same as the callees, the assignments
resulted in memcpy calls where dst = src.
Commited on behalf-of: dstenb (David Stenberg)
Reviewers: mkuper, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25460
llvm-svn: 285298
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.
I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.
DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.
Differential Revision: https://reviews.llvm.org/D25691
llvm-svn: 285296
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
Differential Revision: https://reviews.llvm.org/D25671
llvm-svn: 285286
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Patch by Vadzim Dambrouski.
Differential Revision: https://reviews.llvm.org/D25890
llvm-svn: 285278
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25829
llvm-svn: 285273
This patch corresponds to review:
https://reviews.llvm.org/D25896
It just eliminates the redundant ZExt after a count trailing zeros instruction.
llvm-svn: 285267
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.
llvm-svn: 285246
Change type of some missed DebugInfo-related alignment variables,
that are still uint64_t, to uint32_t.
Original change introduced in r284482.
llvm-svn: 285242
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.
Should fix an issue where we were turning something like:
push {r8, r4, r7, lr}
sub sp, #24
into nonsense like:
push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}
llvm-svn: 285232
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.
This fixes PR 30715.
llvm-svn: 285231
Summary:
Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`.
This is a folloup of rL258422 and
https://github.com/rust-lang/rust/pull/30917 where llvm failed to
optimize away the bounds checking in a binary search.
Patch by Arthur Silva!
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25941
llvm-svn: 285228
This reverts commit r285191.
LICM appears to rely on the Alias Set Tracker hitting lifetime markers to prevent
code from being moved outside of the original scope.
llvm-svn: 285227
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.
llvm-svn: 285215
This reapplies revision 285093. Original commit message:
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.
Differential Revision: https://reviews.llvm.org/D25742
llvm-svn: 285212
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.
Refactor processor definition to use ISA version features.
Fixed ISA version for stoney.
Based on Laurent Morichetti's patch.
Differential Revision: https://reviews.llvm.org/D25919
llvm-svn: 285210
Summary: This patch introduces updateDiscriminator to DILocation so that it can be directly called by AddDiscriminator. It also makes it easier to update the discriminator later.
Reviewers: dnovillo, dblaikie, aprantl, echristo
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25959
llvm-svn: 285207
This finds all of the references to a frame index in a function, and
sorts by the offset. If multiple instructions use the same offset,
nothing was breaking the tie for sorting.
This avoids the test failures the reverted r282999 introduced.
llvm-svn: 285201
Summary:
AMDGPU will need this one i16 is added as a legal type. This is tested by:
test/CodeGen/AMDGPU/sdiv.ll
test/CodeGen/AMDGPU/sdivrem24.ll
test/CodeGen/AMDGPU/udiv.ll
test/CodeGen/AMDGPU/udivrem24.ll
Reviewers: bogner, efriedma
Subscribers: efriedma, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D25699
llvm-svn: 285199
Summary:
In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches.
Fixes pr30561
Reviewers: igorb, aymanmus, delena
Differential Revision: https://reviews.llvm.org/D25310
llvm-svn: 285196
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Differential Revision: https://reviews.llvm.org/D24425
llvm-svn: 285189
When the loop exit condition is canonicalized as a != compaison, reuse the
debug location of the original (non canonical) comparison.
Before this patch, the debug location of the new icmp was obtained from the
loop latch terminator. This patch fixes the issue by correctly setting the
IRBuilder's "current debug location" to the location of the original compare.
Differential Revision: https://reviews.llvm.org/D25953
llvm-svn: 285185
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Differential Revision: https://reviews.llvm.org/D24425
llvm-svn: 285181
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
Reviewers: igorb, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25933
llvm-svn: 285173
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).
It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.
For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.
Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.
llvm-svn: 285131
Summary:
Fixes PR28281.
MSVC lists indirect virtual base classes in the field list of a class,
using LF_IVBCLASS records. This change makes LLVM emit such records
when processing DW_TAG_inheritance tags with the DIFlagVirtual and
(newly introduced) DIFlagIndirect tags.
Reviewers: rnk, ruiu, zturner
Differential Revision: https://reviews.llvm.org/D25578
llvm-svn: 285130
Summary:
Select instruction annotation in IR PGO uses the edge count to infer the
branch count. It's currently placed in setInstrumentedCounts() where
no all the BB counts have been computed. This leads to wrong branch weights.
Move the annotation after all BB counts are populated.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25961
llvm-svn: 285128
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with following code
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it will be combined with the following store instruction to another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Differential Revision: https://reviews.llvm.org/D23896
llvm-svn: 285116
This reverts r285093, as it caused unexpected buildbot failures on
clang-ppc64le-linux, clang-ppc64be-linux, clang-ppc64be-linux-multistage
and clang-ppc64be-linux-lnt. Failing test ubsan/TestCases/TypeCheck/vptr.cpp.
llvm-svn: 285110
Summary:
The intention is to make APFloat an interface class, so that later I can add a second implementation class DoubleAPFloat to correctly implement PPCDoubleDouble semantic. The interface of IEEEFloat is not public, and can be simplified (currently it's exactly the same as the old APFloat), but that belongs to a separate patch.
DoubleAPFloat should look like:
class DoubleAPFloat {
const fltSemantics *Semantics;
std::unique_ptr<APFloat> APFloats; // Two heap-allocated APFloats.
};
There is no functional change, nor public interface change.
Reviewers: hfinkel, chandlerc, iteratee, echristo, kbarton
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25536
llvm-svn: 285105
Add an option to allow easier experimentation by target maintainers with the
minimum number of entries to create jump tables. Also clarify the name of
the other existing option governing the creation of jump tables.
Differential revision: https://reviews.llvm.org/D25883
llvm-svn: 285104
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases. The motivation is
that many contemporary processors typically perform case switches fairly
quickly.
Differential revision: https://reviews.llvm.org/D25212
llvm-svn: 285099
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.
Differential Revision: https://reviews.llvm.org/D25632
llvm-svn: 285097
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
Thanks to Adrian Prantl for stewarding this patch!
llvm-svn: 285094
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.
Differential Revision: https://reviews.llvm.org/D25742
llvm-svn: 285093
When indvars widened an induction variable, the debug location for the loop
increment computation was incorrectly set equal to the debug loc of the loop
latch terminator.
This patch fixes the issue by propagating the correct location from the
original loop increment instruction to the new widened increment.
Differential Revision: https://reviews.llvm.org/D25872
llvm-svn: 285083
Now that MemorySSA keeps track of whether MemoryUses are optimized, use
getClobberingMemoryAccess() to check MemoryUse memory dependencies since
it should no longer be so expensive.
This is a follow-up change to https://reviews.llvm.org/D25881
llvm-svn: 285080
It is not safe to use LOAD ON CONDITION to implement access to a memory
location marked "volatile", since the architecture leaves it unspecified
whether or not an access happens if the condition is false.
The current code already appears to care about that:
def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>;
Unfortunately, that "nonvolatile_load" operator is simply ignored
by the CondUnaryRSY class, and there was no test to catch it.
llvm-svn: 285077
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.
This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.
Differential Revision: https://reviews.llvm.org/D25874
llvm-svn: 285072
Summary:
Do *not* perform combines such as:
vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X))
->
build_vector(X, C0, C1, C2)
Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).
Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25524
llvm-svn: 285063
In an IR symbol table I would expect the comdats to be represented as:
- A table of strings, one for each comdat name.
- Each symbol has an optional index into that table.
The natural api for accessing that would be
InputFile:
ArrayRef<StringRef> getComdatTable() const;
Symbol:
int getComdatIndex() const;
This patch implements an API as close to that as possible. The
implementation on top of the current IRObjectFile is a bit hackish,
but should map just fine over a symbol table and is very convenient to
use.
llvm-svn: 285061
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.
Reviewers: delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25594
llvm-svn: 285053
This is a function to go backwards in a block to find the first
instruction in a bundle, so iterator is a more natural choice for
parameter/return rather than a reference to a MachineInstruction.
llvm-svn: 285051
This changes locals from being declared by the emitLocal hook in
WebAssemblyTargetStreamer, rather than with an instruction. After exploring
the infastructure in LLVM more, this seems to make more sense since
declaring locals doesn't use an encoded opcode.
This also adds more 0xd opcodes, type encodings, and miscellaneous
binary encoding bits.
llvm-svn: 285040
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.
llvm-svn: 285039
There are two fixes here: one, AnalyzeUsesOfPointer can't return
false until it has checked all the uses of the pointer. Two, if a
global uses another global, we have to assume the address of the
first global escapes.
Fixes https://llvm.org/bugs/show_bug.cgi?id=30707 .
Differential Revision: https://reviews.llvm.org/D25798
llvm-svn: 285034
(Const)?MIOperands is equivalent to the C++ style
MachineInstr::mop_iterator. Use the latter for consistency except for a
few callers of MIOperands::analyzePhysReg().
llvm-svn: 285029
This patch adds a pass, controlled by an option and off by default for
now, for making implicit get_local/set_local explicit. This simplifies
emitting wasm with MC.
Differential Revision: https://reviews.llvm.org/D25836
llvm-svn: 285009
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.
Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.
Differential Revision: https://reviews.llvm.org/D25917
llvm-svn: 285006
This fixes a bug in the handling of lexical scopes, when more than one
scope is defined on the same line or functions are inlined into call
sites that are on the same line as the function definition. This
situation can easily happen in macro expansions.
The problem is solved by introducing a SmallDenseMap<DIScope *,
DILexicalBlockFile *, 1> that keeps track of all the different lexical
scopes that share a line/file location.
Fixes PR30681.
llvm-svn: 284998
Add support for estimating the square root or its reciprocal and division or
reciprocal using the combiner generic Newton series.
Differential revision: https://reviews.llvm.org/D25291
llvm-svn: 284986
Summary:
When using MemorySSA, re-optimize MemoryPhis when removing a store since
this may create MemoryPhis with all identical arguments.
Also, when using MemorySSA to check if two MemoryUses are reading from
the same version of the heap, use the defining access instead of calling
getClobberingAccess, since the latter can currently result in many more
AA calls. Once the MemorySSA use optimization tracking changes are
done, we can remove this limitation, which should result in more loads
being CSE'd.
Reviewers: dberlin
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25881
llvm-svn: 284984
https://reviews.llvm.org/D24924
This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review.
llvm-svn: 284983
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.
This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.
Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.
Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.
This fixes several GL45-CTS.shaders.indexing.* tests.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25633
llvm-svn: 284980
Fix AsmParser lines to correctly handle end-of-line pre-processor
comments parsing when '#' is not the assembly line comment prefix.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25567
llvm-svn: 284978
If we don't have futimens(), we fall back to futimes(), which only supports
microsecond timestamps. In that case, we need to explicitly cast away the extra
precision in setLastModificationAndAccessTime().
llvm-svn: 284977
Summary:
Most of the changes are very straight-forward. The only choice I had to make was
to use second-precision time points in the Archive classes. I did this because
the archive files use that precision in the on-disk representation anyway.
Reviewers: rafael, zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25773
llvm-svn: 284974
Summary:
Add relocations for AArch64 ILP32. Includes:
- Addition of definitions for R_AARCH32_*
- Definition of new -target-abi: ilp32
- Definition of data layout string
- Tests for added relocations. Not comprehensive, but matches
existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64".
- Tests for llvm-readobj
Reviewers: zatrazz, peter.smith, echristo, t.p.northover
Subscribers: aemerson, rengolin, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25159
llvm-svn: 284973
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Patch by James Molloy and Pablo Barrio.
Reviewers: reames, bkramer, mcrosier, gberry, haicheng, jmolloy, sebpop
Subscribers: jojo, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D25477
llvm-svn: 284971
Summary:
This is a follow-up to D25416. It removes all usages of TimeValue from
llvm/Support library (except for the actual TimeValue declaration), and replaces
them with appropriate usages of std::chrono. To facilitate this, I have added
small utility functions for converting time points and durations into appropriate
OS-specific types (FILETIME, struct timespec, ...).
Reviewers: zturner, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25730
llvm-svn: 284966
Add synci to the microMIPS instruction definitions, mark the MIPS sync & synci
as not being part of microMIPS. This does not cover the sync instruction alias,
as that will be handled with a different patch. Add sync to the valid tests for
microMIPS.
Reviewers: vkalintiris
Differential Revision: https://reviews.llvm.org/D25795
llvm-svn: 284962
Summary: With MSVC 2013 and GCC < 4.8 gone, we can use the "constexpr" keyword.
Reviewers: bkramer, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25901
llvm-svn: 284947
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.
llvm-svn: 284939
In BasicAA GEP operand values get adjusted ("wrap-around") based on the
pointersize. Otherwise, in non-64b modes, AA could report false negatives.
However, a wrap-around is valid only for a fully evaluated expression.
It had been introduced to fix an alias problem in
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160118/326163.html.
This commit restricts the wrap-around to constant gep operands only where the
value is known at compile-time.
llvm-svn: 284908
This will prevent following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
Differential Revision: https://reviews.llvm.org/D25805
llvm-svn: 284891
If a 64-bit value is tested against a bit which is known to be in the range
[0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly
shorter encoding.
Differential Revision: https://reviews.llvm.org/D25862
llvm-svn: 284864
Summary: This adds support for dumping the globals stream from PDB files using llvm-pdbdump, similar to the support we have for the publics stream.
Reviewers: ruiu, zturner
Subscribers: beanz, mgorny, modocache
Differential Revision: https://reviews.llvm.org/D25801
llvm-svn: 284861
Summary:
Utility pass to remove gc.relocates created by rewrite statepoints for GC.
With respect to safepoint verification, the IR generated would be incorrect, and cannot run
as such.
This would be a single transformation on the final optimized IR.
The benefit of the pass is for easy analysis when the IRs are 'polluted' by too
many gc.relocates.
Added tests.
test run: All RS4GC tests with -verify option. Local downstream tests on large
IR files. This also works when the pointer being gc.relocated is another
gc.relocate.
Reviewers: sanjoy, reames
Subscribers: beanz, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D25096
llvm-svn: 284855
0 - X --> 0, if the sub is NUW
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
0 - X --> X, if X is 0 or the minimum signed value
This is the DAG equivalent of:
https://reviews.llvm.org/rL284649
plus the fold for the NUW case which already existed in InstSimplify.
Note that we miss a vector fold because of a deficiency in the DAG version of
computeKnownBits().
llvm-svn: 284844
After register allocation it is possible to have a spill of a register
that is only partially defined. That in itself it fine, but creates a
problem for double vector registers. Stores of such registers are pseudo
instructions that are expanded into pairs of individual vector stores,
and in case of a partially defined source, one of the stores may use
an entirely undefined register. To avoid this, track the defined parts
and only generate actual stores for those.
llvm-svn: 284841
Summary:
Need to reorder the operands to have the callee as the last argument.
Adds a pseudo-instruction, and a pass to lower it into a real
call_indirect.
This is the first of two options for how to fix the problem.
Reviewers: dschuff, sunfish
Subscribers: jfb, beanz, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D25708
llvm-svn: 284840
Because we're just 'or-ing' these 2 variables later in the code, I
don't think there's a logical bug here, but of course the string with
"no size" is the one that should have the size suffix stripped off.
llvm-svn: 284826
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.
Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x
Note that although the test cases include a 'sin' function call, I'm side-stepping the
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.
Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.
Differential Revision: https://reviews.llvm.org/D25297
llvm-svn: 284824
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that either the number of iterations will be either exactly
that upper bound or zero, then we can fully unroll up to that upper bound
keeping only the first loop test to check for the zero iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
Summary:
The spill size was incorrectly set to 196 bits,
which isn't a multiple of 8. This problem was detected when
experimenting with asserts that the spill size should be a
multiple of the byte size.
New corrected value for the spill size is set to 192 bits.
Note that tablegen (RegisterInfoEmitter) will divide the
size set in the RegisterClass definition by 8. So this
change should not have any impact on the tablegen output
(trunc(192/8) == trunc(196/8) == 24 bytes).
Reviewers: t.p.northover
Subscribers: llvm-commits, aemerson, rengolin
Differential Revision: https://reviews.llvm.org/D25818
llvm-svn: 284814
There's no agreement about this patch. I personally find the
PRE machinery of the current GVN hard enough to reason about
that I'm not sure I'll try to land this again, instead of working
on the rewrite).
llvm-svn: 284796
rL284780 fixed the PREL31 relocation and added a test for it. Being
the first such test for ARM relocations, it exposed incorrect endianness
assumptions (causing buildbot failures on big-endian hosts). Fix that by
using the same helpers used for the x86 case.
llvm-svn: 284789
This is to avoid inlining too many multiplication operands into a SCEV, which could
take exponential time in the worst case.
Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin
Differential Revision: https://reviews.llvm.org/D25794
llvm-svn: 284784
Summary:
This allows us to mark when uses have been optimized.
This lets us avoid rewalking (IE when people call getClobberingAccess on everything), and also
enables us to later relax the requirement of use optimization during updates with less cost.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25172
llvm-svn: 284771
load commands that use the MachO::twolevel_hints_command type
which includes only the LC_TWOLEVEL_HINTS load command.
This is not used in llvm libObject code or in llvm tool code. But
does appear in one of the binary test files. While this load command is
obsolete it is easier to add code for it in libObject than edit or change
the binary test case.
llvm-svn: 284769
This was all using ArrayRef<>s before which presents a problem
when you want to serialize to or deserialize from an actual
PDB stream. An ArrayRef<> is really just a special case of
what can be handled with StreamInterface though (e.g. by using
a ByteStream), so changing this to use StreamInterface allows
us to plug in a PDB stream and get all the record serialization
and deserialization for free on a MappedBlockStream.
Subsequent patches will try to remove TypeTableBuilder and
TypeRecordBuilder in favor of class that operate on
Streams as well, which should allow us to completely merge
the reading and writing codepaths for both types and symbols.
Differential Revision: https://reviews.llvm.org/D25831
llvm-svn: 284762
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284757
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation. If not, the undefined
high-order bits of the promoted operand may (a) be garbage inc ase of
zext) or (b) contribute the wrong sign-bit (in case of sext)
Updated the promote-vec3.ll test after this change. The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.
Reviewers: RKSimon
Subscribers: llvm-commits, srhines
Differential Revision: https://reviews.llvm.org/D25790
llvm-svn: 284752
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.
This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.
original commit msg:
[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284746
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.
We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.
llvm-svn: 284744
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
This is a resubmission of r284590. The mingw build should be fixed now. The
problem was we were matching time_t with _localtime_64s, which was incorrect on
_USE_32BIT_TIME_T systems. Instead I use localtime_s, which should always
evaluate to the correct function.
llvm-svn: 284720
Post-RA sched strategy and scheduling instruction annotations for z196, zEC12
and z13.
This scheduler optimizes decoder grouping and balances processor resources
(including side steering the FPd unit instructions).
The SystemZHazardRecognizer keeps track of the scheduling state, which can
be dumped with -debug-only=misched.
Reviers: Ulrich Weigand, Andrew Trick.
https://reviews.llvm.org/D17260
llvm-svn: 284704
- Add alignment attribute to DIVariable family
- Modify bitcode format to match new DIVariable representation
- Update tests to match these changes (also add bitcode upgrade test)
- Expect that frontend passes non-zero align value only when it is not default
(was forcibly aligned by alignas()/_Alignas()/__atribute__(aligned())
Differential Revision: https://reviews.llvm.org/D25073
llvm-svn: 284678
load commands that use the MachO::thread_command type
but are not used in llvm libObject code but used in llvm tool code.
This includes the LC_UNIXTHREAD and LC_THREAD
load commands.
A quick note about the philosophy of the error checking in
libObject for Mach-O files, the idea behind the checking is
that we never will return a Mach-O file out of libObject that
contains unknown things in the load commands.
To do this the 32-bit ARM and PPC general tread states
needed to be defined as two test case binaries contained
them. If other thread states for other CPUs need to be
added we will do that as needed.
Going forward the LC_MAIN load command is used to
set the entry point in Mach-O executables these days
instead of an LC_UNIXTHREAD as was done in the past.
So today only in core files are LC_THREAD load commands
and thread states usually found.
Other thread states have not yet been defined in
include/Support/MachO.h at this time. But that can be
added as needed with their corresponding checking also
added.
llvm-svn: 284668
Profile runtime can generate an empty raw profile (when there is no function in
the shared library). This empty profile is treated as a text format profile. A
test format profile without the flag of "#IR" is thought to be a clang
generated profile. So in llvm profile merging, we will get a bogus warning of
"Merge IR generated profile with Clang generated profile."
The fix here is to skip the empty profile (when the buffer size is 0) for
profile merge.
Reviewers: vsk, davidxl
Differential Revision: http://reviews.llvm.org/D25687
llvm-svn: 284659
0 - X --> X, if X is 0 or the minimum signed value
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
I noticed this pattern might be created in the backend after the change from D25485,
so we'll want to add a similar fold for the DAG.
The use of computeKnownBits in InstSimplify may be something to investigate if the
compile time of InstSimplify is noticeable. We could replace computeKnownBits with
specific pattern matchers or limit the recursion.
Differential Revision: https://reviews.llvm.org/D25785
llvm-svn: 284649
This code crashed on funclet-style EH instructions such as catchpad,
catchswitch, and cleanuppad. Just treat all EH pad instructions
equivalently and avoid merging the globals they reference through any
use.
llvm-svn: 284633
Some instructions from the original loop, when vectorized, can become trivially
dead. This happens because of the way we structure the new loop. For example,
we create new induction variables and induction variable "steps" in the new
loop. Thus, when we go to vectorize the original induction variable update, it
may no longer be needed due to the instructions we've already created. This
patch prevents us from creating these redundant instructions. This reduces code
size before simplification and allows greater flexibility in code generation
since we have fewer unnecessary instruction uses.
Differential Revision: https://reviews.llvm.org/D25631
llvm-svn: 284631
This change is motivated by the case when IndVarSimplify doesn't widen a comparison of IV increment because it can't prove IV increment being non-negative. We end up with a redundant trunc of the widened increment on this example.
for.body:
%i = phi i32 [ %start, %for.body.lr.ph ], [ %i.inc, %for.inc ]
%within_limits = icmp ult i32 %i, 64
br i1 %within_limits, label %continue, label %for.end
continue:
%i.i64 = zext i32 %i to i64
%arrayidx = getelementptr inbounds i32, i32* %base, i64 %i.i64
%val = load i32, i32* %arrayidx, align 4
br label %for.inc
for.inc:
%i.inc = add nsw nuw i32 %i, 1
%cmp = icmp slt i32 %i.inc, %limit
br i1 %cmp, label %for.body, label %for.end
There is a range check inside of the loop which guarantees the IV to be non-negative. NSW on the increment guarantees that the increment is also non-negative. Teach IndVarSimplify to use the range check to prove non-negativity of loop increments.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D25738
llvm-svn: 284629
Summary:
Changes default backend parallelism from thread::hardware_concurrency to
the new llvm::heavyweight_hardware_concurrency, which for X86 Linux
defaults to the number of physical cores (and will fall back to
thread::hardware_concurrency otherwise). This avoid oversubscribing
the physical cores using hyperthreading.
Reviewers: mehdi_amini, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25775
llvm-svn: 284618
This reverts commit r284590 as it fails on the mingw buildbot. I think I know the
fix, but I cannot test it right now. Will reapply when I verify it works ok.
This reverts r284590.
llvm-svn: 284615
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.
This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239https://reviews.llvm.org/rL284395
There may be others, so please let me know if you see any regressions.
Differential Revision: https://reviews.llvm.org/D25485
llvm-svn: 284611
This required reengineering of some of the part of liveness calculation,
including fixing some issues caused by the limitations of the previous
approach. The current code is not necessarily the fastest, but it should
be functionally correct (at least more so than before). The compile-time
performance will be addressed in the future.
llvm-svn: 284609
Summary:
std::chrono mostly covers the functionality of llvm::sys::TimeValue and
lldb_private::TimeValue. This header adds a bit of utility functions and
typedefs, which make the usage of the library and porting code from TimeValues
easier.
Rationale:
- TimePoint typedef - precision of system_clock is implementation defined -
using a well-defined precision helps maintain consistency between platforms,
makes it interact better with existing TimeValue classes, and avoids cases
there a time point is implicitly convertible to a specific precision on some
platforms but not on others.
- system_clock::to_time_t only accepts time_points with the default system
precision (even though time_t has only second precision on all platforms we
support). To avoid the need for explicit casts, I have added a toTimeT()
wrapper function. toTimePoint(time_t) was not strictly necessary, but I have
added it for symmetry.
Reviewers: zturner, mehdi_amini
Subscribers: beanz, mgorny, llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D25416
llvm-svn: 284590
Most z13 vector instructions have a base form where the data type of
the operation (whether to consider the vector to be 16 bytes, 8
halfwords, 4 words, or 2 doublewords) is encoded into a mask field,
and then a set of extended mnemonics where the mask field is not
present but the data type is encoded into the mnemonic name.
Currently, LLVM only supports the type-specific forms (since those
are really the ones needed for code generation), but not the base
type-generic forms.
To complete the assembler support and make it fully compatible with
the GNU assembler, this commit adds assembler aliases for all the
base forms of the various vector instructions.
It also adds two more alias forms that are documented in the PoP:
VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc.
VNOT -- special variant of VNO
llvm-svn: 284586
The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are
documented in the Principles of Operation to have an optional last
operand to encode arbitrary values in a mask field.
This commit adds support for those optional operands, and cleans up
the patterns to generate vector string instruction as bit. No change
to code generation intended.
llvm-svn: 284585
Declare the LLVM_CMAKE_PATH to the source directory location of CMake
files, in order to make it possible to easily use them in subprojects.
Such a variable is already declared in most of LLVM projects
(and inconsistently mixed with direct source tree references), including
Clang, LLDB, compiler-rt, libcxx... Declaring it inside main LLVM tree
makes it possible to avoid having to declare fallback values or use
conditionals in those projects.
It should be noted that in some of the subprojects LLVM_CMAKE_PATH is
used to reference generated LLVMConfig.cmake file. However, these
references are conditional to stand-alone builds and explicitly
including this file is unnecessary in combined builds.
Differential Revision: https://reviews.llvm.org/D25724
llvm-svn: 284581
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
llvm-svn: 284580
This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach.
llvm-svn: 284578
This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).
Differential Revision: https://reviews.llvm.org/D25625
llvm-svn: 284571
Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.
New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.
There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert.
The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.
Reviewers: RKSimon, delena, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25651
llvm-svn: 284567
Example of output:
COVERAGE:
COVERED: in DSO2(int) /pathto/DSO2.cpp:6
COVERED: in DSO2(int) /pathto/DSO2.cpp:8
COVERED: in DSO1(int) /pathto/DSO1.cpp:6
COVERED: in DSO1(int) /pathto/DSO1.cpp:8
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:16
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:19
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:25
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:26
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO1.so
UNCOVERED_LINE: in DSO1(int) /pathto/DSO1.cpp:9
UNCOVERED_FUNC: in Uncovered1()
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO2.so
UNCOVERED_LINE: in DSO2(int) /pathto/DSO2.cpp:9
UNCOVERED_FUNC: in Uncovered2()
MODULE_WITH_COVERAGE: /pathto/LLVMFuzzer-DSOTest
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:21
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:27
UNCOVERED_FILE: /pathto/DSOTestExtra.cpp
Several things are not perfect here:
* we are using objdump+awk instead of sancov because sancov does not support DSOs yet.
* this breaks in the presence of ASAN_OPTIONS=strip_path_prefix=...
(need to implement another API to get the module name by PC)
llvm-svn: 284554
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284545
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
sqrt(val);
is transformed to
if (val < 0)
sqrt(val);
Even if the result of library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note in many functions, the error condition solely depends on the incoming
parameter. In this optimization, we can generate the condition can lead to
the errno to shrink-wrap the call. Since the chances of hitting the error
condition is low, the runtime call is effectively eliminated.
These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.
Reviewers: hfinkel, mehdi_amini, davidxl
Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz
Differential Revision: https://reviews.llvm.org/D24414
llvm-svn: 284542
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284541
This is just a quick utility handy for getting rough summaries of types
in a given object or dwo file. I've been using it to investigate the
amount of type info redundancy across a project build, for example.
llvm-svn: 284537
The custom lowering is pretty straightforward: basically, just AND
together the two halves of a <4 x i32> compare.
Differential Revision: https://reviews.llvm.org/D25713
llvm-svn: 284536
Summary:
The original implementation is in r261607, which was reverted in r269726 to accomendate the ProfileSummaryInfo analysis pass. The new implementation:
1. add a new metadata for function section prefix
2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function
3. output the section prefix set by CGP
Reviewers: davidxl, eraman
Subscribers: vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D24989
llvm-svn: 284533
Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0`
to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not
have to be materialized beforehand for FCMP, as it has a form that has 0.0
as an implicit operand.
Differential Revision: https://reviews.llvm.org/D24808
llvm-svn: 284531
load command that use the MachO:: linkedit_data_command
type but is not used in llvm libObject code but used in llvm tool code.
This is for the LC_CODE_SIGNATURE load command.
llvm-svn: 284529
AArch64 actually supports many 8-bit operations under the definition used by
GlobalISel: the designated information-carrying bits of a GPR32 get the right
value if you just use the normal 32-bit instruction.
llvm-svn: 284526
load commands that use the MachO::routines_command and
and MachO::routines_command_64 types but are not used in llvm
libObject code but used in llvm tool code.
This includes the LC_ROUTINES and LC_ROUTINES_64
load commands.
llvm-svn: 284504
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284495
This patch teaches ias for mips to handle expressions such as
(8*4)+(8*31)($sp). Such expression typically occur from the expansion
of multiple macro definitions.
This partially resolves PR/30383.
Thanks to Sean Bruno for reporting the issue!
Reviewers: zoran.jovanovic, vkalintiris
Differential Revision: https://reviews.llvm.org/D24667
llvm-svn: 284485
The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands.
MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit
immediate.
This patch correct the definition of sync so that it is accepted with an
operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned
immediate for MIPS32 and later revisions.
Additionally a clear error is given when the MIPS32 version of sync is
used when targeting pre MIPS32.
This partially resolves PR/30714.
Thanks to Daniel Sanders for reporting this issue!
Reveiwers: vkalintiris
Differential Revision: https://reviews.llvm.org/D25672
llvm-svn: 284483
In futher patches we shall have alignment field added to DIVariable family
and switching from uint64_t to uint32_t will save 4 bytes per variable.
Differential Revision: https://reviews.llvm.org/D25620
llvm-svn: 284482
ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads
or stores using the specified source or destination register and the next
register.
This patch does not add support for the cases where the offset is greater than
a 16 bit signed immediate as that would lead to a wrong/misleading error
message as the assembler would report "instruction requires a CPU feature
not currently enabled" for ld & sd for MIPS64 when their offset is not a signed
16 bit number.
This fixes PR/29159.
Thanks to Sean Bruno for reporting this issue!
Reviewers: vkalintiris, seanbruno, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D24556
llvm-svn: 284481
Committing on behalf of Coby Tayree: After check-all and LGTM
Desc:
AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}')
Currently, the following forms are allowed:
{k<num>}
{k<num>}{z}
This patch allows the following forms:
{z}{k<num>}
and ignores the next form:
{z}
Justification would be quite simple - GCC
Differential Revision: http://reviews.llvm.org/D25013
llvm-svn: 284479
Summary:
Instead of instantiating the MipsFastISel class and checking if the
target is supported in the overriden methods, we should perform that
check before creating the class. This allows us to enable FastISel *only*
for targets that truly support it, ie. MIPS32 to MIPS32R5.
Reviewers: sdardis
Subscribers: ehostunreach, llvm-commits
Differential Revision: https://reviews.llvm.org/D24824
llvm-svn: 284475
In loops that look something like
i = n;
do {
...
} while(i++ < n+k);
where k is a constant, the maximum backedge count is k (in fact the backedge
count will be either 0 or k, depending on whether n+k wraps). More generally
for LHS < RHS if RHS-(LHS of first comparison) is a constant then the loop will
iterate either 0 or that constant number of times.
This allows for more loop unrolling with the recent upper bound loop unrolling
changes, and I'm working on a patch that will let loop unrolling additionally
make use of the loop being executed either 0 or k times (we need to retain the
loop comparison only on the first unrolled iteration).
Differential Revision: https://reviews.llvm.org/D25607
llvm-svn: 284465
This reverts commits 284436 and 284437 because they still break AArch64 bots:
Value of: format_number(-10, IntegerStyle::Integer, 1)
Actual: "-0"
Expected: "-10"
llvm-svn: 284462
This patch assigns cost of the scaling used in addressing for Cortex-R52.
On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.
Differential Revision: http://reviews.llvm.org/D25670
Reviewer: jmolloy
llvm-svn: 284460
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
Differential Revision: https://reviews.llvm.org/D23808
llvm-svn: 284459
This patch adds simplified support for tail calls on ARM with XRay instrumentation.
Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program
#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
std::printf("In fC()\n");
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
std::printf("In fB()\n");
fC();
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
std::printf("In fA()\n");
fB();
}
// Avoid infinite recursion in case the logging function is instrumented (so calls logging
// function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}
int main(int argc, char* argv[]) {
__xray_set_handler(simplyPrint);
printf("Patching...\n");
__xray_patch();
fA();
printf("Unpatching...\n");
__xray_unpatch();
fA();
return 0;
}
gives the following output:
Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()
So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().
Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).
Differential Revision: https://reviews.llvm.org/D25030
llvm-svn: 284456
When Error was threaded through these APIs back in r265606 the
"return" was missed here, which triggers a warning if/when I add
LLVM_NODISCARD to the Error type.
llvm-svn: 284454
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25666
llvm-svn: 284451
Summary:
There are differences in codegen between Linux and Windows due to:
1. Using std::sort which uses quicksort which is a non-stable sort.
2. Iterating over Set data structure where the iteration order is
non deterministic.
Reviewers: arsenm, grosbach, junbuml, zinob, MatzeB
Subscribers: MatzeB, wdng
Differential Revision: https://reviews.llvm.org/D25695
llvm-svn: 284441
This resubmits commits 284425 and r284428, which were reverted
in r284429 due to some infinite recursion caused by an incorrect
selection of function overloads. Reproduced the failure on Linux
using GCC 4.8.4, and confirmed that with the new patch the tests
path on GCC as well as MSVC. So hopefully this fixes everything.
llvm-svn: 284436
Summary:
Reclaiming the name 'CachedHashString' will let us add a type with that
name that owns its value.
Reviewers: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25644
llvm-svn: 284434
load commands that use the MachO::sub_framework_command,
MachO::sub_umbrella_command, MachO::sub_library_command
and MachO::sub_client_command types but are not used in llvm
libObject code but used in llvm tool code.
This includes the LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA,
LC_SUB_LIBRARY and LC_SUB_CLIENT load commands.
llvm-svn: 284431
raw_ostream has not afforded a lot of flexibility in terms of
how to format numbers when outputting. Wrap this all up into
a set of low level helper functions that can be used to output
numbers with arbitrary precision, alignment, format, etc and
then update raw_ostream to use these functions.
This will be useful for upcoming improvements to llvm's string
formatting libraries, but are still useful independently.
Differential Revision: https://reviews.llvm.org/D25497
llvm-svn: 284425
As noted in:
https://reviews.llvm.org/D25685
This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch.
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.
llvm-svn: 284421
If -coverage is passed, but -g is not, clang populates the PassManager
pipeline with StripSymbols(debugOnly = true).
The stripSymbol pass therefore scans the list of named metadata,
drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD)
around. The verifier runs, and finds out that there's a CU not listed
in !llvm.dbg.cu (as it was previously dropped) -> crash.
When we strip debug info, so, check if there's coverage data,
and strip it as well, in order to avoid pending metadata left around.
Differential Revision: https://reviews.llvm.org/D25689
llvm-svn: 284418
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code are the same with or without debug info.
Reviewers: davidxl, mzolotukhin, jmolloy
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D25286
llvm-svn: 284415
Summary:
This adds the necessary logic to support relocations to thumb functions in the COFF dynamic linker.
The jumps to function addresses are mostly blx, which requires the ISA selection bit when jumping to a thumb function.
Note: I'm determining if the relocation requires the ISA bit when creating the relocation entries and not when resolving the relocation. I have to do that because I need the ObjectFile and the actual Symbol, which are available only when creating the entries. It would require a gross refactor if I do it otherwise, but I'm okay with doing it if you think it's better.
Reviewers: peter.smith, compnerd
Subscribers: rengolin, sas
Differential Revision: https://reviews.llvm.org/D25151
llvm-svn: 284410
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25198
llvm-svn: 284397
This came up as part of:
https://reviews.llvm.org/D25485
Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.
llvm-svn: 284395
Based on post-commit review for D25585/r284180, rename
hardware_physical_concurrency to heavyweight_hardware_concurrency,
to better reflect what type of tasks it should be used for and
to enable other systems to map this to something other than the
number of physical cores.
llvm-svn: 284390
/../foo is still a proper path after removing the dotdot. This should
now finally match https://9p.io/sys/doc/lexnames.html [Cleaning names].
llvm-svn: 284384
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.
In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.
llvm-svn: 284381
Not all ConstantExprs can be represented by a global variable, for example most
pointer arithmetic other than addition of a constant, so we can't convert these
values from switch statements to lookup tables.
Differential Revision: https://reviews.llvm.org/D25550
llvm-svn: 284379
Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. a identify factor. We lose cases where size is char type where there will no multiply factor.
Reviewers: sanjoy, grosser
Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser
Differential Revision: https://reviews.llvm.org/D16492
llvm-svn: 284378
CodeGenPrepare knows how to move a zext of a load into the same basic block
where the load lives. The goal is to help ISel match a zero-extending load
instead of two separated instructions.
CGP attempts to move a zext computation even if it lives in a basic block that
does not post-dominate the load's basic block. That means, the hoisted zext may
be speculated. Preserving the zext location would hurt the debugging experience
and the quality of sample pgo.
With this patch, when moving a zext near to its associated load, CGP no longer
propagates the zext's debug location. Instead, CGP conservatively reuses the
same debug location for the load and the zext.
An alternative approach would be to assign an artificial line-0 location to the
zext. However we don't want to over-use the 'line-0' for this particular case
because it would have a size cost in the line-table section for no additional
benefit.
Differential Revision: https://reviews.llvm.org/D25611
llvm-svn: 284377
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
BasicBlock::size is O(insts), making this loop O(blocks*insts), which
can be really slow on generated code. getPrevNode already checks if
we're at the beginning of the block and returns nullptr if so, just use
that instead. No functionality change intended.
llvm-svn: 284303
- Removed unused class members.
- Made class internal data private.
- Made class scoped data function scoped where it's possible.
- Replace naked new/delete with unique_ptr.
- Made resources guaranteed to be freed.
Differential Revision: https://reviews.llvm.org/D25464
llvm-svn: 284290
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
llvm-svn: 284287
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
llvm-svn: 284268
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
Reviewers: arsenm, nhaehnle
Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24672
llvm-svn: 284267
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
Reviewers: bogner, arsenm
Subscribers: nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24964
llvm-svn: 284266
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
Use PackedRegisterRef to store the register information in the graph nodes.
This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.
llvm-svn: 284255
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Differential Revision: https://reviews.llvm.org/D25535
llvm-svn: 284251
This is an improvement when compiling with llvm. llvm doesn't inline
the call to insert, so the align is always executed and shows up in
the profile.
With gcc the call to insert is inlined and the align computation moved
and done only if needed.
With this patch we explicitly only compute it if it is needed.
In the two tests with debug info, the speedup was
scylla
master 3.008959365
patch 2.932080942 1.02621974786x faster
firefox
master 6.709823604
patch 6.592387227 1.01781393795x faster
In all others the difference was in the noise.
llvm-svn: 284249
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25289
llvm-svn: 284224
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.
Reviewers: dsanders, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D21473
llvm-svn: 284218
Committing in the name of Ziv Izhar: After check-all and LGTM .
The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
jmp short label
label:
}
A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958
Differential Revision: https://reviews.llvm.org/D24957
llvm-svn: 284211
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.
llvm-svn: 284204
Summary:
This will be used by ThinLTO to set the amount of backend
parallelism, which performs better when restricted to the number
of physical cores (on X86 at least, where getHostNumPhysicalCores is
currently defined). If not available this falls back to
thread::hardware_concurrency.
Note I didn't add to the thread class since that is a typedef to
std::thread where available.
Reviewers: mehdi_amini
Subscribers: beanz, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D25585
llvm-svn: 284180
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.
llvm-svn: 284173
Summary:
This operation is promoted the same way was ISD::BSWAP. This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.
Reviewers: bogner, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D25202
llvm-svn: 284163
the X86 subdirectory. Original commit message:
Requires a valid TargetMachine to be passed to the SafeStack pass.
Patch by Michael LeMay
Differential revision: http://reviews.llvm.org/D24896
llvm-svn: 284161
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.
Differential Revision: https://reviews.llvm.org/D24849
llvm-svn: 284160
Relax the constraint for empty live-ranges while doing last chance
recoloring. Indeed, those live-ranges do not need an actual color to be
fond for the recoloring to work.
Empty live-range may happen as a result of splitting/spilling.
Unfortunately no test case for in-tree targets.
llvm-svn: 284152
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 284151
Summary:
For now I have only added support for x86_64 Linux, but other systems
can be added incrementally.
This is to be used for setting the default parallelism for ThinLTO
backends (instead of thread::hardware_concurrency which includes
hyperthreading and is too aggressive). I'll send this as a follow-on
patch, and it will fall back to hardware_concurrency when the new
getHostNumPhysicalCores returns -1 (when not supported for a given
host system).
I also added an interface to MemoryBuffer to force reading a file
as a stream - this is required for /proc/cpuinfo which is a special
file that looks like a normal file but appears to have 0 size.
The existing readers of this file in Host.cpp are reading the first
1024 or so bytes from it, because the necessary info is near the top.
But for the new functionality we need to be able to read the entire
file. I can go back and change the other readers to use the new
getFileAsStream as a follow-on patch since it seems much more robust.
Added a unittest.
Reviewers: mehdi_amini
Subscribers: beanz, mgorny, llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D25564
llvm-svn: 284138
In the MS ABI, the frontend is supposed to MD5 such pathologically long
names. LLVM should still defend itself from long names, though.
Fixes part of PR29098.
llvm-svn: 284136
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.
Based on a patch by David Kreitzer
This bug is a consequence of
r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines
We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.
Differential Revision: https://reviews.llvm.org/D25566
llvm-svn: 284130
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.
For instance:
LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]
Above, (1) takes less cycles than (2).
By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.
Differential Revision: http://reviews.llvm.org/D24857
Reviewers: jmolloy, rengolin
llvm-svn: 284127
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.
Differential Revision: https://reviews.llvm.org/D25333
llvm-svn: 284123
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.
llvm-svn: 284119
These instructions were only defined for microMIPSR6 previously. Add
definitions for MIPSR6, correct definitions for microMIPSR6, flag these
instructions as having unmodelled side effects (they disable/enable
virtual processors) and add missing disassember tests for microMIPSR6.
Reviewers: vkalintiris
Differential Review: https://reviews.llvm.org/D24291
llvm-svn: 284115
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.
Differential Revision: http://reviews.llvm.org/D25022
llvm-svn: 284108
We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom.
We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume its a different width.
llvm-svn: 284102
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.
Differential Revision: https://reviews.llvm.org/D25542
llvm-svn: 284098
This allows RegBankSelect in greedy mode to get rid some of the cross
register bank copies when loads are involved in the chain of
computation.
llvm-svn: 284097
- Use storage class C_STAT for 'PrivateLinkage' The storage class for
PrivateLinkage should equal to the Internal Linkage.
- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix
"L" may conflict to the normal symbol name starting with 'L'.
Based on a patch by Han Sangjin! Manually updated test cases.
llvm-svn: 284096
This CL didn't actually address the test case in PR30499, and clang
still crashes.
Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"
Reverts r283965 and r283967.
llvm-svn: 284093
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25293
llvm-svn: 284061
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
We need to use the overload of Mangler::getNameWithPrefix that takes a
GlobalValue in order to mangle in the stdcall stack byte count for Windows
targets.
Differential Revision: https://reviews.llvm.org/D25529
llvm-svn: 284040
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.
llvm-svn: 284031
Module inline asm was always being linked/concatenated
when running the IRLinker. This is correct for full LTO but not when
we are importing for ThinLTO, as it can result in multiply defined
symbols when the module asm defines a global symbol.
In order to test with llvm-lto2, I had to work around PR30396,
where a symbol that is defined in module assembly but defined in the
LLVM IR appears twice. Added workaround to llvm-lto2 with a FIXME.
Fixes PR30610.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25359
llvm-svn: 284030
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness. I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.
It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.
Reviewers: jmolloy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25502
llvm-svn: 284028
Since this change is known to cause performance degradations in some cases it's commited under a temporary flag which is turned off by default.
Patch by Li Huang
Differential Revision: https://reviews.llvm.org/D18777
llvm-svn: 284022
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.
Differential Revision: https://reviews.llvm.org/D25374
llvm-svn: 284015
This combiner breaks debug experience and should not be run when optimizations are disabled.
For example:
int main() {
int j = 0;
j += 2;
if (j == 2)
return 0;
return 5;
}
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.
Differential Revision: https://reviews.llvm.org/D19268
llvm-svn: 284014
An arithmetic shift can be safely changed to a logical shift if the first
operand is known positive. This allows ComputeKnownBits (and similar analysis)
to determine the sign bit of the shifted value in some cases. In turn, this
allows InstCombine to canonicalize a signed comparison (a > 0) into an equality
check (a != 0).
PR30577
Differential Revision: https://reviews.llvm.org/D25119
llvm-svn: 284013
The current Cost Model implementation is very inaccurate and has to be
updated, improved, re-implemented to be able to take into account the
concrete CPU models and the concrete targets where this Cost Model is
being used. For example, the Latency Cost Model should be differ from
Code Size Cost Model, etc.
This patch is the first step to launch the developing and implementation
of a new Cost Model generation.
Differential Revision: https://reviews.llvm.org/D25186
llvm-svn: 284012
As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead.
Differential Revision: https://reviews.llvm.org/D25466
llvm-svn: 284000
subcommands
This commit fixes a bug where the help output doesn't display subcommands when
a tool has less than 3 subcommands.
This change doesn't include a corresponding unittest as there is no viable way
to provide a unittest for it.
Differential Revision: https://reviews.llvm.org/D25463
llvm-svn: 283998
The basic inlining operation makes the following changes to the call graph:
1) Add edges that were previously transitive edges. This is always trivial and
this patch gives the LCG helper methods to make this more convenient.
2) Remove the inlined edge. We had existing support for this, but it contained
bugs that needed to be fixed. Testing in the same pattern as the inliner
exposes these bugs very nicely.
3) Delete a function when it becomes dead because it is internal and all calls
have been inlined. The LCG had no support at all for this operation, so this
adds that support.
Two unittests have been added that exercise this specific mutation pattern to
the call graph. They were extremely effective in uncovering bugs. Sadly,
a large fraction of the code here is just to implement those unit tests, but
I think they're paying for themselves. =]
This was split out of a patch that actually uses the routines to
implement inlining in the new pass manager in order to isolate (with
unit tests) the logic that was entirely within the LCG.
Many thanks for the careful review from folks! There will be a few minor
follow-up patches based on the comments in the review as well.
Differential Revision: https://reviews.llvm.org/D24225
llvm-svn: 283982
This reverts commit r283946.
This breaks when build with GCC:
lib/Fuzzer/FuzzerTracePC.cpp:169:6: error: always_inline function might not be inlinable [-Werror=attributes]
lib/Fuzzer/FuzzerTracePC.cpp:169:6: error: inlining failed in call to always_inline 'void fuzzer::TracePC::HandleCmp(void*, T, T) [with T = long unsigned int]': target specific option mismatch
lib/Fuzzer/FuzzerTracePC.cpp:198:65: error: called from here
llvm-svn: 283979
Although Copies are not specific to preISel, we still have to assign them
a proper register class. However, given they are not constrained to
anything we do not have to handle the source register at the copy. It
will be properly mapped when reaching the related definition.
In the process, the handlong of G_ANYEXT is slightly modified as those
end up being selected as copy. The difference is that when register size
do not match on both sides, we need to insert SUBREG_TO_REG operation,
otherwise the post RA copy expansion will not be happy!
llvm-svn: 283972
This implements the cleanup that Danny asked to commit separately from the
previous fix to GVN-hoist in https://reviews.llvm.org/D25476#inline-219818
Tested with ninja check on x86_64-linux.
llvm-svn: 283967
This is a refreshed version of a patch that was reverted: it fixes
the problems reported in both PR30216 and PR30499, and
contains all the test-cases from both bugs.
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Tested on x86_64-linux with check and a test-suite run.
Differential Revision: https://reviews.llvm.org/D25476
llvm-svn: 283965
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
B = Splat A
C = Splat B
=>
C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
B = Splat A
C = Splat B
=>
B = Splat A
C = COPY B
Fixes PR30663.
Reviewers: echristo, iteratee, kbarton, nemanjai
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25493
llvm-svn: 283961
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.
llvm-svn: 283953
Reverts r283938 to reinstate r283867 with a fix.
The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.
llvm-svn: 283942
load commands that uses the MachO::linker_option_command
type but not used in llvm libObject code but used in llvm tool code.
This includes just LC_LINKER_OPTION load command.
llvm-svn: 283939
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283934
The previous commit was failing because we filled empty slots of
the debug stream index with kInvalidStreamIndex. It should've been 0.
llvm-svn: 283925
Low level functionality to format numbers were embedded in the
implementation of raw_ostream. I have need to use these through
an interface other than the overloaded stream operators, so they
need to be raised to a level that they can be used from either
raw_ostream operators or other code.
llvm-svn: 283921
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)
Differential Revision: https://reviews.llvm.org/D25298
llvm-svn: 283919
I fixed all the other Targets in r283702, and interestingly the
sanitizers are only now "sometimes" catching this bug on the only
one I missed.
llvm-svn: 283914
This has existed pretty much forever AFAICT, but the code was
never being exercised because nobody was using the class. A
user of this class surfaced, and now we're breaking with UB.
The code was obviously wrong, so it's fixed here.
llvm-svn: 283912
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction.
An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect
different patterns. In this case, the SimplifyDemandedBits fold prevents us from
performing a zext to sext fold that would then be recognized as a negation of a sext.
llvm-svn: 283900
Previously we would print
USAGE: <exe> [subcommand] [options]
Even if no subcommands were present. This changes the output
format to only print "[subcommand]" if there is at least one
subcommand.
Fixes llvm.org/pr30598
Patch by Serge Guelton
llvm-svn: 283892
For each block check that it doesn't have any uses outside of it's innermost loop.
Differential Revision: https://reviews.llvm.org/D25364
llvm-svn: 283877
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.
This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.
In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.
We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.
There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.
In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.
Differential Revision: https://reviews.llvm.org/D24228
llvm-svn: 283867
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.
Differential Revision: https://reviews.llvm.org/D25180
llvm-svn: 283866
Allow instructions such as 'cmp w0, #(end - start)' by folding the
expression into a constant. For ELF, we fold only if the symbols are in
the same section. For MachO, we fold if the expression contains only
symbols that are not linker visible.
Fixes https://llvm.org/bugs/show_bug.cgi?id=18920
Differential Revision: https://reviews.llvm.org/D23834
llvm-svn: 283862
This reverts commit r283842.
test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.
llvm-svn: 283857
LLVM's RandomNumberGenerator wasn't compatible with
the random distribution from <random>.
Fixes PR25105
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D25443
llvm-svn: 283854
Summary: This patch sets function as hot if function's entry count is hot/cold.
Reviewers: eraman, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25048
llvm-svn: 283852
This changes MachineRegisterInfo to be initializes after parsing all
instructions. This is in preparation for upcoming commits that allow the
register class specification on the operand or deduce them from the
MCInstrDesc.
This commit removes the unused feature of having nonsequential register
numbers. This was confusing anyway as the vreg numbers would be
different after parsing when you had "holes" in your numbering.
This patch also introduces the concept of an incomplete virtual
register. An incomplete virtual register may be used during .mir parsing
to construct MachineOperands without knowing the exact register class
(or register bank) yet.
NFC except for some error messages.
Differential Revision: https://reviews.llvm.org/D22397
llvm-svn: 283848
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283842
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.
This patch changes this by attempting to split all live intervals before
performing recoloring.
This fixes LLVM bug PR14879.
I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.
Reviewers: qcolombet
Subscribers: MatzeB
Differential Revision: https://reviews.llvm.org/D25070
llvm-svn: 283838
When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.
This fixes PR30597.
Patch by Ariel Ben-Yehuda!
Differential Revision: https://reviews.llvm.org/D25215
llvm-svn: 283836
This only adds the support for 64-bit vector OR. Adding more sizes is
not difficult, but it requires a bigger refactoring because ORs work on
any size, not necessarly the ones that match the width of the register
width. Right now, this is not expressed in the legalization, so don't
bother pushing the refactoring yet.
llvm-svn: 283831
Previously, there is no way to create a stream other than pre-defined
special stream such as DBI or IPI. This patch adds a new method,
addDbgStream, to add a debug stream to a PDB file.
Differential Revision: https://reviews.llvm.org/D25356
llvm-svn: 283823
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.
Patch by Edward Jones, thanks!
Differential Revision: https://reviews.llvm.org/D24459
llvm-svn: 283797
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.
The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.
Differential Revision: https://reviews.llvm.org/D25281
llvm-svn: 283763
Summary:
Rotate by 1 is translated to 1 micro-op, while rotate with imm8 is translated to 2 micro-ops.
Fixes pr30644.
Reviewers: delena, igorb, craig.topper, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D25399
llvm-svn: 283758
Commit in the name of:Coby Tayree
1.'v' constraint for (x86) non-avx arch imitates the already implemented 'x' constraint, i.e. allows XMM{0-15} & YMM{0-15} depending on the apparent arch & mode (32/64).
2.for the avx512 arch it allows [X,Y,Z]MM{0-31} (mode dependent)
This patch applies the needed changes to clang
clang patch: https://reviews.llvm.org/D25004
Differential Revision: D25005
llvm-svn: 283717
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions.
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.
Differential Revision: https://reviews.llvm.org/D25322
llvm-svn: 283694
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
This seems to have been responsible for the XMM16-31 spills observed in PR29112. With this fixed the test case has been modified to no longer have a spill of XMM16.
llvm-svn: 283668
Summary:
When there is a call to an alias in the same module, we were not
adding a call edge. So we could incorrectly think that the alias
was dead if it was inlined in that function, despite having a
reference imported elsewhere. This resulted in unsats at link time.
Add a call edge when the call is to an alias.
Reviewers: davide, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25384
llvm-svn: 283664
Avoid generating indexed vector instructions for Exynos. This is needed for
fmla/fmls/fmul/fmulx. For example, the instruction
fmla v0.4s, v1.4s, v2.s[1]
is less efficient than the instructions
dup v2.4s, v2.s[1]
fmla v0.4s, v1.4s, v2.4s
Patch written by Abderrazek Zaafrani.
Differential Revision: https://reviews.llvm.org/D21571
llvm-svn: 283663
Value names may be prefixed with a binary '1' to indicate that the
backend should not modify the symbols due to any platform naming
convention.
This should not show up in the YAML opt record file because it breaks
the YAML parser.
llvm-svn: 283656
Clang always emit a hash for ThinLTO, but as other frontend are
starting to use ThinLTO, this could be a serious bug.
Differential Revision: https://reviews.llvm.org/D25379
llvm-svn: 283655
We need to add an entry in the combined-index for modules that have
a hash but otherwise empty summary, this is needed so that we can
get the hash for the module.
Also, if no entry is present in the combined index for a module, we
need to skip it when trying to compute a cache entry.
Differential Revision: https://reviews.llvm.org/D25300
llvm-svn: 283654
This is the first step towards round-tripping symbol information,
and thusly being able to write symbol information to a PDB.
This patch writes the symbol information for each compiland to
the Yaml when running in pdb2yaml mode. There's still some loose
ends, such as what to do about relocations (necessary in order to
print linkage names), how to print enums with friendly names, and
how to give the dumper access to the StringTable, but this is a
good first start.
llvm-svn: 283641
Once MULHS was expanded, this exposed an issue where the condition
register was thought to be 16-bit. This caused an attempt to copy a
16-bit register to an 8-bit register.
Authored by Jake Goulding
llvm-svn: 283634
Summary:
If heap allocation of a coroutine is elided, we need to make sure that we will update an address stored in the coroutine frame from f.destroy to f.cleanup.
Before this change, CoroSplit synthesized these stores after coro.begin:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
```
In those cases where we did heap elision, but were not able to devirtualize all indirect calls, destroy call will attempt to "free" the coroutine frame stored on the stack. Oops.
Now we use select to put an appropriate coroutine subfunction in the destroy slot. As bellow:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
%0 = select i1 %need.alloc, void (%f.Frame*)* @f.destroy, void (%f.Frame*)* @f.cleanup
store void (%f.Frame*)* %0, void (%f.Frame*)** %destroy.addr
```
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25377
llvm-svn: 283625
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283619
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.
Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.
Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.
When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.
This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.
rdar://28300923
llvm-svn: 283617
Type visitor code had already been refactored previously to
decouple the visitor and the visitor callback interface. This
was necessary for having the flexibility to visit in different
ways (for example, dumping to yaml, reading from yaml, dumping
to ScopedPrinter, etc).
This patch merely implements the same visitation pattern for
symbol records that has already been implemented for type records.
llvm-svn: 283609
If we're going to canonicalize IR towards select of constants, try harder to create those.
Also, don't lose the metadata.
This is actually 4 related transforms in one patch:
// select X, (sext X), C --> select X, -1, C
// select X, (zext X), C --> select X, 1, C
// select X, C, (sext X) --> select X, C, 0
// select X, C, (zext X) --> select X, C, 0
Differential Revision: https://reviews.llvm.org/D25126
llvm-svn: 283575
Summary: -fsample-profile needs discriminator, which will not be added if built with -g0. This patch makes sure the discriminator is added for sample-profile at -g0. A followup patch will be send out to update clang tests.
Reviewers: davidxl, dblaikie, echristo, dnovillo
Subscribers: mehdi_amini, probinson, llvm-commits
Differential Revision: https://reviews.llvm.org/D25132
llvm-svn: 283565
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant. This is not true, because addresspacecast
instructions may cast the pointer to an address space with a different
pointer width.
This partial reverts r282612, which was a more conservative solution
to this problem.
Reviewers: reames, sanjoy, apilipenko
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24772
llvm-svn: 283557
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D25332
llvm-svn: 283550
MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input, the commutation code wasn't taking this into account leading to verification errors.
This patch inserts a vreg copy mi to ensure that the registers are correct.
Fix for PR30607
Differential Revision: https://reviews.llvm.org/D25280
llvm-svn: 283539
unrolling.
The next code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
int sum = 0;
for (int i = 0; i < 8; i++)
sum += p[i];
return sum;
}
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.
Differential Revision: https://reviews.llvm.org/D24796
llvm-svn: 283535
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
Summary:
There was a bug with sequences like
s_mov_b64 s[0:1], exec
s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
...
s_mov_b64_term exec, s[2:3]
because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.
Note that the test case also exposes an unrelated bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25306
llvm-svn: 283528
Summary:
This class deals with the lowering of CodeGen `MachineInstr` objects to
MC `MCInst` objects.
Reviewers: kparzysz, arsenm
Subscribers: wdng, beanz, japaric, mgorny
Differential Revision: https://reviews.llvm.org/D25269
llvm-svn: 283522
GetCaseResults assumed that a terminator with one successor was an
unconditional branch. This is not necessarily the case, it could be a
cleanupret.
Strengthen the check by querying whether or not the terminator is
exceptional.
llvm-svn: 283517
Summary:
I had for the second time today a bug where llvm::format("%s", Str)
was called with Str being a StringRef. The Linux and MacOS bots were
fine, but windows having different calling convention, it printed
garbage.
Instead we can catch this at compile-time: it is never expected to
call a C vararg printf-like function with non scalar type I believe.
Reviewers: bogner, Bigcheese, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25266
llvm-svn: 283509
Per spec changes, this implements block signatures, and adds just enough
logic to produce correct block signatures at the ends of functions.
Differential Revision: https://reviews.llvm.org/D25144
llvm-svn: 283503
Per spec changes, store instructions in WebAssembly no longer have a return
value. Update the instruction descriptions.
Differential Revision: https://reviews.llvm.org/D25122
llvm-svn: 283501
Summary:
These nodes need legalization for 3-element vectors. This commit
handles the legalization and adds tests for zext and sext.
This fixes PR30614.
Reviewers: RKSimon, srhines
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25268
llvm-svn: 283496
Add a weak alias to the renamed Comdat function in IR level instrumentation,
using it's original name. This ensures the same behavior w/ and w/o IR
instrumentation, even for non standard conforming code.
Differential Revision: http://reviews.llvm.org/D25339
llvm-svn: 283490
When replacing FrameIndex with BasePtr, we must preserve BasePtr for
LEA64_32r since BasePtr is used later for stack adjustment if it is
the same as StackPtr.
Patch by H.J Lu <hjl.tools@gmail.com>
Differential Revision: https://reviews.llvm.org/D23575
llvm-svn: 283486
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.
This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.
Differential Revision: https://reviews.llvm.org/D24683
llvm-svn: 283480
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
llvm-svn: 283473
When constant folding an operation to a copy or an immediate
mov, the implicit uses/defs of the old instruction were left behind,
e.g. replacing v_or_b32 left the implicit exec use on the new copy.
llvm-svn: 283471
Change erroneous parsing of push immediate instructions in intel syntax
to default to pointer size by rewriting into the ATT style for matching.
This fixes PR22028.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25288
llvm-svn: 283457
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'
It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.
llvm-svn: 283442
These ones need to have the size on the pseudo instruction set for
getInstSizeInBytes to work correctly. These also have a statically
known size.
llvm-svn: 283437
Summary:
The computeKnownBits and ComputeNumSignBits functions in ValueTracking can now do a simple look-through of ExtractElement.
Reviewers: majnemer, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24955
llvm-svn: 283434
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.
Testcase added.
llvm-svn: 283423
If we don't truncate, LLVM asserts when the label difference doesn't fit
in a 16 bit field. This patch truncates two kinds of data: trailing null
terminated names in symbol records, and inline line tables. The inline
line table test that I have is too large (many MB), so I'm not checking
it in.
Hopefully fixes PR28264.
llvm-svn: 283403
This came out of a discussion in https://reviews.llvm.org/D25285.
There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.
This also removes an entirely obsolete and bitrotted testcase for PR7662.
Reapplies 283390 with a forgotten testcase.
llvm-svn: 283400
Summary: This makes a change to the state used to maintain visited information for depth first iterator. We know assume a method "completed(...)" which is called after all children of a node have been visited. In all existing cases, this method does nothing so this patch has no functional changes. It will however allow a client to distinguish back from cross edges in a DFS tree.
Reviewers: nadav, mehdi_amini, dberlin
Subscribers: MatzeB, mzolotukhin, twoh, freik, llvm-commits
Differential Revision: https://reviews.llvm.org/D25191
llvm-svn: 283391
This came out of a discussion in https://reviews.llvm.org/D25285.
There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.
This also removes an entirely obsolete and bitrotted testcase for PR7662.
llvm-svn: 283390
This allows LLVM to describe locations of aggregate variables that have
been split by SROA.
Fixes PR29141
Reviewers: amccarth, majnemer
Differential Revision: https://reviews.llvm.org/D25253
llvm-svn: 283388
To be default constructible, Archive::child_iterator needs to be able to
construct an Archive::Child with a null parent, however Archive::Child's
constructor always dereferenced its Parent argument to compute the remaining
archive size. This commit fixes Archive::Child's constructor to only do the
size calculation when the parent is non-null.
llvm-svn: 283387
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D24076
llvm-svn: 283383
The vectorizer already holds a pointer to one cost model artifact in a member
variable (i.e., MinBWs). As we add more, it will be easier to communicate these
artifacts to the vectorizer if we simply pass a pointer to the cost model
instead.
llvm-svn: 283373
The register scavenging code does not support multiple definitions of
the same vreg.
Differential Revision: https://reviews.llvm.org/D25220
llvm-svn: 283369
The vectorizer already holds a pointer to the legality analysis in a member
variable, so it makes sense that we would pass it in the constructor.
llvm-svn: 283368
This patch refactors the cost estimation of scalarized loads and stores to
reuse getScalarizationOverhead for the cost of the extractelement and
insertelement instructions we might create. The existing code accounted for
this cost, but it was functionally equivalent to the helper function.
llvm-svn: 283364
Previously we would give up when we saw the bitpiece DWARF expression
and print "[complex expression]" when actually we handled bitpiece
expressions outside the loop.
llvm-svn: 283355
The cost model has to estimate the probability of executing predicated blocks.
However, we currently always assume predicated blocks have a 50% chance of
executing (this value is hardcoded in several places throughout the code).
Since we always use the same value, this patch adds a helper function for
getting this uniform probability. The function simplifies some comments and
makes our assumptions more clear. In the future, we may want to extend this
with actual block probability information if it's available.
llvm-svn: 283354
The integrated assembler evaluates the expressions such as ~0x80000000 to
0xffffffff7fffffff early in the parsing process. This patch adds compatibility
with gas so that li loads the expected value (0x7fffffff) in those cases. This
only occurs iff all the upper 32bits are set and maintains existing checks by
not truncating the result down to 32 bits if any of the the upper bits are not
set.
Reviewers: dsanders, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D23399
llvm-svn: 283353
This patch adds a single helper function for checking if an instruction will be
scalarized with predication. Such instructions include conditional stores and
instructions that may divide by zero. Existing checks have been updated to use
the new function.
llvm-svn: 283350
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.
Reviewers: bogner, tstellarAMD, mkuper
Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D25007
llvm-svn: 283347
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.
Reviewers: vkalintiris, zoran.jovanoic
Differential Review: https://reviews.llvm.org/D24499
llvm-svn: 283334
This is not a valid encoding - these instructions cannot do PC-relative addressing.
The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.
llvm-svn: 283323
Summary:
This adds the AVR machine code backend (`AVRAsmBackend.cpp`). This will
allow us to generate machine code from assembled AVR instructions.
Reviewers: arsenm, kparzysz
Subscribers: modocache, japaric, wdng, beanz, mgorny
Differential Revision: https://reviews.llvm.org/D25029
llvm-svn: 283297
This should allow users of the library to get a range to iterate through
all the subcommands that are registered to the global parser. This
allows users to define subcommands in libraries that self-register to
have dispatch done at a different stage (like main). It allows for
writing code like the following:
for (auto *S : cl::getRegisteredSubcommands()) {
if (*S) {
// Dispatch on S->getName().
}
}
This change also contains tests that show this usage pattern.
Reviewers: zturner, dblaikie, echristo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24489
llvm-svn: 283296
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.
Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.
llvm-svn: 283292
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283274
Summary:
These are analog to the existing LLVMConstExactSDiv and LLVMBuildExactSDiv
functions.
Reviewers: deadalnix, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D25259
llvm-svn: 283269
Summary:
Attempting to fix PR30384.
Take the same approach as in compiler_rt and add a simplified version of __get_cpuid_max.
Including cpuid.h is no longer needed.
Reviewers: echristo, joerg
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24597
llvm-svn: 283265
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.
Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.
The ingredients of this patch are:
Remove the reciprocal estimate command-line debug option.
Add TargetRecip to TargetLowering.
Remove TargetRecip from TargetOptions.
Clean up the TargetRecip implementation to work with this new scheme.
Set the default reciprocal settings in TargetLoweringBase (everything is off).
Update the PowerPC defaults, users, and tests.
Update the x86 defaults, users, and tests.
Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.
Differential Revision: https://reviews.llvm.org/D24816
llvm-svn: 283252
load commands that uses the MachO::encryption_info_command and
MachO::encryption_info_command types but not used in llvm libObject
code but used in llvm tool code.
This includes just LC_ENCRYPTION_INFO and
LC_ENCRYPTION_INFO_64 load commands.
llvm-svn: 283250
AArch64InstrInfo::shouldScheduleAdjacent() determines whether two
instruction can benefit from macroop fusion on apple CPUs. The list
turned out to be incomplete:
- the "rr" variants of the instructions were missing
- even the "rs" variants can have shift value == 0 and behave like the
"rr" variants
This also splits the MacropFusion target feature into
ArithmeticBccFusion and ArithmeticCbzFusion.
Differential Revision: https://reviews.llvm.org/D25142
llvm-svn: 283243
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.
Differential Revision: https://reviews.llvm.org/D25224
llvm-svn: 283236
This patch corresponds to review:
The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.
llvm-svn: 283212
Treat soft-float as unsupported for fast-isel. Additionally, ensure we check
that lowering f32 arguments also considers the case of soft-float mode.
Reviewers: ehostunreach, vkalintiris, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D24505
llvm-svn: 283209
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, an __mul?i3 intrinsic is used. Since the type is
not legal, the legalizer cannot directly call the intrinsic with
the wide arguments; instead, it "pre-lowers" them by splitting them
in halves.
The "pre-lowering" code in essence made assumptions about
the calling convention, specifically that i(N*2) values will be
split into two iN values and passed in consecutive registers in
little-endian order. This, naturally, breaks on a big-endian system,
such as our OR1K out-of-tree backend.
Thanks to James Miller <james@aatch.net> for help in debugging.
Differential Revision: https://reviews.llvm.org/D25223
llvm-svn: 283203
This fixes the inconsistency of the fp denormal option names: in LLVM this was
DenormalType, but in Clang this is DenormalMode which seems better.
Differential Revision: https://reviews.llvm.org/D24906
llvm-svn: 283192
This patch corresponds to review:
https://reviews.llvm.org/D23155
This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:
Int to Fp conversions of 1 or 2-byte values loaded from memory
Building vectors of 1 or 2-byte integers with values loaded from memory
Storing individual 1 or 2-byte elements from integer vectors
This patch implements all of those uses.
llvm-svn: 283190
Slightly improves the precision of GlobalsAA in certain situations, and
makes the behavior of optimization passes more predictable.
Differential Revision: https://reviews.llvm.org/D24104
llvm-svn: 283165
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
llvm-svn: 283164
WebAssembly has officially switched from being an AST to being a stack
machine. Update various bits of terminology and README.md entries
accordingly.
llvm-svn: 283154
This is to avoid problems with win32 + ELF which surprisingly happens a
lot in practice: If a user just specifies -march on the commandline the
object format changes along with the architecture to ELF in many
instances while the OS stays with the default/host OS.
llvm-svn: 283151
Refactor the code so that the same function can be used for all
instructions with all the same operands for up to 3 operands.
This is going to be useful for cast instructions.
NFC.
llvm-svn: 283144
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.
llvm-svn: 283143
Windows has no GOT relocations the way elf/darwin has. Some people use
x86_64-pc-win32-macho to build EFI firmware; Do not produce GOT
relocations for this target.
Differential Revision: https://reviews.llvm.org/D24627
llvm-svn: 283140
Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).
Reviewers: davidxl, danielcdh, hfinkel, chandlerc
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D24168
llvm-svn: 283134
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.
Differential Revision: https://reviews.llvm.org/D24680
llvm-svn: 283129
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
Differential Revision: https://reviews.llvm.org/D25165
llvm-svn: 283119
If the llvm. prefix is dropped other parts of llvm don't see this as
an intrinsic. This means that the number of regular symbols depends
on the context the module is loaded into, which causes LTO to abort.
Fixes PR30509.
llvm-svn: 283117
Retrying after buildbot reset.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.
This fixes PR28921.
Reviewers: rnk, loladiro
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24839
llvm-svn: 283111
Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D24727
llvm-svn: 283099
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.
Patch mostly by Pablo Barrio.
Differential Revision: https://reviews.llvm.org/D25077
llvm-svn: 283098
The PPC branch-selection pass, which performs branch relaxation, needs to
account for the padding that might be introduced to satisfy block alignment
requirements. We were assuming that the first block was at offset zero (i.e.
had the alignment of the function itself), but under the ELFv2 ABI, a global
entry function prologue is added to the first block, and it is a
two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte
alignment, the fact that the first block is eight bytes offset from the start
of the function is relevant to calculating where padding will be added in
between later blocks.
Unfortunately, I don't have a small test case.
llvm-svn: 283086
I don't know for sure that we truly needs this, but its the only vector load that isn't rematerializable. Making it consistent allows it to not be a special case in the td files.
llvm-svn: 283083
This was first landed in rL283058 and subsequenlty reverted since a
change this depends on (rL283057) was buggy and had to be reverted.
llvm-svn: 283079
This change teaches getEquivalentICmp to be smarter about generating
ICMP_NE and ICMP_EQ predicates.
An earlier version of this change was landed as rL283057 which had a
use-after-free bug. This new version has a fix for that bug, and a (C++
unittests/) test case that would have triggered it rL283057.
llvm-svn: 283078
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.
This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.
Bug found during internal testing.
Differential Revision: https://reviews.llvm.org/D25039
llvm-svn: 283070
They've broken the sanitizer-bootstrap bots. Reverting while I investigate.
Original commit messages:
r283057: "[ConstantRange] Make getEquivalentICmp smarter"
r283058: "[SCEV] Rely on ConstantRange instead of custom logic; NFCI"
llvm-svn: 283062
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
Fixes PR26970.
llvm-svn: 283060
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.
This required me to add a couple of scalar math movsd/moss fold patterns that hadn't been needed in the past.
llvm-svn: 283038
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.
We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful
I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets
llvm-svn: 283037
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.
llvm-svn: 283020
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.
This fixes PR28921.
Reviewers: rnk, loladiro
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24839
llvm-svn: 282992
The binder is in a specific section that "reverse" the edges in a
regular dead-stripping: the binder is live as long as a global it
references is live.
This is a big hammer that prevents LLVM from dead-stripping these,
while still allowing linker dead-stripping (with special knowledge
of the section).
Differential Revision: https://reviews.llvm.org/D24673
llvm-svn: 282988
We don't need to have singleton ValueMapping on their own, we can just
reuse one of the elements of the 3-ops mapping.
This allows even more code sharing.
NFC.
llvm-svn: 282959
When we create a PDB file using PDBFileBuilder, the information
in the superblock, such as the size of the resulting file, is not
available.
Previously, PDBFileBuilder::initialize took a superblock assuming
that all the members of the struct are correct. That is useful when
you want to restore the exact information from a YAML file, but
that's probably the only use case in which that is useful.
When we are creating a PDB file on the fly, we have to backfill the
members.
This patch redefines PDBFileBuilder::initialize to take only a
block size. Now all the other members are left as default values,
so that they'll be updated when commit() is called.
Differential Revision: https://reviews.llvm.org/D25108
llvm-svn: 282944
WritableStream needs the exact file size to open a file, but
until we fix the final layout of a PDB file, we don't know the
size of the file.
This patch changes the parameter type of PDBFileBuilder::commit
to solve that chiecken-and-egg problem. Now the function opens
a file after fixing the layout, so it can create a file with the
exact size.
Differential Revision: https://reviews.llvm.org/D25107
llvm-svn: 282940
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.
Differential Revision: https://reviews.llvm.org/D24836
llvm-svn: 282920
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.
```
%save = call token @llvm.coro.save(i8* null)
%Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
switch i8 %suspend, label %exit [
i8 0, label %await.ready
i8 1, label %exit
]
await.ready:
%val = load i32, i32* %Result.i19
```
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24418
llvm-svn: 282902
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.
We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24417
llvm-svn: 282899
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24412
llvm-svn: 282897
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
llvm-svn: 282894
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.
Patch by Jacob Gravelle
Differential Revision: https://reviews.llvm.org/D24942
llvm-svn: 282886
Summary:
This patch is adding the support for a shadow memory with
dynamically allocated address range.
The compiler-rt needs to export a symbol containing the shadow
memory range.
This is required to support ASAN on windows 64-bits.
Reviewers: kcc, rnk, vitalybuka
Subscribers: zaks.anna, kubabrecka, dberris, llvm-commits, chrisha
Differential Revision: https://reviews.llvm.org/D23354
llvm-svn: 282881
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
Summary:
This change adds the AVR assembly instruction printer.
No tests are included in this patch. I have left them downstream so we can
add them once `llc` successfully runs (there's very few components left
to upstream until this).
Reviewers: arsenm, kparzysz
Subscribers: wdng, beanz, mgorny
Differential Revision: https://reviews.llvm.org/D25028
llvm-svn: 282854
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.
This patch changes this by attempting to split all live intervals before
performing recoloring.
This fixes LLVM bug PR14879.
I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.
Reviewers: qcolombet
Subscribers: MatzeB
Differential Revision: https://reviews.llvm.org/D25070
llvm-svn: 282852
I'm not completely sure what this method does or why all the 256-bit VTs returned VR128RegClass when the comments on the method definiton say it should return the largest super register class. I just figured AVX-512 should be similar.
llvm-svn: 282836
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.
I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.
llvm-svn: 282835
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24976
llvm-svn: 282833
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.
llvm-svn: 282832
Summary:
This lets people link against LLVM and their own version of the UTF
library.
I determined this only affects llvm, clang, lld, and lldb by running
$ git grep -wl 'UTF[0-9]\+\|\bConvertUTF\bisLegalUTF\|getNumBytesFor' | cut -f 1 -d '/' | sort | uniq
clang
lld
lldb
llvm
Tested with
ninja lldb
ninja check-clang check-llvm check-lld
(ninja check-lldb doesn't complete for me with or without this patch.)
Reviewers: rnk
Subscribers: klimek, beanz, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D24996
llvm-svn: 282822
This uses a TableGen'ed like structure for all 3-operands instrs.
The output of the RegBankSelect pass should be identical but the
RegisterBankInfo will do less dynamic allocations.
llvm-svn: 282817
(Recommit after making sure IsVerbose gets properly initialized in
DiagnosticInfoOptimizationBase. See previous commit that takes care of
this.)
OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.
Then we need the magic to be able to turn an LAA remark into an LV
remark. This is done via a new OptimizationRemark ctor.
llvm-svn: 282813
Also, make foldSelectExtConst() a member of InstCombiner, remove
unnecessary parameters from its interface, and group visitSelectInst
helpers together in the header file.
llvm-svn: 282796
load command that uses the MachO::entry_point_command type
but not used in llvm libObject code but used in llvm tool code.
This includes just the LC_MAIN load command.
llvm-svn: 282766
OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.
Then we need the magic to be able to turn an LAA remark into an LV
remark. This is done via a new OptimizationRemark ctor.
llvm-svn: 282758
Instead of producing a mapping for all the operands, we only generate a
mapping for the definition. Indeed, the other operands are not
constrained by the instruction and thus, we should leave the choice to
the actual definition to do the right thing.
In pratice this is almost NFC, but with one advantage. We will have only
one instance of OperandsMapping for each copy and phi that map to one
register bank instead of one different instance for each different
number of operands for each copy and phi.
llvm-svn: 282756
The VS debugger doesn't appear to understand the 0x68 or 0x69 type
indices, which were probably intended for use on a platform where a C
'int' is 8 bits. So, use the character types instead. Clang was already
using the character types because '[u]int8_t' is usually defined in
terms of 'char'.
See the Rust issue for screenshots of what VS does:
https://github.com/rust-lang/rust/issues/36646
Fixes PR30552
llvm-svn: 282739
load command that uses the Mach::source_version_command type
but not used in llvm libObject code but used in llvm tool code.
This includes just the LC_SOURCE_VERSION load command.
llvm-svn: 282736
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006
Reviewers: tejohnson, mehdi_amini, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24940
llvm-svn: 282733
The last one remaining after which emitAnalysis can be removed is when
we convert the LAA's report to a vectorization report. This requires
converting LAA to the new interface first.
llvm-svn: 282726
The shuffle mask decodes have a large amount of repeated code extracting/splitting mask values from Constant data.
This patch pulls all of this duplicated code into a single helper function to identify undef elements and combine/split constant integer data into the requested shuffle mask elements.
Updated PSHUFB/VPERMIL/VPERMIL2/VPPERM decoders to use it (VPERMV/VPERMV3 could be converted as well in the future).
llvm-svn: 282720
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.
Fixes one of the cases from PR29112.
llvm-svn: 282690
AsmPrinter. This was reinitializing the Mangler after we moved the
Mangler down to TLOF and causing us to have two different unnamed
global values accessed with the same name.
This should fix the problems on the ubsan tests here:
http://lab.llvm.org:8011/builders/clang-cmake-mips/builds/15307
llvm-svn: 282675
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.
This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.
llvm-svn: 282667
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model. This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.
Reviewers: t.p.northover, peter.smith, rovka
Subscribers: salim.nasser, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D24702
llvm-svn: 282661
Summary:
The patch fixes regression caused by two earlier patches D18777 and D18867.
Reviewers: reames, sanjoy
Differential Revision: http://reviews.llvm.org/D24280
From: Li Huang
llvm-svn: 282650
load command that uses the Mach::rpath_command type
but not used in llvm libObject code but used in llvm tool code.
This includes just the LC_RPATH load command.
llvm-svn: 282649
This is a step toward statically allocate InstructionMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.
This should already improve compile time by getting rid of a bunch of
memmove of SmallVectors.
llvm-svn: 282643
LiveDebugVariables doesn't propagate DBG_VALUEs accross basic block
boundaries any more; this functionality was split into LiveDebugValues.
We can thus drop the now dead references to LexicalScopes from LiveDebugVariables.
llvm-svn: 282638
other load commands that use the Mach::version_min_command type
but not used in llvm libObject code but used in llvm tool code.
This includes LC_VERSION_MIN_MACOSX, LC_VERSION_MIN_IPHONEOS,
LC_VERSION_MIN_TVOS and LC_VERSION_MIN_WATCHOS load commands.
llvm-svn: 282635
Normally, if conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.
llvm-svn: 282626
Summary:
When using llc with -compile-twice, module is generated twice, but getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI will still get the old PSI with the original (invalidated) Module. This patch checks if the module has changed when calling getPSI, if yes, update the module and invalidate the Summary.
The bug does not show up in the current llc because PSI is not used in CodeGen yet. But with https://reviews.llvm.org/D24989, the bug will be exposed by test/CodeGen/PowerPC/pr26378.ll
Reviewers: eraman, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24993
llvm-svn: 282616
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.
This is similar to D13008.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D24729
llvm-svn: 282612
This addresses PR26055 LiveDebugValues is very slow.
Contrary to the old LiveDebugVariables pass LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.
For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:
void sink(int a);
void __attribute((always_inline)) f(int a) { sink(a); }
void foo(int i) {
f(i);
if (i)
f(i);
f(i);
}
This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.
The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.
https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!
llvm-svn: 282611
Summary:
Instead of creating and destroying SCEVUnionPredicate instances (which
internally creates and destroys a DenseMap), use temporary SmallPtrSet
instances of remember the set of predicates that will get reified into a
SCEVUnionPredicate.
Reviewers: silviu.baranga, sbaranga
Subscribers: sanjoy, mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25000
llvm-svn: 282606
Implement 'retn' simply by aliasing it to the relevant 'ret' instruction
Commit on behalf of coby
Differential Revision: https://reviews.llvm.org/D24346
llvm-svn: 282601
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 282600
Summary:
This adds the AVRMCTargetDesc file in tree. It allows creation of the
core classes used in the backend.
Reviewers: arsenm, kparzysz
Subscribers: wdng, beanz, mgorny
Differential Revision: https://reviews.llvm.org/D25023
llvm-svn: 282597
The previous data layout caused issues when dealing with atomics.
Foe example, it is illegal to load a 16-bit value with less than 16-bits
of alignment.
This changes the data layout so that all types are aligned by at least
their own width.
Interestingly, this also _slightly_ decreased register pressure in some
cases.
llvm-svn: 282587
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
but, turns out that the opposite case of KORTEST using a GPR is now happening
The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.
Differential Revision: https://reviews.llvm.org/D24953
llvm-svn: 282580
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.
It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).
Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451
llvm-svn: 282570
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.
Differential Revision: https://reviews.llvm.org/D24625
llvm-svn: 282567
Ever since LAA was split out into an analysis on its own, this function
stopped emitting the report directly. Instead it stores it to be
retrieved by the client which can then emit it as its own report
(e.g. -Rpass-analysis=loop-vectorize).
llvm-svn: 282561
other load commands that use the MachO::dylinker_command type
but not used in llvm libObject code but used in llvm tool code.
This includes LC_ID_DYLINKER, LC_LOAD_DYLINKER
and LC_DYLD_ENVIRONMENT load commands.
llvm-svn: 282553
Another step toward TableGen'ed like structure for the RegisterBankInfo
of AArch64. By doing this, we also save a bit of compile time for the
exact same output.
llvm-svn: 282550
The 'or' case shows up in copysign. The copysign code also had
redundant checking for a scalar zero operand with 'and', so I
removed that.
I'm not sure how to test vector 'and', 'andn', and 'xor' yet,
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.
llvm-svn: 282546
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant. Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true. This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.
Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy
Subscribers: junbuml, aemerson, mcrosier, rengolin
Differential Revision: https://reviews.llvm.org/D24570
llvm-svn: 282543
There is really no reason for these to be separate.
The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed. There,
you have to query analysis to get the full picture.
I think we should just explain the reason for missing the optimization
in the missed remark when possible. Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.
llvm-svn: 282542
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
Turns out several external projects relied on llvm printing statistics
on exit. Let's go back to this behaviour by default and have an optional
parameter to disable it.
llvm-svn: 282532
LLVM developers might be surprised to learn that there are blocks
without valid insertion points (catchswitch), so it seems worth calling
that out explicitly. Also add a FIXME about what we should really be
doing if we ever need to make optimized Windows EH code debuggable.
While I'm here, make auto usage more consistent with LLVM standards and
avoid an unecessary call to insertBefore.
llvm-svn: 282521
A landing pad can have live-in registers that are defined by the runtime,
not the program (exception pointer register and exception selector
register). Make sure to recognize that case and not link these registers
with any defs in the program.
Each landing pad will have phi nodes added at the beginning to provide
definitions of these registers, but the uses of those phi nodes will not
have any reaching defs.
llvm-svn: 282519
I don't expect `PendingLoopPredicates` to have very many
elements (e.g. when -O3'ing the sqlite3 amalgamation,
`PendingLoopPredicates` has at most 3 elements). So now we use a
`SmallPtrSet` for it instead of the more heavyweight `DenseSet`.
llvm-svn: 282511
Variables are sometimes missing their debug location information in
blocks in which the variables should be available. This would occur
when one or more predecessor blocks had not yet been visited by the
routine which propagated the information from predecessor blocks.
This is addressed by only considering predecessor blocks which have
already been visited.
The solution to this problem was suggested by Daniel Berlin on the
LLVM developer mailing list.
Differential Revision: https://reviews.llvm.org/D24927
llvm-svn: 282506
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282499
Disable tail calls while the remaining bugs are fixed. Enable only for tests.
Reviewers: vkalintiris
Differential Review: https://reviews.llvm.org/D24912
llvm-svn: 282487
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.
Reviewers: vkalintiris, zoran.jovanoic
Differential Review: https://reviews.llvm.org/D24499
llvm-svn: 282485
This patch corresponds to review:
https://reviews.llvm.org/D24396
This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.
llvm-svn: 282478
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.
Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.
Reviewers: pcc
Subscribers: kcc
Differential Revision: https://reviews.llvm.org/D24948
llvm-svn: 282461
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.
Patch by Jacob Gravelle
Differential Revision: https://reviews.llvm.org/D24889
llvm-svn: 282442
other load commands that use the Mach::linkedit_data_command type
but not used in llvm libObject code but used in llvm tool code.
This includes LC_FUNCTION_STARTS, LC_SEGMENT_SPLIT_INFO
and LC_DYLIB_CODE_SIGN_DRS load commands.
llvm-svn: 282441
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
Previously enabling the statistics with EnableStatistics() would lead to
them getting printed to stderr/-info-output-file on exit. However
frontends may want a way to enable statistics and do the printing on
their own instead of the forced printing on exit.
This changes the code so that only the -stats option enables printing on
exit, EnableStatistics() only enables the tracking but requires invoking
one of the PrintStatistics() variants.
Differential Revision: https://reviews.llvm.org/D24819
llvm-svn: 282425
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.
Differential Revision: https://reviews.llvm.org/D23889
llvm-svn: 282418
Summary:
If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function.
Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24408
llvm-svn: 282414
Don't match the UXTW extended reg forms of ADD/ADDS/SUB/SUBS if the
32-bit to 64-bit zero-extend can be done for free by taking advantage
of the 32-bit defining instruction zeroing the upper 32-bits of the X
register destination. This enables better instruction selection in a
few cases, such as:
sub x0, xzr, x8
instead of:
mov x8, xzr
sub x0, x8, w9, uxtw
madd x0, x1, x1, x8
instead of:
mul x9, x1, x1
add x0, x9, w8, uxtw
cmp x2, x8
instead of:
sub x8, x2, w8, uxtw
cmp x8, #0
add x0, x8, x1, lsl #3
instead of:
lsl x9, x1, #3
add x0, x9, w8, uxtw
Reviewers: t.p.northover, jmolloy
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D24747
llvm-svn: 282413
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables. As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified. One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.
This patch considers a limit that a target may set to the number of
destination addresses in a jump table.
Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.
Differential revision: https://reviews.llvm.org/D21940
llvm-svn: 282412
The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.
llvm-svn: 282401
This patch fixes PR30366.
Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.
This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.
Differential Revision: https://reviews.llvm.org/D24565
llvm-svn: 282398
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 282387
Summary:
Replace a LEA instruction of the form 'lea (%esp), %ebx' --> 'mov %esp, %ebx'
MOV is preferable over LEA because usually there are more issue-slots available to execute MOVs than LEAs. Latest processors also support zero-latency MOVs.
Fixes pr29022.
Reviewers: hfinkel, delena, igorb, myatsina, mkuper
Differential Revision: https://reviews.llvm.org/D24705
llvm-svn: 282385
In a previous change I collapsed two different caches into one. When
doing that I noticed that ScalarEvolution's move constructor was not
moving those caches.
To keep the previous change simple, I've moved that bugfix into this
separate change.
llvm-svn: 282376
Both `loopHasNoSideEffects` and `loopHasNoAbnormalExits` involve walking
the loop and maintaining similar sorts of caches. This commit changes
SCEV to compute both the predicates via a single walk, and maintain a
single cache instead of two.
llvm-svn: 282375
This change simplifies a data structure optimization in the
`BackedgeTakenInfo` class for loops with exactly one computable exit.
I've sanity checked that this does not regress compile time performance,
using sqlite3's amalgamated build.
llvm-svn: 282365
Stop looking at users of UndefValue and ConstantPointerNull in the
objective C ARC optimizers. The other users aren't actually
interesting, since they're not pointing at a particular object. I
imagine these calls could be optimized through -instcombine... maybe
they already are?
These early returns will be required at some point in the future, with a
WIP patch that asserts when someone accesses a use-list on ConstantData.
llvm-svn: 282338
There is no benefit in looking through assumptions on UndefValue to
guess known bits. Return early to avoid walking their use-lists, and
assert that all instances of ConstantData are handled here for similar
reasons (UndefValue was the only integer/pointer holdout).
llvm-svn: 282337
Assumptions on UndefValue and ConstantPointerNull aren't relevant to
other users. Ignore them entirely to avoid wasting cycles walking
through their (possibly extremely extensive (cross-module)) use-lists.
It wasn't clear how to add a specific test for this, and it'll be
covered anyway by an eventual patch that asserts when trying to access
the use-list of an instance of ConstantData.
llvm-svn: 282334
Check and return early for ConstantPointerNull and UndefValue
specifically in isKnownNonNullAt, and assert that ConstantData never
make it to isKnownNonNullFromDominatingCondition.
This confirms that isKnownNonNullFromDominatingCondition never walks
through the use-list of an instance of ConstantData. Given that such
use-lists cross module boundaries, it never really made sense to do so,
and was potentially very expensive.
llvm-svn: 282333
This is a step toward statically allocate ValueMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.
llvm-svn: 282324
Previously we were using the address of the unique instance of a partial
mapping in the related map to access this instance. However, when the
map grows, the whole set of instances may be moved elsewhere and the
previous addresses are not valid anymore.
Instead, keep the address of the unique heap allocated instance of a
partial mapping.
Note: I did not see any actual bugs for that problem as the number of
partial mappings dynamically allocated is small (<= 4).
llvm-svn: 282323
Return early from llvm::isSafeToDestroyConstant() whenever the value
`isa<ConstantData>()`. These constants are shared across the
LLVMContext. We never really want to delete them here, and walking
their use-lists can be very expensive.
(This is motivated by an eventual goal of removing use-lists entirely
from ConstantData.)
llvm-svn: 282320
This is similar to:
https://reviews.llvm.org/rL279958
By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.
We should also be able to extend this code to handle vectors.
llvm-svn: 282312
The NativeObjectOutput class has a design problem: it mixes up the caching
policy with the interface for output streams, which makes the client-side
code hard to follow and would for example make it harder to replace the
cache implementation in an arbitrary client.
This change separates the two aspects by moving the caching policy
to a separate field in Config, replacing NativeObjectOutput with a
NativeObjectStream class which only deals with streams and does not need to
be overridden by most clients and introducing an AddFile callback for adding
files (e.g. from the cache) to the link.
Differential Revision: https://reviews.llvm.org/D24622
llvm-svn: 282299
As the development of GlobalISel move forward, this statistic should
strictly decrease until it reaches zero. At this point, it would mean
GlobalISel can replace SDISel (at least on the tested inputs :P).
llvm-svn: 282275
Collect statistics about the number of partial mappings dynamically
allocated and accessed. Ultimately, when the whole TableGen
infrastructure is set, those numbers should be zero.
llvm-svn: 282274
Summary: When identifying cold blocks, consider only the edge to the normal destination if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst.
Reviewers: mcrosier, hfinkel, davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D24868
llvm-svn: 282262
This patch corresponds to review:
https://reviews.llvm.org/D21135
This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld
In order to improve some build_vector and extractelement patterns.
llvm-svn: 282246
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 282241
If inserting more than one constant into a vector:
define <4 x float> @foo(<4 x float> %x) {
%ins1 = insertelement <4 x float> %x, float 1.0, i32 1
%ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
ret <4 x float> %ins2
}
InstCombine could reduce that to a shufflevector:
define <4 x float> @goo(<4 x float> %x) {
%shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1
Differential Revision: https://reviews.llvm.org/D24182
llvm-svn: 282237
This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.
llvm-svn: 282231
Summary:
For AMDGPU, we have been using the operating system component of the triple
for specifying the low-level runtime that is being used. The rationale for
this is that the host operating system (e.g. Linux) is irrelevant for GPU code,
since its execution enviroment will be mostly controled by the low-level runtime
being used to execute the code.
In most cases, higher level languages have their own runtime which is
implemented on top of the low-level runtime. The kernel ABIs of each
language mostly depend on the low-level runtime, but there may be some
slight differences between languages. OpenCL for example, may append
additional arguments to the kernel in order to pass values like global
offsets or buffers for printf. OpenMP, HCC, or other languages may want
to add their own values which differ from OpenCL.
The reason for adding a new opencl environment type is to make it possible for the backend
to distinguish between the ABIs of the higher-level languages and handle them correctly.
It seems cleaner to use the enviroment component for this rather than creating a new
OS type for every combination of low-level runtime / high-level language.
Reviewers: Anastasia, chandlerc
Subscribers: whchung, pekka.jaaskelainen, wdng, yaxunl, llvm-commits
Differential Revision: https://reviews.llvm.org/D24735
llvm-svn: 282218
Statically instanciate the most common PartialMappings. This should
be closer to what the code would look like when TableGen support is
added for GlobalISel. As a side effect, this should improve compile
time.
llvm-svn: 282215
In the verify method of the ValueMapping class we used to check that the
mapping exactly matches the bits of the input value. This is problematic
for statically allocated mappings because we would need a different
mapping for each different size of the value that maps on one
instruction. For instance, with such scheme, we would need a different
mapping for a value of size 1, 5, 23 whereas they all end up on a 32-bit
wide instruction.
Therefore, change the verifier to check that the meaningful bits are
covered by the mapping instead of matching them.
llvm-svn: 282214
This is another step toward TableGen'ed like structures. The BreakDown of
the mapping of the value will be statically computed by TableGen, thus
we only have to point to the right entry in the table instead of
dynamically allocate the mapping for each instruction.
We still support the dynamic allocation through a factory of
PartialMapping to ease the bring-up of the targets while the TableGen
backend is not available.
llvm-svn: 282213
We already have the udiv variant of this transform, so I think this is ok for
InstCombine too even though there is an increase in IR instructions. As the
tests and TODO comments show, the transform can lead to follow-on combines.
This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672
Differential Revision: https://reviews.llvm.org/D24527
llvm-svn: 282209
Currently all nodes get added to the NextSU list when they are released,
so any candidate must be in that list, making the heuristic ineffective.
Remove it for now, we can add it back later in a working fashion if
necessary.
llvm-svn: 282200
and also the dependent r282175 "GVN-hoist: do not dereference null pointers"
It's causing compiler crashes building Harfbuzz (PR30499).
llvm-svn: 282199
USR_OVF is a subregister of USR, which is a member of CtrRegs. Having both
a register and its proper subregister in the same register class has bad
consequences for lane mask calculation: based solely on the lane mask info,
USR_OVF would not appear to be a subregister of USR.
llvm-svn: 282192
According to MSDN (see the PR), functions which don't touch any callee-saved
registers (including %rsp) don't need any unwind info.
This patch makes LLVM not emit unwind info for such functions, to save
binary size.
Differential Revision: https://reviews.llvm.org/D24748
llvm-svn: 282185
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.
llvm-svn: 282182
A recent patch added support for consumeInteger() and made
getAsInteger delegate to this function. A few buildbots are
failing as a result with an assertion failure. On a hunch,
I tested what happens if I call getAsInteger() on an empty
string, and sure enough it crashes the same way that the
buildbots are crashing.
I confirmed that getAsInteger() on an empty string did not
crash before my patch, so I suspect this to be the cause.
I also added a unit test for the empty string.
llvm-svn: 282170
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Differential Revision: https://reviews.llvm.org/D24517
llvm-svn: 282168
StringRef::getInteger() exists and treats the entire string as
an integer of the specified radix, failing if any invalid characters
are encountered or the number overflows.
Sometimes you might have something like "123456foo" and you want
to get the number 123456 and leave the string "foo" remaining.
This is similar to what would be possible by using the standard
runtime library functions strtoul et al and specifying an end
pointer.
This patch adds consumeInteger(), which does exactly that. It
consumes as much as possible until an invalid character is found,
and modifies the StringRef in place so that upon return only
the portion of the StringRef after the number remains.
Differential Revision: https://reviews.llvm.org/D24778
llvm-svn: 282164
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.
A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:
if (isa<TerminatorInst>(I))
return false;
is better than listing all the other less frequent instructions.
Differential Revision: https://reviews.llvm.org/D23929
llvm-svn: 282160
The additional fix is:
When adding debug information to a lowered phi node in mem2reg
check that we have a valid insertion point after the phi for adding
the debug information.
This change addresses the issue in pr30468 where a lowered phi was
added before a catchswitch and no debug information should be added
after the phi in this case.
Differential Revision: https://reviews.llvm.org/D24797
llvm-svn: 282155
This patch corresponds to:
https://reviews.llvm.org/D21409
The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords
in the vector register when in little-endian mode. Custom code ensures that the
necessary swaps are inserted for these. This patch simply removes the possibilty
that a load/store node will match one of these instructions in the SDAG as that
would not insert the necessary swaps.
llvm-svn: 282144
This patch corresponds to review:
https://reviews.llvm.org/D19825
The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.
llvm-svn: 282143
For MIPS '#' is the start of comment line. Therefore we get assembler errors if # is used in the structure names.
Differential: D24334
Reviewed by: zhaoqin
llvm-svn: 282141
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.
We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.
This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction.
llvm-svn: 282132
This commit is basically the first step toward what will
RegisterBankInfo look when it gets TableGen'ed.
It introduces a XXXGenRegisterBankInfo.def file that is what TableGen
will issue at some point. Moreover, the RegBanks field in
RegisterBankInfo changed to reflect the static (compile time) aspect of
the information.
llvm-svn: 282131
When initializing an instance of OperandsMapper, instead of using
SmallVector::resize followed by std::fill, use the function that
directly does that in SmallVector.
llvm-svn: 282130
load commands. Added a missing check and made the check for more than
one like other other “more than one” checks. And of course added test cases.
llvm-svn: 282104
Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.
This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.
This improves an internal workload by 2.2x.
Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'
Differential Revision: https://reviews.llvm.org/D24564
llvm-svn: 282101
With the new LTO API in r278338, we stopped emitting the individual
index files and imports files for some modules in the distributed backend
case (thinlto-index-only plugin option).
Specifically, this is when the linker decides not to include a module in the
link, because it was in an archive library and did not have a strong
reference to it. Not creating the expected output files makes the
distributed build system implementation more difficult, in terms of
checking for the expected outputs of the thin link, and scheduling the
backend jobs. To address this, the gold-plugin will write dummy empty
.thinlto.bc and .imports files for modules not included in the link
(which LTO never sees).
Augmented a gold v1.12+ test, since that version of gold has the handling
for notifying on modules not being included in the link.
llvm-svn: 282100
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.
Differential Revision: https://reviews.llvm.org/D24275
llvm-svn: 282087
Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight.
Reviewers: dnovillo
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24788
llvm-svn: 282084
We still don't really have an equivalent of "AssertXExt" in DAG, so we don't
exploit the guarantees on the receiving side yet, but this should produce
conservatively correct code on iOS ABIs.
llvm-svn: 282069
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).
llvm-svn: 282068
The postRA scheduler performs alias analysis to determine if stores and loads
can moved past each other. When a function has more arguments than argument
registers for the calling convention used, excess arguments are spilled onto the
stack. LLVM by default assumes that argument slots are immutable, unless the
function contains a tail call. Without the knowledge of that a function contains
a tail call site, stores and loads to fixed stack slots may be re-ordered
causing the out-going arguments to clobber the incoming arguments before the
incoming arguments are supposed to be dead.
Reviewers: vkalintiris
Differential Review: https://reviews.llvm.org/D24077
llvm-svn: 282063
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model. This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.
Reviewers: t.p.northover, peter.smith, rovka
Subscribers: salim.nasser, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D24702
llvm-svn: 282057
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.
We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.
With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.
llvm-svn: 282055
There was no way to control its value so it was always FROUND_CURRENT making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.
This also removes the pattern from the rb form of the instructions since there is no way to specify the FROUND_NO_EXC rounding mode it required.
llvm-svn: 282052
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.
Reviewers: eliben, jpienaar
Subscribers: jlebar, vchuravy, cfe-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D9168
llvm-svn: 282045
Summary:
Emit an empty summary section, instead of no summary section, when
there are no global variables in the index. This ensures that LTO
will treat these files as ThinLTO inputs, instead of as regular
LTO inputs.
In addition to not being what the user likely intended when
compiling with -flto=thin, the current behavior is problematic for
distributed build systems that expect to get ThinLTO index and imports
files back for each input compiled with -flto=thin. Combining into
a single regular LTO module also reduces the backend parallelism.
And in the case where the index was suppressed due to uses in
inline assembly, combining into a single LTO module could provoke
renaming of duplicates that we were trying to prevent by suppressing
the index.
This change required a couple of fixes to handle the empty summary
section.
Reviewers: mehdi_amini
Subscribers: mehdi_amini, llvm-commits, pcc
Differential Revision: https://reviews.llvm.org/D24779
llvm-svn: 282037
TargetMachine::getNameWithPrefix and inline the result into the singular
caller." and "Remove more guts of TargetMachine::getNameWithPrefix and
migrate one check to the TLOF mach-o version." temporarily until I can
get the whole call migrated out of the TargetMachine as we could hit
places where TLOF isn't valid.
This reverts commits r281981 and r281983.
llvm-svn: 282028
Summary:
This is an NFC refactoring change as a precursor to the actual fix for rematerializing in
presence of phi.
https://reviews.llvm.org/D24399
Pasted from review:
findRematerializableChainToBasePointer changed to return the root of the
chain. instead of true or false.
move the PHI matching logic into the caller by inspecting the root return value.
This includes an assertion that the alternate root is in the liveset for the
call.
Tested with current RS4GC tests.
Reviewers: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24780
llvm-svn: 282023
This check caused us to skip adding layout information for calls to
alloca in sspreq/sspstrong mode. We check properly for sspstrong later
on (and add the correct layout info when doing so), so removing this
shouldn't hurt.
No test is included, since testing this using lit seems to require
checking for exact offsets in asm, which is something that the lit tests
for this avoid. If someone cares deeply, I'm happy to write a unittest
or something to cover this, but that feels like overkill.
Patch by Daniel Micay.
Differential Revision: https://reviews.llvm.org/D22714
llvm-svn: 282022
Previously, such section would be marked as SHT_PROGBITS which
makes it impossible to use an initialized C variable declaration
to emit an (allocated) ELF note. The new behavior is also consistent
with ELF assembly parser.
Differential Revision: https://reviews.llvm.org/D24692
llvm-svn: 282010
that use the Mach::dylib_command type for the load commands that are
currently used in the MachOObjectFile constructor.
This contains the missing checks for LC_ID_DYLIB, LC_ID_DYLIB, etc.
load commands and the fields for the Mach::dylib_command type.
Also checks that only an MH_DYLIB or MH_STUB_DYLIB has an
LC_ID_DYLIB load command (and others filetype don’t) and there
is not more than one of these load commands.
llvm-svn: 282008