Summary:
If one of the uses of the value is a single edge PHINode, handle it.
Original:
%val = something
<suspend>
%p = PHINode [%val]
After Spill + Part13:
%val = something
%slot = gep val.spill.slot
store %val, %slot
<suspend>
%p = load %slot
Plus tiny fixes/changes:
* use correct index for coro.free in CoroCleanup
* fixup id parameter in coro.free to allow authoring coroutine in plain C with __builtins
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24242
llvm-svn: 281020
llvm-cov writes out an index file in '-output-dir' mode, albeit not a
very informative one. Try to fix that by using the CoverageReport API to
include some basic summary information in the index file.
llvm-svn: 281011
This test should have broken after r280896. Fix up the test case
speculatively, since I don't have a way to test it.
I wonder why I didn't get any angry bot emails about this. Maybe none of
the win32 bots test llvm-cov? That could explain it, since the test says
it 'REQUIRES: system-windows', which is restricted to win32 hosts.
Also: why is 'system-windows' not defined for non-win32 Windows bots?
llvm-svn: 281008
This adds more tests for shuffles where the output width does not match
the input width and/or the output is generated from more than two inputs.
llvm-svn: 281005
The REX prefix should be used on indirect jmps, but not direct ones.
For direct jumps, the unwinder looks at the offset to determine if
it's inside the current function.
Differential Revision: https://reviews.llvm.org/D24359
llvm-svn: 281003
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch.
Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo
Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24164
llvm-svn: 280995
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.
llvm-svn: 280992
The text and html coverage views take different approaches to emitting
highlighted regions. That's because this problem is easier in the text
view: there's no need to worry about escaping text or adding tooltip
content to a highlighted snippet.
Unfortunately, the html view didn't get region highlighting quite right.
This patch fixes the situation, bringing parity between the two views.
llvm-svn: 280981
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making it's pointer operand non-uniform.
This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.
Differential Revision: https://reviews.llvm.org/D24271
llvm-svn: 280979
Fix the .arch asm parser to use the full set of features for the architecture
and any extensions on the command line. Add and update testcases accordingly
as well as add an extension that was used but not supported.
llvm-svn: 280971
And associated commits, as they broke the Thumb bots.
This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.
llvm-svn: 280967
Summary:
This allows specifying instructions that are available only in specific assembler variant. If AsmVariantName is specified then instruction will be presented only in MatchTable for this variant. If not specified then assembler variants will be determined based on AsmString.
Also this allows splitting assembler match tables in same way as it is done in dissasembler.
Reviewers: ab, tstellarAMD, craig.topper, vpykhtin
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D24249
llvm-svn: 280952
Materializing something like "-3" can be done as 2 instructions:
MOV r0, #3
MVN r0, r0
This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TII::getIntImmCost(), but were taking the complement of the zero-extended value instead of the sign-extended value which is unlikely to ever produce a number < 256.
There were no tests failing after changing this... :/
llvm-svn: 280928
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.
This is an initial step towards determining the sign bits of a vector (PR29079).
Differential Revision: https://reviews.llvm.org/D24253
llvm-svn: 280927
This reverts commit r280808.
It is possible that this change results in an infinite loop. This
is causing timeouts in some tests on ARM, and a Chromebook bot is
failing.
llvm-svn: 280918
Summary:
C allows to jump over variables declaration so lifetime.start can be
avoid before variable usage. To avoid false-positives on such rare cases
we detect them and remove from lifetime analysis.
PR27453
PR28267
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24321
llvm-svn: 280907
Summary:
When cloning blocks for prologue/epilogue we need to replicate the loop
structure from the original loop. It wasn't a problem for the innermost
loops, but it led to an incorrect loop info when we unrolled a loop with
a child loop - in this case created prologue-loop had a child loop, but
loop info didn't reflect that.
This fixes PR28888.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits, silvas
Differential Revision: https://reviews.llvm.org/D24203
llvm-svn: 280901
In r279628, we made SourceCoverageView list the binary associated with a
view and started adding labels (e.g "Source: foo" or "Function: bar") to
everything. Condense this information a bit to unclutter reports.
llvm-svn: 280896
CGP tail-duplicates rets into blocks that end with a call that feed the ret.
This puts the call in tail position, potentially allowing the DAG builder to
lower it as a tail call. To avoid tail duplication in cases where we won't
form the tail call, CGP tried to predict whether this is going to be possible,
and avoids doing it when lowering as a tail call will definitely fail.
However, it was being too conservative by always throwing away calls to
functions with a signext/zeroext attribute on the return type.
Instead, we can use the same logic the builder uses to determine whether the
attributes work out.
Differential Revision: https://reviews.llvm.org/D24315
llvm-svn: 280894
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:
1. https://reviews.llvm.org/D23932 (Clang test)
2. https://reviews.llvm.org/D23933 (compiler-rt)
Differential Revision: https://reviews.llvm.org/D23931
llvm-svn: 280888
Summary:
This file should be referenced from
thinlto-function-summary-callgraph-pgo.ll file,
but someone forgot to use it there. Everything worked because
we store pgo data about callsite blocks, so there is no need to have
pgo count of @func.
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24309
llvm-svn: 280882
Summary:
C allows to jump over variables declaration so lifetime.start can be
avoid before variable usage. To avoid false-positives on such rare cases
we detect them and remove from lifetime analysis.
PR27453
PR28267
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24321
llvm-svn: 280880
We can't create metadata-valued PHIs; don't try to do so when sinking.
I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).
Previously, running the test case would crash with:
Invalid use of metadata!
%.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
LLVM ERROR: Broken function found, compilation aborted!
llvm-svn: 280866
This is a revert of r280676 which was a revert of r280637;
ie, this is r280637 again. It was speculatively reverted to
help debug buildbot failures.
llvm-svn: 280861
Shadow uses need to be analyzed together, since each individual shadow
will only have a partial reaching def. All shadows together may cover
a given register ref, while each individual shadow may not.
llvm-svn: 280855
The patch is to fix PR30298, which is caused by rL272694. The solution is to
bail out if the target has no SSE2.
Differential Revision: https://reviews.llvm.org/D24288
llvm-svn: 280837
The original commit was too aggressive about marking LibCalls as AAPCS. The
libcalls contain libc/libm/libunwind calls which are not AAPCS, but C.
llvm-svn: 280833
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.
Example:
define void @f(i32 %x, i32 %y) {
entry:
%p = icmp eq i32 %x, %y
br i1 %p, label %bb1, label %bb2
bb1:
tail call void @foo()
ret void
bb2:
tail call void @bar()
ret void
}
before:
f:
movl 4(%esp), %eax
cmpl 8(%esp), %eax
jne .LBB0_2
jmp foo
.LBB0_2:
jmp bar
after:
f:
movl 4(%esp), %eax
cmpl 8(%esp), %eax
jne bar
.LBB0_1:
jmp foo
I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.
This patch only does 32-bit, but 64-bit would work similarly.
Differential Revision: https://reviews.llvm.org/D24108
llvm-svn: 280832
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.
Differential Revision: https://reviews.llvm.org/D23424
llvm-svn: 280829
Summary:
Previously we were trying to represent this with the "contains" list of
the .cv_inline_linetable directive, which was not enough information.
Now we directly represent the chain of inlined call sites, so we know
what location to emit when we encounter a .cv_loc directive of an inner
inlined call site while emitting the line table of an outer function or
inlined call site. Fixes PR29146.
Also fixes PR29147, where we would crash when .cv_loc directives crossed
sections. Now we write down the section of the first .cv_loc directive,
and emit an error if any other .cv_loc directive for that function is in
a different section.
Also fixes issues with discontiguous inlined source locations, like in
this example:
volatile int unlikely_cond = 0;
extern void __declspec(noreturn) abort();
__forceinline void f() {
if (!unlikely_cond) abort();
}
int main() {
unlikely_cond = 0;
f();
unlikely_cond = 0;
}
Previously our tables gave bad location information for the 'abort'
call, and the debugger wouldn't snow the inlined stack frame for 'f'.
It is important to emit good line tables for this code pattern, because
it comes up whenever an asan bug occurs in an inlined function. The
__asan_report* stubs are generally placed after the normal function
epilogue, leading to discontiguous regions of inlined code.
Reviewers: majnemer, amccarth
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24014
llvm-svn: 280822
Summary:
LSV replaces multiple adjacent loads with one vectorized load and a
bunch of extractelement instructions. This patch makes the
extractelement instructions' names match those of the original loads,
for (hopefully) improved readability.
Reviewers: asbirlea, tstellarAMD
Subscribers: arsenm, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23748
llvm-svn: 280818
Summary:
This saves a library call to __aeabi_uidivmod. However, the
processor must feature hardware division in order to benefit from
the transformation.
Reviewers: scott-0, jmolloy, compnerd, rengolin
Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D24133
llvm-svn: 280808
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.
llvm-svn: 280807
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi
The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.
This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.
Added reproducible test cases to x86-sse4a.ll.
Differential Revision: https://reviews.llvm.org/D24256
llvm-svn: 280804
Summary:
The o32 ABI doesn't not support the TImode helpers. For the time being,
disable just the shift libcalls as they break recursive builds on MIPS.
Reviewers: sdardis
Subscribers: llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D24259
llvm-svn: 280798
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.
Fixes PR30188 (again).
llvm-svn: 280792
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.
Fixes PR30292.
llvm-svn: 280790
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).
On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.
I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).
Fixes PR30280.
llvm-svn: 280789
The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set.
FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))).
It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead.
I added pattern match for integer XOR.
Differential Revision: https://reviews.llvm.org/D24221
llvm-svn: 280785
This was originally submitted in r280549, and reverted in r280577
due to breaking one MSVC buildbot. The issue is that MSVC 2013
doesn't synthesize move constructors. So even though i was
writing std::move(A) it was copying it, leading to a bogus ArrayRef.
The solution here is to simply remove the std::vector<> from the
type, since it is unused and unnecessary. This way the ArrayRef
continues to point into the original memory backing the CVType.
llvm-svn: 280769
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).
Fixes PR30276.
For convenience, here's the commit message from r246507, which explains the
problem is greater detail:
[DAGCombine] Fixup SETCC legality checking
SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.
The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.
The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:
(setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
(which is possible for some combinations of (op1, op2))
If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.
To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).
Fixes PR24636.
llvm-svn: 280767
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch
Patch by Tom Stellard and Konstantin Zhuravlyov
Differential Revision: https://reviews.llvm.org/D21562
llvm-svn: 280747
This patch provides easy navigation to find the zero count lines, especially useful when the source file is very large.
Differential Revision: https://reviews.llvm.org/D23277
llvm-svn: 280739
This adds a copy of the demangler in libcxxabi.
The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds an use of StringRef for example.
The no llvm dependency combined with the fact that this has to build
on linux, OS X and Windows required a few changes to the code. In
particular:
No constexpr.
No alignas
On OS X at least this library has only one global symbol:
__ZN4llvm16itanium_demangleEPKcPcPmPi
My current plan is:
Commit something like this
Change lld to use it
Change lldb to use it as the fallback
Add a few #ifdefs so that exactly the same file can be used in
libcxxabi to export abi::__cxa_demangle.
Once the fast demangler in lldb can handle any names this
implementation can be replaced with it and we will have the one true
demangler.
llvm-svn: 280732
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.
This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.
Differential Revision: https://reviews.llvm.org/D24254
llvm-svn: 280715
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set). However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region). In this case we'd use static estimates. Since static
estimates for branches are determined independently, they are
inconsistent. Updating them can "randomly" inflate block frequencies.
I've run into this in a completely cold loop of h264ref from
SPEC. -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).
The new testcase demonstrate the problem. We check array elements
against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting
check. The block names should be self-explanatory.
In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).
There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated. These are the resulting branch and block
frequencies for the loop body:
check_1 (16)
(8) / |
eq_1 | (8)
\ |
check_2 (16)
(8) / |
eq_2 | (8)
\ |
check_3 (16)
(1) / |
(loop exit) | (15)
|
(back edge)
First we thread eq_1 -> check_2 to check_3. Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2. Changed frequencies are highlighted with * *:
check_1 (16)
(8) / |
eq_1~ | (8)
/ |
/ check_2 (*8*)
/ (8) / |
\ eq_2 | (*0*)
\ \ |
` --- check_3 (16)
(1) / |
(loop exit) | (15)
|
(back edge)
Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges. Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):
check_1 (16)
(8) / |
eq_1~ | (8)
/ |
/ check_2 (*8*)
/ (8) / |
/-- eq_2~ | (*0*)
(back edge) |
check_3 (*0*)
(*0*) / |
(loop exit) | (*0*)
|
(back edge)
As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.
There are a few potential problems here:
1. The profile data seems odd. There is a single profile sample of the
loop being entered. On the other hand, there are no weights inside the
loop.
2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.
3. We shouldn't create profile metadata that is calculated from static
estimation. I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling. Estimated probabilities should only be reflected in BPI/BFI.
Any one of these would probably fix the immediate problem. I went for 3
because I think it's a good policy to have and added a FIXME about 2.
Differential Revision: https://reviews.llvm.org/D24118
llvm-svn: 280713
LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower
arguments assuming that it was using the paired 32bit registers to
perform operations for f64. This mode of operation is not supported
for MIPSR6.
This patch resolves the reported issue by adding additional checks
for unsupported floating point unit configuration.
Thanks to mike.k for reporting this issue!
Reviewers: seanbruno, vkalintiris
Differential Review: https://reviews.llvm.org/D23795
llvm-svn: 280706
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it
there is no guarantee that this part of the stack will not be modified
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.
This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24093
llvm-svn: 280705
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.
llvm-svn: 280696
It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG.
Taken from optimized output from clang's test case.
llvm-svn: 280695
This is a Windows ARM specific issue. If the code path in the if conversion
ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up. This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.
For now, report a bundle as being unpredicatable. Although this is false, this
would trigger a failure case previously anyways, so this is no worse. That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.
Under certain circumstances, it may be possible to "predicate the bundle". This
would require scanning all bundle instructions, and ensure that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence. If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.
llvm-svn: 280689
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts. Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.
The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls are honoured. Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).
llvm-svn: 280683
Summary:
Move early uses of spilled variables after CoroBegin.
For example, if a parameter had address taken, we may end up with the code
like:
define @f(i32 %n) {
%n.addr = alloca i32
store %n, %n.addr
...
call @coro.begin
This patch fixes the problem by moving uses of spilled variables after CoroBegin.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24234
llvm-svn: 280678
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
%.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r"
This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.
Differential Revision: https://reviews.llvm.org/D24231
llvm-svn: 280660
Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.
llvm-svn: 280648
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:
* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.
This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll).
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24068
llvm-svn: 280646
memcpy with ld/st.
When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.
Differential Revision: https://reviews.llvm.org/D23499
llvm-svn: 280617
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.
The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).
Fixes PR27721.
llvm-svn: 280614
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree. The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations. In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination. This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.
Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.
Fixes PR29151.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24117
llvm-svn: 280610
CGP currently drops select's MD_prof profile data when
generating conditional branch which can lead to bad
code layout. The patch fixes the issue.
Differential Revision: http://reviews.llvm.org/D24169
llvm-svn: 280600
Because the recent change about ODR type uniquing in the context,
we can reach types defined in another module during IR linking.
This triggered some assertions in case we IR link without starting
from an empty module. To alleviate that, we can self-map metadata
defined in the destination module so that they won't be visited.
Differential Revision: https://reviews.llvm.org/D23841
llvm-svn: 280599
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:
1. Sampling instructions by themselves don't need to run in WQM, only their
coordinate inputs need it (unless of course there is a dependent sampling
instruction). The initial scanInstructions step is modified accordingly.
2. When switching back from WQM to Exact, switch back as soon as possible.
This affects the logic in processBlock.
This should always be a win or at best neutral.
There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D22092
llvm-svn: 280590
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.
The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.
Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23131
llvm-svn: 280589
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.
llvm-svn: 280584
PowerPC assembly code in the wild, so it seems, has things like this:
bc+ 12, 28, .L9
This is a bit odd because the '+' here becomes part of the BO field, and the BO
field is otherwise the first operand. Nevertheless, the ISA specification does
clearly say that the +- hint syntax applies to all conditional-branch mnemonics
(that test either CTR or a condition register, although not the forms which
check both), both basic and extended, so this is supposed to be valid.
This introduces some asm-parser-only definitions which take only the upper
three bits from the specified BO value, and the lower two bits are implied by
the +- suffix (via some associated aliases).
Fixes PR23646.
llvm-svn: 280571
dcbf has an optional hint-like field, add support for the extended form and the
associated mnemonics (dcbfl and dcbflp).
Partially fixes PR24796.
llvm-svn: 280559
Before we were kind of imitating the behavior of a Yaml sequence
by outputting each record one after the other. This makes it a
little cumbersome when we want to go the other direction -- from
Yaml to Pdb. So this treats FieldList records as no different than
any other list of records, by printing them as a Yaml sequence with
the exact same format.
llvm-svn: 280549
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.
llvm-svn: 280545
Subregister definitions are considered uses for the purpose of tracking
liveness of the whole register. At the same time, when calculating live
interval subranges, subregister defs should not be treated as uses.
Differential Revision: https://reviews.llvm.org/D24190
llvm-svn: 280532
Previously we were splitting our records at 0xFFFF bytes, which the
Microsoft tools don't like.
Should fix failure on the new Windows self-host buildbot.
This length appears in microsoft-pdb/PDB/dbi/dbiimpl.h
llvm-svn: 280522
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.
Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.
Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.
This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126
Differential Revision: https://reviews.llvm.org/D23886
llvm-svn: 280504
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.
I've added the updated member index computation to the existing reverse
interleaved group test.
llvm-svn: 280497
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.
An ordered comparison between a vector and a constant vector of NaN, should
always be folded into a constant vector where each element is i1 false.
Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".
This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.
Differential Revision: https://reviews.llvm.org/D24143
llvm-svn: 280488
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2
That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).
The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()
Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.
This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.
Differential Revision: https://reviews.llvm.org/D24154
llvm-svn: 280482