Summary:
This patch aims to generalize matching of the strided store accesses to more general masks.
The more general rule is to have consecutive accesses based on the stride:
[x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...]
All elements in the masks need not form a contiguous space, there may be gaps.
As before, undefs are allowed and filled in with adjacent element loads.
Reviewers: HaoLiu, mssimpso
Subscribers: mkuper, delena, llvm-commits
Differential Revision: https://reviews.llvm.org/D23646
llvm-svn: 289573
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
Summary:
Extend replaceZeroVectorStore to handle more vector type stores,
floating point zero vectors and set alignment more accurately on split
stores.
This is a follow-up change to r286875.
This change fixes PR31038.
Reviewers: MatzeB
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26682
llvm-svn: 287142
Lower a = b * C where C = (2^n + 1) * 2^m to
add w0, w0, w0, lsl n
lsl w0, w0, m
Differential Revision: https://reviews.llvm.org/D229245
llvm-svn: 287019
Implement the Newton series for square root, its reciprocal and reciprocal
natively using the specialized instructions in AArch64 to perform each
series iteration.
Differential revision: https://reviews.llvm.org/D26518
llvm-svn: 286907
Summary:
Replace a splat of zeros to a vector store by scalar stores of WZR/XZR.
The load store optimizer pass will merge them to store pair stores.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instructions and one register live range will be removed.
For example, the final generated code should be:
stp xzr, xzr, [x0]
instead of:
movi v0.2d, #0
str q0, [x0]
Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26561
llvm-svn: 286875
Summary:
Fix off-by-one indexing error in loop checking that inserted value was a
splat vector.
Add code to check that INSERT_VECTOR_ELT nodes constructing the splat
vector have the expected constant index values.
Reviewers: t.p.northover, jmolloy, mcrosier
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26409
llvm-svn: 286616
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
Differential revision: https://reviews.llvm.org/D22975
llvm-svn: 286523
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.
Differential Revision: https://reviews.llvm.org/D26394
llvm-svn: 286231
Add support for estimating the square root or its reciprocal and division or
reciprocal using the combiner generic Newton series.
Differential revision: https://reviews.llvm.org/D25291
llvm-svn: 284986
Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0`
to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not
have to be materialized beforehand for FCMP, as it has a form that has 0.0
as an implicit operand.
Differential Revision: https://reviews.llvm.org/D24808
llvm-svn: 284531
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables. As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified. One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.
This patch considers a limit that a target may set to the number of
destination addresses in a jump table.
Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.
Differential revision: https://reviews.llvm.org/D21940
llvm-svn: 282412
This reverts commit b7d42b0048f65346e9fa37fb65defeea7ce8c337 per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).
llvm-svn: 282000
This reverts commit ad8ca1528242e2a4cb363e3779309e70eb7a430e per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).
llvm-svn: 281999
Cleanup/change the code that checks for possible tailcall conventions to
look the same as the one in the X86 target. This makes the distinction
between calling conventions that can guarnatee tailcalls and the ones
that may tailcall more obvious.
- Add Swift to the mayTailCall list
- PreserveMost seemed to be incorrectly part of the guarnteed tail call
list, move it to the mayTailCall list.
llvm-svn: 281376
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
In the code to detect fixed-point conversions and make use of AArch64's special
instructions, we weren't prepared for weird types. The fptosi direction got
fixed recently, but not the similar sitofp code.
llvm-svn: 279852
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
Summary:
The DAG combine transformation that was generating the
aarch64_neon_vcvtfp2fxs node was assuming that all
inputs where legal and wasn't accounting that the input
could be a v4f64 if we're trying to do the transformation
before legalization. We now bail out in this case.
All illegal types besides v4f64 were already rejected.
Fixes https://llvm.org/bugs/show_bug.cgi?id=28877.
Reviewers: jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23261
llvm-svn: 278002
when constraint "w" is used on a 32-bit operand.
This enables compiling the following code, which used to error out in
the backend:
void foo1(int a) {
asm volatile ("sqxtn h0, %s0\n" : : "w"(a):);
}
Fixes PR28633.
llvm-svn: 276344
Inference of the 'returned' attribute was fixed in r276008, lets try
turning the backend support back on.
This reverts commit r275677.
llvm-svn: 276081
r275042 reverted function-attribute inference for the 'returned' attribute
because the feature triggered self-hosting failures on ARM and AArch64. James
Molloy determined that the this-return argument forwarding feature, which
directly ties the returned input argument to the returned value, was the cause.
It seems likely that this forwarding code contains, or triggers, a subtle bug.
Disabling for now until we can track that down.
llvm-svn: 275677
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
This reverts commit r259387 because it inserts illegal code after legalization
in some backends where i64 OR type is illegal for example.
llvm-svn: 274573
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.
llvm-svn: 274337
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr. In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
llvm-svn: 274287
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.
Differential Revision: http://reviews.llvm.org/D20376
llvm-svn: 273403
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126
Testing for specific CPUs has a number of problems, better use subtarget
features:
- When some tweak is added for a specific CPU it is often desirable for
the next version of that CPU as well, yet we often forget to add it.
- It is hard to keep track of checks scattered around the target code;
Declaring all target specifics together with the CPU in the tablegen
file is a clear representation.
- Subtarget features can be tweaked from the command line.
To discourage people from using CPU checks in the future I removed the
isCortexXX(), isCyclone(), ... functions. I added an getProcFamily()
function for exceptional circumstances but made it clear in the comment
that usage is discouraged.
Reformat feature list in AArch64.td to have 1 feature per line in
alphabetical order to simplify merging and sorting for out of tree
tweaks.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D20762
llvm-svn: 271555
A constant pool holding the address of a variable in equivalent to
a got entry. It produces exactly the same instruction sequence as a
got use and unlike a got use this is not uniqued by the linker.
llvm-svn: 271311
Canonicalize (srl (bswap i32 x), 16) to (rotr (bswap i32 x), 16), if the high
16-bits of x are zero. Similarly, canonicalize (srl (bswap i64 x), 32) to
(rotr (bswap i64 x), 32), if the high 32-bits of x are zero.
test_rev_w_srl16: test_rev_w_srl16:
and w8, w0, #0xffff and w8, w0, #0xffff
rev w8, w8 ---> rev16 w0, w8
lsr w0, w8, #16
test_rev_x_srl32: test_rev_x_srl32:
rev x8, x8 ---> rev32 x0, x8
lsr x0, x8, #32
llvm-svn: 270896
Summary: When emitting comparison for fp16, in addition to promote the LHS and RHS to fp32, we need to change the VT as well.
Reviewers: t.p.northover
Subscribers: t.p.northover, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19922
llvm-svn: 269151
Unlike xN/wN, the size of vN is genuinely ambiguous in the assembly, so we
should try to infer what was intended from the type. But only down to 64-bits
(vN can never represent sN, hN or bN).
llvm-svn: 269132
Summary:
This implements the lowering of the X constraint on
AArch64.
The default behaviour of the X constraint lowering is to
restrict it to "f". This is a problem because the "f"
constraint is not implemented on AArch64 and would be too
restrictive anyway. Therefore, the AArch64 hook will
lower this to "w" (if the operand is a floating point or
vector) or "r" otherwise.
The implementation is similar with the one added for
ARM (r267411).
This is the AArch64 side of the fix for http://llvm.org/PR26493
Reviewers: rengolin
Subscribers: aemerson, rengolin, llvm-commits, t.p.northover
Differential Revision: http://reviews.llvm.org/D19967
llvm-svn: 268907
This patch adds support for estimating the square root, its reciprocal and
division or reciprocal using the combiner generic reciprocal machinery.
llvm-svn: 268539
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer. I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.
This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.
Differential Revision: http://reviews.llvm.org/D19098
llvm-svn: 266818
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.
There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.
Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.
llvm-svn: 266806
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.
I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.
It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.
Should fix PR25526.
llvm-svn: 266339
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D19007
llvm-svn: 266251
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).
llvm-svn: 265851
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.
The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).
This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.
As a follow-up I'll add:
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
bool operator>(AtomicOrdering, AtomicOrdering) = delete;
bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
Reviewers: jyknight, reames
Subscribers: jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D18775
llvm-svn: 265602
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.
llvm-svn: 265481
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.
This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.
Related to rdar://24207743
Differential Revision: http://reviews.llvm.org/D18680
llvm-svn: 265329
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.
There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.
Differential Revision: http://reviews.llvm.org/D18016
llvm-svn: 263092
Revert r262248 in an attempt to fix the clang-native-aarch64-full
bot and to investigate a performance regression in
SingleSource/Benchmarks/CoyoteBench/huffbench
llvm-svn: 262388
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260998
This is just a trivial implementation:
- Support only arguments passed in registers.
- Support only "plain" arguments, i.e., no sext/zext attribute.
At this point, it is possible to play with the IRTranslator on AArch64:
llc -mtriple arm64-<vendor>-<os> -print-machineinstrs <input.ll> -o - -global-isel
For now, we only support the translation of program with adds and returns.
Follow-up patches are on their way to add a test case (the MIRParser is
not ready as it is).
llvm-svn: 260600
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB
Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16291
llvm-svn: 259387
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.
Also rename some variable to better reflect their usage.
llvm-svn: 258605
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.
Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
%cc1 = fcmp one float %a, %b
%cc2 = fcmp olt float %c, %d
%and = and i1 %cc1, %cc2
%r = select i1 %and, i32 %e, i32 %f
ret i32 %r
}
Assuming (%a < %b) and (%c < %d); we used to do:
fcmp s0, s1 # nzcv <- 1000
orr w8, wzr, #0x1 # w8 <- 1
csel w9, w8, wzr, mi # w9 <- 1
csel w8, w8, w9, gt # w8 <- 1
fcmp s2, s3 # nzcv <- 1000
cset w9, mi # w9 <- 1
tst w8, w9 # (w8 & w9) == 1, so: nzcv <- 0000
csel w0, w0, w1, ne # w0 <- w0
We now do:
fcmp s2, s3 # nzcv <- 1000
fccmp s0, s1, #0, mi # mi, so: nzcv <- 1000
fccmp s0, s1, #8, le # !le, so: nzcv <- 1000
csel w0, w0, w1, pl # !pl, so: w0 <- w1
In other words, we transformed:
(c < d) && ((a < b) || (a > b))
into:
(c < d) && (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
(c < d) && !((a u>= b) && (a u<= b))
Note that this problem doesn't occur in the test-suite.
changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
In general we can create code for arbitrary "... (and (and A B) C)"
sequences. We can also implement some "or" expressions, because
"(or A B)" is equivalent to "not (and (not A) (not B))" and we can
implement some negation operations. [...] However there is no way
to negate the result of a partial sequence.
Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
== ((a olt b) || (a ogt b))
== ((a ord b) && (a une b))
- (a ueq b)
== ((a uno b) || (a oeq b))
== ((a ule b) && (a uge b))
Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
(a || b)
= !(!a && !(b))
= !(!a && !(b1 || b2)) <- changeFPCCToAArch64CC(b, b1, b2)
= !(!a && !b1 && !b2)
However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:
(a || b)
= !(!a && (!b))
= !(!a && (b1 && b2)) <- changeFPCCToANDAArch64CC(!b, b1, b2)
= !(!a && b1 && b2)
This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.
llvm-svn: 258533
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, make sure that inconsistencies don't happen with
an assert.
llvm-svn: 258532
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type. This patch skips
lowering of VSETCC if the operand is an f16 vector.
v4 and v8 tests included.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15361
llvm-svn: 258471
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
I will commit fixes to other platforms as well.
PR26136
llvm-svn: 257929
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.
Differential Revision: http://reviews.llvm.org/D15515
llvm-svn: 255895
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given machine function and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
The target independent portion was committed as r255353.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15341
llvm-svn: 255821
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).
DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.
One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.
Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.
Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!
llvm-svn: 255089
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).
The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.
rdar://22491037
llvm-svn: 254475
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
SELECT_CC has the nasty property of having operands with unrelated
types. So if you do something like:
f32 = select_cc f16, f16, f32, f32, cc
You'd only look for the action for <select_cc, f32>, but never f16.
If the types are all legal, but the op isn't (as for f16 on AArch64,
or for f128 on x86_64/AArch64?), then you get into trouble.
For f128, we have softenSetCCOperands to handle this case.
Similarly, for f16, we can directly promote the CC operands.
llvm-svn: 253344
AArch64 has the ability to use the top 8-bits of an "address" for extra
information, with the memory subsystem automatically masking them off for loads
and stores. When that's happening, we can sometimes skip masks on memory
operations in the compiler.
However, this requires the host OS and support stack to preserve those bits so
it can't be enabled everywhere. In principle iOS 8.0 and above do take the
required precautions and but we'll put it under a flag for now.
llvm-svn: 252573
Summary:
Lowering this pattern early to an `EXTRACT_SUBREG` was making it impossible to match larger patterns in tblgen that use `extract_subvector(..., 0)` as part of the their input pattern.
It seems like there will exist somewhere a better way of specifying this pattern over all relevant register value types, but I didn't manage to find it.
Reviewers: t.p.northover, jmolloy
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14207
llvm-svn: 252464
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.
Reviewers: rengolin, t.p.northover
Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14082
llvm-svn: 251401
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.
This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.
The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.
llvm-svn: 251324
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.
This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.
llvm-svn: 250456
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.
Differential Revision: http://reviews.llvm.org/D13508
llvm-svn: 249550
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).
The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.
This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.
The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.
Differential Revision: http://reviews.llvm.org/D12635
llvm-svn: 248622
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248291
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).
Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)
Differential Revision: http://reviews.llvm.org/D12557
llvm-svn: 247429
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.
Differential Revision: http://reviews.llvm.org/D12425
llvm-svn: 246555
The ISelLowering code turned insertion turned the element for the
lowest lane of a BUILD_VECTOR into an INSERT_SUBREG, this prohibited
the patterns for SCALAR_TO_VECTOR(Load) to match later. Restrict this
to cases without a load argument.
Reported in rdar://22223823
Differential Revision: http://reviews.llvm.org/D12467
llvm-svn: 246462
Summary:
This change lowers the aarch64 integer vector min/max intrinsic nodes to
generic min/max nodes and replaces the intrinsic selection patterns with
the generic ones.
There should already be testing in place for this, so no further tests
were added.
Reviewers: jmolloy
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12276
llvm-svn: 246030
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.
This fixes http://llvm.org/PR24459
llvm-svn: 245641
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.
llvm-svn: 245640
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.
I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.
llvm-svn: 245035
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.
Follow-up to r243924, r243926, and r244858.
llvm-svn: 244860
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.
This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.
This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.
This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.
Reviewers: Akira Hatanaka
llvm-svn: 244693
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:
- Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
- f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!
llvm-svn: 244595
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and
is actually already vector-aware; let's use it instead of scalarizing!
The only interesting change is that for v2f32, we previously always used
use v4i32 as the integer vector type.
Use v2i32 instead, and mark FCOPYSIGN as Custom.
llvm-svn: 243926
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -aarch64-strict-align to decide whether strict alignment should be
forced.
rdar://problem/21529937
llvm-svn: 243516
This fix was suggested as part of D11345 and is part of fixing PR24141.
With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.
There is no NFC-intended other than that.
Differential Revision: http://reviews.llvm.org/D11531
llvm-svn: 243498
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.
clang and driver changes in http://reviews.llvm.org/D10524
Added -femulated-tls flag to select the emulated TLS model,
which will be used for old targets like Android that do not
support ELF TLS models.
Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.
Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.
TODO: Add proper DIE for emulated TLS variables.
Added new unit tests with emulated TLS.
Differential Revision: http://reviews.llvm.org/D10522
llvm-svn: 243438
This path add the aarch64 lowering of __builtin_thread_pointer. It uses
the already implemented AArch64ISD::THREAD_POINTER used in TLS generation.
llvm-svn: 243412
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
Differential Revision: http://reviews.llvm.org/D11408
llvm-svn: 243100
This is a new iteration of the reverted r238793 /
http://reviews.llvm.org/D8232 which wrongly assumed that any and/or
trees can be represented by conditional compare sequences, however there
are some restrictions to that. This version fixes this and adds comments
that explain exactly what types of and/or trees can actually be
implemented as conditional compare sequences.
Related to http://llvm.org/PR20927, rdar://18326194
Differential Revision: http://reviews.llvm.org/D10579
llvm-svn: 242436
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.
Differential Revision: http://reviews.llvm.org/D10977
llvm-svn: 241827
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11042
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241779
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11040
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11037
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D11028
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.
llvm-svn: 241411
SDNode already had ops() which would iterate over the operands and return
SDUse*. This version instead gets the SDValue's out of the SDUse's so that
we can use foreach in more places.
Reviewed by David Blaikie.
llvm-svn: 240805
The patch triggers a miscompile on SPEC 2006 403.gcc with the (ref)
200.i and scilab.i inputs. I opened PR23866 to track analysis of this.
This reverts commit r238793.
llvm-svn: 239880
These are really immediate DUPs, and suffer from the same problem
with long instructions with a high/2 variant (e.g. smull).
By extending a MOVI (or DUP, before this patch), we can avoid an ext
on the other operand of the long instruction, e.g. turning:
ext.16b v0, v0, v0, #8
movi.4h v1, #0x53
smull.4s v0, v0, v1
into:
movi.8h v1, #0x53
smull2.4s v0, v0, v1
While there, add a now-necessary combine to fold (VT NVCAST (VT x)).
llvm-svn: 239799
Previously CCMP/FCCMP instructions were only used by the
AArch64ConditionalCompares pass for control flow. This patch uses them
for SELECT like instructions as well by matching patterns in ISelLowering.
PR20927, rdar://18326194
Differential Revision: http://reviews.llvm.org/D8232
llvm-svn: 238793
This is important because of different addressing modes
depending on the address space for GPU targets.
This only adds the argument, and does not update
any of the uses to provide the correct address space.
llvm-svn: 238723
The raw non-instruction/constant form of this is still relying on being
able to access the pointee type from a pointer type - those will be
cleaned up later. For now, just focus on the cases where the pointee
type is easily accessible.
llvm-svn: 237958
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.
This patch is intended to lower the read/write_register instrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific
instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.
Patch by Luke Cheeseman.
Differential Revision: http://reviews.llvm.org/D9699
llvm-svn: 237579
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
llvm-svn: 236916
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative, replace it with an assert.
This also means that we now can run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all types created all have the same sizes as the (legal) inputs.
Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).
llvm-svn: 235922
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.
llvm-svn: 235917
Summary:
Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32,
v8i8, v8i16 inputs to allow promotion of v4f16 results.
Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32,
and i64 vectors. Only missing tests are for v16i8 and v16i16 as the
shift operations are too complicated to write a proper check sequence.
The conversions from v4i64 to v4f16 do not depend on this patch - v4i64
is split and the conversion gets handled while lowering v2i64. I am
adding a test here for completeness.
Reviewers: aemerson, rengolin, ab, jmolloy, srhines
Subscribers: rengolin, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D9166
llvm-svn: 235609
Found by code inspection, but breaking i16 at least breaks other tests.
They aren't checking this in particular though, so also add some
explicit tests for the already working types.
llvm-svn: 235148
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.
Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.
f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.
While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.
Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755
llvm-svn: 234550
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.
This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.
<rdar://problem/17829180>
llvm-svn: 234462
Instead of lowering SELECT to SELECT_CC which is further lowered later
immediately call the SELECT_CC lowering code. This is preferable
because:
- Avoids an unnecessary roundtrip through the legalization queues with
an intermediate node.
- More importantly: Lowered operations get visited last leading to SELECT_CC
getting visited with legalized operands and unlegalized ones for preexisting
SELECT_CC nodes. This does not hurt the current code (hence no testcase) but
is required for another patch I am working on.
Differential Revision: http://reviews.llvm.org/D8187
llvm-svn: 234334
extended loads.
Implement the related target lowering hook so that the optimization has a better
estimation of the cost of an extension.
rdar://problem/19267165
llvm-svn: 233753
Simplify boolean expressions using `true` and `false` with `clang-tidy`
Patch by Richard Thomson.
Reviewed By: rengolin
Differential Revision: http://reviews.llvm.org/D8525
llvm-svn: 233089
Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64. This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling.
Reviewers: mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson
Differential Revision: http://reviews.llvm.org/D8369
llvm-svn: 232562
Optimize concat_vectors of truncated vectors, where the intermediate
type is illegal, to avoid said illegality, e.g.,
(v4i16 (concat_vectors (v2i16 (truncate (v2i64))),
(v2i16 (truncate (v2i64)))))
->
(v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))),
(v2i32 (truncate (v2i64)))))))
This isn't really target-specific, and, as such, would best go in the
DAGCombiner. However, ISD::TRUNCATE legality isn't keyed on both input
and result type, so we might generate worse code when we don't know
better. On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8.
rdar://20022387
llvm-svn: 232459
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
(i32 (int_aarch64_neon_uaddv ...)), ssub)
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
return vmulq_n_u32(a, vaddvq_u32(b));
}
We now generate:
addv.4s s1, v1
mul.4s v0, v0, v1[0]
instead of the previous:
addv.4s s1, v1
fmov w8, s1
dup.4s v1, w8
mul.4s v0, v1, v0
rdar://20044838
llvm-svn: 231840
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all.
This does not represent a behavioural change and as such no tests were added.
Patch by: Richard Diamond.
Reviewers: jfb
Reviewed By: jfb
Subscribers: jfb, aemerson, t.p.northover, llvm-commits
Differential Revision: http://reviews.llvm.org/D7713
llvm-svn: 231250
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relaxe/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.
This patch:
* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
accesses is produced, by making use of a new pseudo instruction. The patch also removes
two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).
llvm-svn: 231227
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.
llvm-svn: 230699
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.
llvm-svn: 230583
It was previously using the subtarget to get values for the global
offset without actually checking each function as it was generating
code. Go ahead and solidify the current behavior and make the
existing FIXMEs more prominent.
As a note the ARM backend previously had a thumb1 and non-thumb1
set of defaults. Only the former was tested so I've changed the
behavior to only use that for now.
llvm-svn: 230245
This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.
Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>
llvm-svn: 230241
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.
There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.
llvm-svn: 230118
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.
Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.
Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
llvm-svn: 229413
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
llvm-svn: 229218
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.
Fortunately this is easy enough, just extend or truncate the naturally
compared result.
I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.
Should fix PR21645.
llvm-svn: 228518
from a conditional branch fed by an add/sub/mul-with-overflow node.
We previously used the SDLoc of the overflow node, for no good reason.
In some cases, this led to the Bcc and B terminators having different
source orders, and DBG_VALUEs being inserted between them.
The real issue is with the code that can't handle DBG_VALUEs between
terminators: the few places affected by this will be fixed soon.
In the meantime, fixing the SDLoc is a positive change no matter what.
No tests, as I have no idea how to get .loc emitted for branches?
rdar://19347133
llvm-svn: 228463
Original patch by Luke Iannini. Minor improvements and test added by
Erik de Castro Lopo.
Differential Revision: http://reviews.llvm.org/D6877
From: Erik de Castro Lopo <erikd@mega-nerd.com>
llvm-svn: 226473
type (in addition to the memory type).
The *LoadExt* legalization handling used to only have one type, the
memory type. This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.
However, this isn't always the case. For instance, on X86, with AVX,
this is legal:
v4i32 load, zext from v4i8
but this isn't:
v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.
Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.
Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.
Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior. The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)
No functional change intended.
Differential Revision: http://reviews.llvm.org/D6532
llvm-svn: 225421
Weak externals are resolved statically, so we can actually generate the tail
call on PE/COFF targets without breaking the requirements. It is questionable
whether we want to propagate the current behaviour for MachO as the requirements
are part of the ARM ELF specifications, and it seems that prior to the SVN
r215890, we would have tail'ed the call. For now, be conservative and only
permit it on PE/COFF where the call will always be fully resolved.
llvm-svn: 225119
In the large code model we have to first get the address of the GOT entry, load
the address of the constant, and then load the constant itself.
To avoid these loads and the GOT entry alltogether this commit changes the way
how FP constants are materialized in the large code model. The constats are now
materialized in a GPR and then bitconverted/moved into the FPR.
Reviewed by Tim Northover
Fixes rdar://problem/16572564.
llvm-svn: 223941