Now that Intrinsic::ID is a typed enum, we can forward declare it and thus return it from this method.
This updates all users, which were either using an unsigned to store it or had a now-unnecessary cast.
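A rough sketch of the pattern this enables (the user struct below is illustrative, not code from this commit):

namespace llvm {
namespace Intrinsic {
enum ID : unsigned; // opaque forward declaration, legal now that ID has a fixed underlying type
}
}

struct SomeIntrinsicUser {
  llvm::Intrinsic::ID IntrID; // previously stored as unsigned
  llvm::Intrinsic::ID getIntrinsicID() const { return IntrID; } // no cast needed anymore
};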
llvm-svn: 237810
ld64 currently mishandles internal pointer relocations (i.e.
ARM64_RELOC_UNSIGNED referred to by section & offset rather than symbol). The
existing __cfstring clause was an early discovery and workaround for this, but
the problem is wider and we should avoid such relocations wherever possible for
now.
This code should be reverted to allowing internal relocations as soon as
possible.
PR23437.
llvm-svn: 237621
This was previously returning int. However, there are no negative opcode
numbers, and more importantly this was needlessly different from
MCInstrDesc::getOpcode() (which is in fact the value returned here) and
SDValue::getOpcode()/SDNode::getOpcode().
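A hedged illustration of the now-consistent accessors (MachineInstr and MCInstrDesc are the real classes; the helper function is just for illustration):

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/MC/MCInstrDesc.h"
#include <cassert>

unsigned getOpcodeChecked(const llvm::MachineInstr &MI) {
  unsigned Opc = MI.getOpcode();           // now unsigned, was int
  assert(Opc == MI.getDesc().getOpcode()); // same value and type as MCInstrDesc::getOpcode()
  return Opc;
}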
llvm-svn: 237611
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.
This patch is intended to lower the read/write_register intrinsics, used to
implement the special register intrinsics in the corresponding clang patch
(see http://reviews.llvm.org/D9697), to the ARM-specific instructions MRC,
MCR, MSR, etc., to allow reading and writing of coprocessor registers in
AArch32 and AArch64. This is done by inspecting the register string passed
to the intrinsic and then lowering to the appropriate instruction.
Patch by Luke Cheeseman.
Differential Revision: http://reviews.llvm.org/D9699
llvm-svn: 237579
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or come very close to hitting this bound, switch the features to use a bitset.
No functional change.
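A minimal sketch of the motivation, assuming an illustrative capacity constant (LLVM's actual type is a bitset-backed FeatureBitset):

#include <bitset>
#include <cstdint>

using OldFeatureMask = uint64_t;               // hard limit of 64 subtarget features
constexpr unsigned MaxSubtargetFeatures = 128; // illustrative; only needs room to grow
using NewFeatureBits = std::bitset<MaxSubtargetFeatures>;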
The first two times this was committed (r229831, r233055), it caused several buildbot failures.
At least some of the ARM and MIPS ones were due to gcc/binutils issues, and should now be fixed.
llvm-svn: 237234
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail-calling function, two FixedStackObjects might refer to the
same location. Worse, 'immutable' fixed stack objects like function arguments are
not actually immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not treated as invariant in
a tail-calling function, and don't return a PseudoSourceValue for an instruction
in tail-calling functions when building the dependence graph, so that function
arguments are handled conservatively.
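A hedged example of the hazard (illustrative C++; whether the arguments actually live on the stack depends on the ABI and argument count):

int callee(int a, int b);

int caller(int a, int b) {
  // If this is emitted as a tail call, the outgoing arguments are written into
  // the caller's own incoming argument slots, so the "immutable" fixed stack
  // objects for a and b can be clobbered before the jump.
  return callee(b, a);
}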
Fix for PR23459.
rdar://20740035
llvm-svn: 236916
We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there.
Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use.
llvm-svn: 236764
This patch disables unrolling in the loop vectorization pass when VF==1 on x86,
by setting MaxInterleaveFactor to 1. Unrolling in the loop vectorization pass may
introduce the cost of overflow checks, memory boundary checks and extra
prologue/epilogue code when the regular unroller later unrolls the loop again.
Disabling it when VF==1 removes that unnecessary cost on x86. The same can be
done for other platforms once interleaving/memory-bound checking is verified not
to be performance critical on those platforms.
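A minimal sketch of the kind of cost-model hook involved, assuming the X86 TTI implementation (simplified; the surrounding class and the non-scalar value are illustrative):

unsigned getMaxInterleaveFactor(unsigned VF) {
  if (VF == 1)
    return 1; // VF==1: leave scalar unrolling to the regular unroller
  return 2;   // illustrative value for vectorized loops
}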
Differential Revision: http://reviews.llvm.org/D9515
llvm-svn: 236613
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example, and to avoid introducing regressions, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that performs a call only in one branch
of an if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readability):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that performs the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper by Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need
hacks to avoid placing code at frequently executed points. Instead, it relies on
dominance and loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
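A minimal sketch of the opt-in described above, assuming a target's TargetPassConfig subclass (the class name and constructor signature are illustrative):

class MyTargetPassConfig : public TargetPassConfig {
public:
  MyTargetPassConfig(TargetMachine *TM, PassManagerBase &PM)
      : TargetPassConfig(TM, PM) {
    EnableShrinkWrap = true; // opt this target into the shrink-wrapping pass
  }
};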
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save points and restore points; the impacted
components would then be:
- The pass itself: a new algorithm is needed.
- MachineFrameInfo: hold a list or set of save/restore points instead of one
pointer.
- PEI: loop over the save points and restore points.
Anyhow, at least for this first iteration, I do not believe it is worthwhile
to support the complex cases. We should revisit that when we have motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases the choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything; instead they just check it for
SDNodes, ARM and AArch64, where it's easy to get incorrect locations on
constants.
This is not a complete fix, as FastISel contains a workaround for wrong debug
locations which drops locations from instructions when processing constants,
and there isn't currently a way to use debug locations from constants there
since llvm::Constant doesn't cache them (yet). That is a somewhat different
issue, though, not directly related to these changes.
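A hedged illustration of the resulting API shape (simplified; DAG and N stand in for the SelectionDAG and the node providing the location):

// Before: SDValue Zero = DAG.getConstant(0, MVT::i32);
// After: constant creation threads a debug location through.
SDValue Zero = DAG.getConstant(0, SDLoc(N), MVT::i32);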
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases the choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything; instead they just check it for
SDNodes, ARM and AArch64, where it's easy to get incorrect locations on
constants.
This is not a complete fix, as FastISel contains a workaround for wrong debug
locations which drops locations from instructions when processing constants,
and there isn't currently a way to use debug locations from constants there
since llvm::Constant doesn't cache them (yet). That is a somewhat different
issue, though, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
This matches other assemblers and is less unexpected (e.g. PR23227).
On ELF, I tried binutils gas v2.24 and nasm 2.10.09, and they both
agree on LShr. On COFF, I couldn't get my hands on an assembler yet,
so don't change the behavior. For now, don't change it on non-AArch64
Darwin either, as the other assembler is gas v1.38, which does an AShr.
llvm-svn: 235963
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative; replace it with an assert.
This also means that we can now run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all the types created have the same sizes as the (legal) inputs.
Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).
llvm-svn: 235922
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.
llvm-svn: 235917
Summary:
Constant stores of f16 vectors can create NvCast nodes from various
operand types to v4f16 or v8f16 depending on patterns in the stored
constants. This patch adds nvcast rules with v4f16 and v8f16 values.
AArch64ISelLowering::LowerBUILD_VECTOR has the details on which constant
patterns generate the nvcast nodes.
Reviewers: jmolloy, srhines, ab
Subscribers: rengolin, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D9201
llvm-svn: 235610
Summary:
Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32,
v8i8, v8i16 inputs to allow promotion of v4f16 results.
Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32,
and i64 vectors. Only missing tests are for v16i8 and v16i16 as the
shift operations are too complicated to write a proper check sequence.
The conversions from v4i64 to v4f16 do not depend on this patch - v4i64
is split and the conversion gets handled while lowering v2i64. I am
adding a test here for completeness.
Reviewers: aemerson, rengolin, ab, jmolloy, srhines
Subscribers: rengolin, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D9166
llvm-svn: 235609
The CondOpt pass currently uses LiveIntervals to set the dead flag on a def. This patch uses MachineRegisterInfo::use_empty instead, as that is equivalent to the def being dead.
This removes an instance of LiveIntervals in the pass manager pipeline and saves 3.8% of compile time on llc compiled for AArch64.
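A hedged sketch of the check described above (MachineRegisterInfo::use_empty and the dead flag are real; the surrounding control flow is illustrative):

if (MRI->use_empty(DefReg)) {
  // No remaining uses of the virtual register: the def is dead, and no
  // LiveIntervals query is needed to know that.
  if (llvm::MachineOperand *DefOp = MI->findRegisterDefOperand(DefReg))
    DefOp->setIsDead();
}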
Reviewed by Chad Rosier and Zhaoshi.
llvm-svn: 235532
Enough concerns were raised that this optimization is pessimising some code patterns.
The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.
llvm-svn: 235491
The result is either an Untyped reg sequence, on ldN with N > 1, or
just the type of the input vector, on ld1. Don't force Untyped.
Instead, just use the type of the reg sequence.
This mirrors the behavior of createTuple, which feeds the LD1*_POST.
The narrow code path wasn't actually covered by tests, because V64
insert_vector_elt are widened to V128 before the LD1LANEpost combine
has the chance to run, usually.
The only case where it does run on V64 vectors is if the vector ops
legalizer ran. So, tickle the code with a ctpop.
Fixes PR23265.
llvm-svn: 235243
Found by code inspection; breaking i16, at least, breaks other tests,
but they aren't checking this in particular, so also add some
explicit tests for the already working types.
llvm-svn: 235148
A big-endian vector return needs a byte-swap which we aren't doing right now.
For now just bail on these cases to get correctness back.
llvm-svn: 235133
Fixed compilation with clang on some buildbots with "-Werror -Wmissing-field-initializers"
Related to: http://reviews.llvm.org/rL235089
llvm-svn: 235099
In order to introduce v8.1a-specific entities, Mappers should be aware of the
available SubtargetFeatures.
This patch introduces refactoring that will then make it easy to introduce:
- the v8.1a-specific "pan" PState for PStateMapper (PAN extension)
- v8.1a-specific sysregs for SysRegMapper (LOR and VHE extensions)
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8496
Patch by Tom Coxon
llvm-svn: 235089
The ARMv8 ARMARM states that for these instructions in A64 state:
"Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.", (imm4 for INS).
Make the disassembler accept any encoding with these ignored bits set to 1.
llvm-svn: 234896
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).
llvm-svn: 234840
The patch is generated using clang-tidy misc-use-override check.
This command was used:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
-checks='-*,misc-use-override' -header-filter='llvm|clang' \
-j=32 -fix -format
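A hedged before/after of the kind of rewrite the check performs:

struct Base {
  virtual void run();
};
struct Derived : Base {
  void run() override; // was: virtual void run();
};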
http://reviews.llvm.org/D8925
llvm-svn: 234679
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because the pass
might not even have been created, as that's the target's decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.
Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.
llvm-svn: 234666
The spilled registers are pristine and thus correctly handled by
the register scavenger and so on, but the liveness information is
strictly speaking wrong at this point.
Fix that.
llvm-svn: 234664
Using SchedAliases is convenient and works well for latency and resource
lookup for instructions. However, this creates an entry in
AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any
SchedReadAdvance since the lookup will fail.
http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!
llvm-svn: 234594
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.
Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.
f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.
While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.
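A minimal sketch of the kind of legalization table entries involved (the opcodes chosen are illustrative; the calls live in the target's lowering constructor):

setOperationAction(ISD::FDIV, MVT::f16, Promote); // f16 arithmetic is promoted to f32
setOperationAction(ISD::FREM, MVT::f16, Promote);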
Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755
llvm-svn: 234550
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.
The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register, we also need to check whether the
virtual register already has an associated machine instruction that defines it.
This fixes rdar://problem/20470788.
llvm-svn: 234529
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.
This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.
<rdar://problem/17829180>
llvm-svn: 234462
We weren't checking the sign of the floating point immediate before translating
it to "fmov sD, wzr". Similarly for D-regs.
Technically "movi vD.2s, #0x80, lsl #24" would work most of the time, but it's
not a blessed alias (and I don't think it should be since people expect writing
sD to zero out the high lanes, and there's no dD equivalent). So an error it is.
rdar://20455398
llvm-svn: 234372
Instead of lowering SELECT to SELECT_CC, which is further lowered later,
immediately call the SELECT_CC lowering code. This is preferable
because:
- Avoids an unnecessary roundtrip through the legalization queues with
an intermediate node.
- More importantly: Lowered operations get visited last leading to SELECT_CC
getting visited with legalized operands and unlegalized ones for preexisting
SELECT_CC nodes. This does not hurt the current code (hence no testcase) but
is required for another patch I am working on.
Differential Revision: http://reviews.llvm.org/D8187
llvm-svn: 234334
v8.1a is renamed to an architecture, in accordance with the approach taken in the ARM backend.
The excess generic CPU is removed. Intended use: the "generic" CPU with the "v8.1a" subtarget feature.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8766
llvm-svn: 233810
extended loads.
Implement the related target lowering hook so that the optimization has a better
estimation of the cost of an extension.
rdar://problem/19267165
llvm-svn: 233753
When we expand the RET_ReallyLR pseudo instruction we also need to transfer the
implicit operands.
The return register is an implicit operand and without it the liveness
calculation generates an incorrect live-out set for the patchpoint.
This fixes rdar://problem/19068476.
llvm-svn: 233635
method.
This enables the instprinter to print a different system register name based on
the feature bits of the per-function subtarget.
Differential Revision: http://reviews.llvm.org/D8668
llvm-svn: 233412
per-function subtarget.
Currently, code-gen passes the default or generic subtarget to the constructors
of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which
enables some targets (AArch64, ARM, and X86) to change their instprinter's
behavior based on the subtarget feature bits. Since the backend can now use
different subtargets for each function, instprinter has to be changed to use the
per-function subtarget rather than the default subtarget.
This patch takes the first step towards enabling instprinter to change its
behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to
AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the
various print methods table-gen auto-generates.
I will follow up with changes to instprinters of AArch64, ARM, and X86.
llvm-svn: 233411
Subtarget features must not be part of the target machine. So they are no longer stored in SysRegMapper, but are instead provided each time fromString()/toString() are called.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8655
llvm-svn: 233386
A third element is to be added soon to "struct AArch64NamedImmMapper::Mapping", so its instances are renamed from ...Pairs to ...Mappings.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8582
llvm-svn: 233300
class AArch64NamedImmMapper is to become dependent on SubTargetFeatures, while class AArch64Operand doesn't have access to the latter.
So, AArch64NamedImmMapper constructor invocations are refactored out of the methods of AArch64Operand.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8579
llvm-svn: 233297
This code depended on a bug in the FindAssociatedSection function that would
cause it to return the wrong result for certain absolute expressions. Instead,
use EvaluateAsRelocatable.
llvm-svn: 233119
Simplify boolean expressions using `true` and `false` with `clang-tidy`
Patch by Richard Thomson.
Reviewed By: rengolin
Differential Revision: http://reviews.llvm.org/D8525
llvm-svn: 233089
This reverts commit r233055.
It still causes buildbot failures (gcc running out of memory on several platforms, and a self-host failure on arm), although less than the previous time.
llvm-svn: 233068
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or come very close to hitting this bound, switch the features to use a bitset.
No functional change.
The first time this was committed (r229831), it caused several buildbot failures.
At least some of the ARM ones were due to gcc/binutils issues, and should now be fixed.
Differential Revision: http://reviews.llvm.org/D8542
llvm-svn: 233055
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve, and on code that uses
few globals (e.g., the odd file- or function-static) it is more often than
not degraded by the optimization. Lengthy discussion can be found
on llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
llvm-svn: 233024
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Clang also has code for 'Ump', 'Utf', 'Usa', and 'Ush' but calls
llvm_unreachable() on this code path so they are not converted to a
constraint id at the moment.
No functional change intended.
Reviewers: t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D8177
llvm-svn: 232941
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
llvm-svn: 232842
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
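A hedged sketch of the changed hook (signatures simplified; isSPReg and the offset bounds are illustrative stand-ins, not the exact Thumb1 encoding rules):

// Before: bool isFrameOffsetLegal(const MachineInstr *MI, int64_t Offset) const;
bool isFrameOffsetLegal(const MachineInstr *MI, unsigned BaseReg,
                        int64_t Offset) const {
  // In Thumb1, SP-based loads accept a larger immediate than other base registers.
  int64_t Limit = isSPReg(BaseReg) ? 1020 : 124; // isSPReg is a hypothetical helper
  return Offset >= 0 && Offset <= Limit;
}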
Differential Revision: http://reviews.llvm.org/D8419
llvm-svn: 232825
There are two main advantages to doing this:
* Targets that only need to handle one of the formats specially don't have
to worry about the others. For example, x86 now only registers a
constructor for the COFF streamer.
* Changes to the arguments passed to one format constructor will not impact
the other formats.
llvm-svn: 232699
as we don't necessarily need to do this yet - though we could move
the base class to the TargetMachine as it isn't subtarget dependent.
This reverts commit r232103.
llvm-svn: 232665
Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64. This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling.
Reviewers: mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson
Differential Revision: http://reviews.llvm.org/D8369
llvm-svn: 232562
Before this patch code wanting to create temporary labels for a given entity
(function, cu, exception range, etc) had to keep its own counter to have stable
symbol names.
createTempSymbol would still add a suffix to make sure a new symbol was always
returned, but it kept a single counter. Because of that, if we were to use
just createTempSymbol("cu_begin"), the label could change from cu_begin42 to
cu_begin43 because some other code started using temporary labels.
Simplify this by just keeping one counter per prefix and removing the various
specialized counters.
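A hedged illustration of the effect (createTempSymbol is the call named above; exact signature details omitted):

// With a per-prefix counter, this always yields cu_begin0, cu_begin1, ...
// regardless of how many unrelated temporary symbols have been created.
MCSymbol *CuBegin = Ctx.createTempSymbol("cu_begin");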
llvm-svn: 232535
Optimize concat_vectors of truncated vectors, where the intermediate
type is illegal, to avoid said illegality, e.g.,
(v4i16 (concat_vectors (v2i16 (truncate (v2i64))),
(v2i16 (truncate (v2i64)))))
->
(v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))),
(v2i32 (truncate (v2i64)))))))
This isn't really target-specific, and, as such, would best go in the
DAGCombiner. However, ISD::TRUNCATE legality isn't keyed on both input
and result type, so we might generate worse code when we don't know
better. On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8.
rdar://20022387
llvm-svn: 232459
Summary:
This is instead of doing this in target independent code and is the last
non-functional change before targets begin to distinguish between
different memory constraints when selecting code for the ISD::INLINEASM
node.
Next, each target will individually move away from the idea that all
memory constraints behave like 'm'.
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8173
llvm-svn: 232373
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.
PR22883 was caused by the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint ID, which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there.
llvm-svn: 232165
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.
Original commit message:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
llvm-svn: 232093
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8171
llvm-svn: 232027
Summary:
I don't know why every single backend had to redeclare its own DataLayout.
There was a virtual getDataLayout() on the common base TargetMachine, the
default implementation returned nullptr. It was not clear from this that
we could assume at call site that a DataLayout will be available with
each Target.
Now getDataLayout() is no longer virtual and return a pointer to the
DataLayout member of the common base TargetMachine. I plan to turn it into
a reference in a future patch.
The only backend that didn't have a DataLayout previously was the CPPBackend.
It now initializes the default DataLayout. This commit is NFC for all the
other backends.
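A hedged sketch of what call sites can now rely on (TM stands for the TargetMachine in hand; still pointer-returning for now, per the note above):

const DataLayout *DL = TM.getDataLayout(); // non-virtual, backed by the TargetMachine member
unsigned PtrSize = DL->getPointerSize();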
Test Plan: clang+llvm ninja check-all
Reviewers: echristo
Subscribers: jfb, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8243
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231987
time. The target independent code was passing in one all the
time and targets weren't checking validity before using it. Update
a few calls to pass in a MachineFunction where necessary.
llvm-svn: 231970
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
(i32 (int_aarch64_neon_uaddv ...)), ssub)
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
return vmulq_n_u32(a, vaddvq_u32(b));
}
We now generate:
addv.4s s1, v1
mul.4s v0, v0, v1[0]
instead of the previous:
addv.4s s1, v1
fmov w8, s1
dup.4s v1, w8
mul.4s v0, v1, v0
rdar://20044838
llvm-svn: 231840
Most are redundant, and they never seem to fire.
The V128 integer patterns already exist in the INS multiclass.
The duplicates only fire when the vector index type isn't i64,
because they accept "imm" instead of an explicit "i64", as the
instruction definition patterns do.
TLI::getVectorIdxTy is i64 on AArch64, so this should never happen.
Also, one of them had a typo: for i64, INSvi32lane was used.
I noticed because I mistakenly used an explicit i32 as the idx type,
and got ins.s for an i64 vector_insert.
The V64 patterns also don't seem to ever fire, as V64 vector
extract/insert are legalized to V128.
The equivalent float patterns are unique and useful, so keep them.
No functional change intended; none exhibited on the LIT and LNT tests.
llvm-svn: 231838
The inner loop of a loop nest is more likely to be hot, and the runtime
check can be promoted out of it (from patch 0001), so the overhead is lower;
we can try a doubled threshold to unroll more loops.
llvm-svn: 231632
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.
llvm-svn: 231571
Teach the load/store optimizer how to sign extend the result of a load pair when
it helps create more pairs.
The rationale is that loads are more expensive than sign extensions, so if we
can gather several into one instruction, that is better!
<rdar://problem/20072968>
llvm-svn: 231527
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.
-- before
_extgotequiv:
.long _extfoo
_delta:
.long _extgotequiv-_delta
-- after
_delta:
.long L_extfoo$non_lazy_ptr-_delta
.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_extfoo$non_lazy_ptr:
.indirect_symbol _extfoo
.long 0
llvm-svn: 231475
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:
-- before
.globl _foo
_foo:
.long 42
.globl _gotequivalent
_gotequivalent:
.quad _foo
.globl _delta
_delta:
.long _gotequivalent-_delta
-- after
.globl _foo
_foo:
.long 42
.globl _delta
Ltmp3:
.long _foo@GOT-Ltmp3
llvm-svn: 231474
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function; each one, with a few exceptions, represents consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (i.e. via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips), if at all.
This does not represent a behavioural change and as such no tests were added.
Patch by: Richard Diamond.
Reviewers: jfb
Reviewed By: jfb
Subscribers: jfb, aemerson, t.p.northover, llvm-commits
Differential Revision: http://reviews.llvm.org/D7713
llvm-svn: 231250
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relax/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.
This patch:
* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
accesses is produced, by making use of a new pseudo instruction. The patch also removes
two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).
llvm-svn: 231227