alignments on vld/vst instructions. And report errors for
alignments that are not supported.
While this is a large diff and an big test case, the changes
are very straight forward. But pretty much had to touch
all vld/vst instructions changing the addrmode to one of the
new ones that where added will do the proper checking for
the specific instruction.
FYI, re-committing this with a tweak so MemoryOp's default
constructor is trivial and will work with MSVC 2012. Thanks
to Reid Kleckner and Jim Grosbach for help with the tweak.
rdar://11312406
llvm-svn: 205986
It doesn't build with MSVC 2012, because MSVC doesn't allow union
members that have non-trivial default constructors. This change added
'SMLoc AlignmentLoc' to MemoryOp, which made MemoryOp's default ctor
non-trivial.
This reverts commit r205930.
llvm-svn: 205944
alignments on vld/vst instructions. And report errors for
alignments that are not supported.
While this is a large diff and an big test case, the changes
are very straight forward. But pretty much had to touch
all vld/vst instructions changing the addrmode to one of the
new ones that where added will do the proper checking for
the specific instruction.
rdar://11312406
llvm-svn: 205930
This consolidates the duplicated MachO checks in the directive parsing for
various directives that are unsupported for Mach-O. The error message change is
unimportant as this restores the behaviour to that prior to the addition of the
new directive handling. Furthermore, use a more direct check for MachO
targeting rather than an indirect feature check of the assembler.
Also simplify the test execution command to avoid temporary files. Further more,
perform the check in both object and assembly emission.
Whether all non-applicable directives are handled is another question. .fnstart
is marked as being unsupported, however, the complementary .fnend is not. The
additional unwinding directives are also still honoured. This change does not
change that, though, it would be good to validate and mark them as being
unsupported if they are unsupported for the MachO emission.
llvm-svn: 205678
Removed "GNU Assembler extension (compatibility)" definitions from ARMInstrInfo.td
Fixed ARMAsmParser::ParseInstruction GNU compatability branch, so it also works for thumb mode from now.
Added new tests.
llvm-svn: 205622
Implementing this via ComputeMaskedBits has two advantages:
+ It actually works. DAGISel doesn't deal with the chains properly
in the previous pattern-based solution, so they never trigger.
+ The information can be used in other DAG combines, as well as the
trivial "get rid of truncs". For example if the trunc is in a
different basic block.
rdar://problem/16227836
llvm-svn: 205540
The terminal barrier of a cmpxchg expansion will be either Acquire or
SequentiallyConsistent. In either case it can be skipped if the
operation has Monotonic requirements on failure.
rdar://problem/15996804
llvm-svn: 205535
The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).
Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:
1. an atomicrmw followed by using the *new* value can be more
efficient. As an IR pass, simple CSE could handle this
efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
optimisation.
I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).
llvm-svn: 205525
The trouble as in ARMAsmParser, in ParseInstruction method. It assumes that ARM::R12 + 1 == ARM::SP.
It is wrong, since ARM::<Register> codes are generated by tablegen and actually could be any random numbers.
llvm-svn: 205524
add operation since extract_vector_elt can perform an extend operation. Get the input lane
type from the vector on which we're performing the vpaddl operation on and extend or
truncate it to the output type of the original add node.
llvm-svn: 205523
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.
llvm-svn: 205453
Unlike other v6+ processors, cortex-m0 never supports unaligned accesses.
From the v6m ARM ARM:
"A3.2 Alignment support: ARMv6-M always generates a fault when an unaligned
access occurs."
rdar://16491560
llvm-svn: 205452
ARM specific optimiztion, finding places in ARM machine code where 2 dmbs
follow one another, and eliminating one of them.
Patch by Reinoud Elhorst.
llvm-svn: 205409
The Cyclone CPU is similar to swift for most LLVM purposes, but does have two
preferred instructions for zeroing a VFP register. This teaches LLVM about
them.
llvm-svn: 205309
Issue subject: Crash using integrated assembler with immediate arithmetic
Fix description:
Expressions like 'cmp r0, #(l1 - l2) >> 3' could not be evaluated on asm parsing stage,
since it is impossible to resolve labels on this stage. In the end of stage we still have
expression (MCExpr).
Then, when we want to encode it, we expect it to be an immediate, but it still an expression.
Patch introduces a Fixup (MCFixup instance), that is processed after main encoding stage.
llvm-svn: 205094
I started trying to fix a small issue, but this code has seen a small fix too
many.
The old code was fairly convoluted. Some of the issues it had:
* It failed to check if a symbol difference was in the some section when
converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quiet a few cases where it should not convert symbol
relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
requiring a quiet nasty work around in ARM.
* It was missing comments.
llvm-svn: 205076
Fix description:
Expressions like 'cmp r0, #(l1 - l2) >> 3' could not be evaluated on asm parsing stage,
since it is impossible to resolve labels on this stage. In the end of stage we still have
expression (MCExpr).
Then, when we want to encode it, we expect it to be an immediate, but it still an expression.
Patch introduces a Fixup (MCFixup instance), that is processed after main encoding stage.
llvm-svn: 204899
vector list parameter that is using all lanes "{d0[], d2[]}" but can
match and instruction with a ”{d0, d2}" parameter.
I’m finishing up a fix for proper checking of the unsupported
alignments on vld/vst instructions and ran into this. Thus I don’t
have a test case at this time. And adding all code that will
demonstrate the bug would obscure the very simple one line fix.
So if you would indulge me on not having a test case at this
time I’ll instead offer up a detailed explanation of what is
going on in this commit message.
This instruction:
vld2.8 {d0[], d2[]}, [r4:64]
is not legal as the alignment can only be 16 when the size is 8.
Per this documentation:
A8.8.325 VLD2 (single 2-element structure to all lanes)
<align> The alignment. It can be one of:
16 2-byte alignment, available only if <size> is 8, encoded as a = 1.
32 4-byte alignment, available only if <size> is 16, encoded as a = 1.
64 8-byte alignment, available only if <size> is 32, encoded as a = 1.
omitted Standard alignment, see Unaligned data access on page A3-108.
So when code is added to the llvm integrated assembler to not match
that instruction because of the alignment it then goes on to try to match
other instructions and comes across this:
vld2.8 {d0, d2}, [r4:64]
and and matches it. This is because of the method
ARMOperand::isVecListDPairSpaced() is missing the check of the Kind.
In this case the Kind is k_VectorListAllLanes . While the name of the method
may suggest that this is OK it really should check that the Kind is
k_VectorList.
As the method ARMOperand::isDoubleSpacedVectorAllLanes() is what was
used to match {d0[], d2[]} and correctly checks the Kind:
bool isDoubleSpacedVectorAllLanes() const {
return Kind == k_VectorListAllLanes && VectorList.isDoubleSpaced;
}
where the original ARMOperand::isVecListDPairSpaced() does not check
the Kind:
bool isVecListDPairSpaced() const {
if (isSingleSpacedVectorList()) return false;
return (ARMMCRegisterClasses[ARM::DPairSpcRegClassID]
.contains(VectorList.RegNum));
}
Jim Grosbach has reviewed the change and said: Yep, that sounds right. …
And by "right" I mean, "wow, that's a nasty latent bug I'm really, really
glad to see fixed." :)
rdar://16436683
llvm-svn: 204861
We've already got versions without the barriers, so this just adds IR-level
support for generating the new v8 ones.
rdar://problem/16227836
llvm-svn: 204813
After some discussion on IRC, emitting a call to the library function seems
like a better default, since it will move from a compiler internal error to
a linker error, that the user can work around until LLVM is fixed.
I'm also adding a note on the responsibility of the user to confirm that
the cache was cleared on platforms where nothing is done.
llvm-svn: 204806
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
incase the caches are unified.
Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for it, since the default behaviour is to bail
unimplemented.
A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
llvm-svn: 204802
When a label is parsed, check if there is type information available for the
label. If so, check if the symbol is a function. If the symbol is a function
and we are in thumb mode and no explicit thumb_func has been emitted, adjust the
symbol data to indicate that the function definition is a thumb function.
The application of this inferencing is improved value handling in the object
file (the required thumb bit is set on symbols which are thumb functions). It
also helps improve compatibility with binutils.
The one complication that arises from this handling is the MCAsmStreamer. The
default implementation of getOrCreateSymbolData in MCStreamer does not support
tracking the symbol data. In order to support the semantics of thumb functions,
track symbol data in assembly streamer. Although O(n) in number of labels in
the TU, this is already done in various other streamers and as such the memory
overhead is not a practical concern in this scenario.
llvm-svn: 204544
Sicne MBB->computeRegisterLivenes() returns Dead for sub regs like s0,
d0 is used in vpop instead of updating sp, which causes s0 dead before
its use.
This patch checks the liveness of each subreg to make sure the reg is
actually dead.
llvm-svn: 204411
Given
bar = foo + 4
.long bar
MC would eat the 4. GNU as includes it in the relocation. The rule seems to be
that a variable that defines a symbol is used in the relocation and one that
does not define a symbol is evaluated and the result included in the relocation.
Fixing this unfortunately required some other changes:
* Since the variable is now evaluated, it would prevent the ELF writer from
noticing the weakref marker the elf streamer uses. This patch then replaces
that with a VariantKind in MCSymbolRefExpr.
* Using VariantKind then requires us to look past other VariantKind to see
.weakref bar,foo
call bar@PLT
doing this also fixes
zed = foo +2
call zed@PLT
so that is a good thing.
* Looking past VariantKind means that the relocation selection has to use
the fixup instead of the target.
This is a reboot of the previous fixes for MC. I will watch the sanitizer
buildbot and wait for a build before adding back the previous fixes.
llvm-svn: 204294
The revision I'm reverting breaks handling of transitive aliases. This blocks us
and breaks sanitizer bootstrap:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/2651
(and checked locally by Alexey).
This revision is the result of:
svn merge -r204059:204058 -r204028:204027 -r203962:203961 .
+ the regression test added to test/MC/ELF/alias.s
Another way to reproduce the regression with clang:
$ cat q.c
void a1();
void a2() __attribute__((alias("a1")));
void a3() __attribute__((alias("a2")));
void a1() {}
$ ~/work/llvm-build/bin/clang-3.5-good -c q.c && mv q.o good.o && \
~/work/llvm-build/bin/clang-3.5-bad -c q.c && mv q.o bad.o && \
objdump -t good.o bad.o
good.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 q.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 g F .text 0000000000000006 a1
0000000000000000 g F .text 0000000000000006 a2
0000000000000000 g F .text 0000000000000006 a3
bad.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 q.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 g F .text 0000000000000006 a1
0000000000000000 g F .text 0000000000000006 a2
0000000000000000 g .text 0000000000000000 a3
llvm-svn: 204137
Add an assertion that a valid section is referenced. The potential NULL pointer
dereference was identified by the clang static analyzer.
llvm-svn: 204114
This performs the equivalent of a .set directive in that it creates a symbol
which is an alias for another symbol or value which may possibly be yet
undefined. This directive also has the added property in that it marks the
aliased symbol as being a thumb function entry point, in the same way that the
.thumb_func directive does.
The current implementation fails one test due to an unrelated issue. Functions
within .thumb sections are not marked as thumb_func. The result is that
the aliasee function is not valued correctly.
llvm-svn: 204059
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&. At this point they almost behave like normal iterators!
Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.
llvm-svn: 203865
This is a follow-up to r203635. Saleem pointed out that since symbolic register
names are much easier to read, it would be good if we could turn them off only
when we really need to because we're using an external assembler.
Differential Revision: http://llvm-reviews.chandlerc.com/D3056
llvm-svn: 203806
Support to the IAS was added to actually parse and handle the complex SO
expressions. However, the object file lowering was not updated to compensate
for the fact that the shift operand may be an absolute expression.
When trying to assemble to an object file, the lowering would fail while
succeeding when emitting purely assembly. Add an appropriate test.
The test case is inspired by the test case provided by Jiangning Liu who also
brought the issue to light.
llvm-svn: 203762
When the list of VFP registers to be saved was non-contiguous (so multiple
vpush/vpop instructions were needed) these were being ordered oddly, as in:
vpush {d8, d9}
vpush {d11}
This led to the layout in memory being [d11, d8, d9] which is ugly and doesn't
match the CFI_INSTRUCTIONs we're generating either (so Dwarf info would be
broken).
This switches the order of vpush/vpop (in both prologue and epilogue,
obviously) so that the Dwarf locations are correct again.
rdar://problem/16264856
llvm-svn: 203655
It seems gas can't handle CFI directives with VFP register names ("d12", etc.).
This broke us trying to build Chromium for Android after 201423.
A gas bug has been filed: https://sourceware.org/bugzilla/show_bug.cgi?id=16694
compnerd suggested making this conditional on whether we're using the integrated
assembler or not. I'll look into that in a follow-up patch.
Differential Revision: http://llvm-reviews.chandlerc.com/D3049
llvm-svn: 203635
Use the options in the ARMISelLowering to control whether tail calls are
optimised or not. Previously, this option was entirely ignored on the ARM
target and only honoured on x86.
This option is mostly useful in profiling scenarios. The default remains that
tail call optimisations will be applied.
llvm-svn: 203577
This option is from 2010, designed to work around a linker issue on Darwin for
ARM. According to grosbach this is no longer an issue and this option can
safely be removed.
llvm-svn: 203576
Tail call optimisation was previously disabled on all targets other than
iOS5.0+. This enables the tail call optimisation on all Thumb 2 capable
platforms.
The test adjustments are to remove the IR hint "tail" to function invocation.
The tests were designed assuming that tail call optimisations would not kick in
which no longer holds true.
llvm-svn: 203575
ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's no
need for any code to handle them specially.
There should be no functionality change so no tests.
llvm-svn: 203567
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.
This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.
llvm-svn: 203437
Summary:
llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same
definitions for the section flags. Instead, grab the definitions out of
support.
No functionality change.
Reviewers: grosbach, Bigcheese, rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2998
llvm-svn: 203211
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
llvm-svn: 203204
This is a preliminary setup change to support a renaming of Windows target
triples. Split the object file format information out of the environment into a
separate entity. Unfortunately, file format was previously treated as an
environment with an unknown OS. This is most obvious in the ARM subtarget where
the handling for macho on an arbitrary platform switches to AAPCS rather than
APCS (as per Apple's needs).
llvm-svn: 203160
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.
Another step of modularizing the support library.
llvm-svn: 202815
This is a temporary workaround for native arm linux builds:
PR18996: Changing regalloc order breaks "lencod" on native arm linux builds.
llvm-svn: 202433
scan the register file for sub- and super-registers.
No functionality change intended.
(Tests are updated because the comments in the assembler output are
different.)
llvm-svn: 202416
.align is handled specially on certain targets. .align without any parameters
on ARM indicates a default alignment (4). Handle the special case in the target
parser, but fall back to the generic parser for the normal version.
llvm-svn: 201988
This adds support for the .short and its alias .hword for adding literal values
into the object file. This is similar to the .word directive, however, rather
than inserting a value of 4 bytes, adds a 2-byte value.
llvm-svn: 201968
This commit moves getSLEB128Size() and getULEB128Size() from
MCAsmInfo to LEB128.h and removes some copy-and-paste code.
Besides, this commit also adds some unit tests for the LEB128
functions.
llvm-svn: 201937
TargetLoweringBase is implemented in CodeGen, so before this patch we had
a dependency fom Target to CodeGen. This would show up as a link failure of
llvm-stress when building with -DBUILD_SHARED_LIBS=ON.
This fixes pr18900.
llvm-svn: 201711
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.
They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.
llvm-svn: 201700
The IR
@foo = private constant i32 42
is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.
One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.
What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.
One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).
llvm-svn: 201608
ldrd r6, r7 [r2, #15]
simply gives an error and does not triggers an assertion.
As Jim points out, the diagnostic is really strange here,
but fixing that would be more complicated. The missing
comma results in the parser expecting a construct like r2[2],
which is the vector index thing the error message is talking
about. That's not what the user intended, though, and there's
nothing else in the instruction that looks at all like a vector.
Yet more fallout from not having a real parser here and trying
to do context-free generic matching for addressing modes.
rdar://15097243
llvm-svn: 201531
NaCl's ARM ABI uses 16 byte stack alignment, so set that in
ARMSubtarget.cpp.
Using 16 byte alignment exposes an issue in code generation in which a
varargs function leaves a 4 byte gap between the values of r1-r3 saved
to the stack and the following arguments that were passed on the
stack. (Previously, this code only needed to support 4 byte and 8
byte alignment.)
With this issue, llc generated:
varargs_func:
sub sp, sp, #16
push {lr}
sub sp, sp, #12
add r0, sp, #16 // Should be 20
stm r0, {r1, r2, r3}
ldr r0, .LCPI0_0 // Address of va_list
add r1, sp, #16
str r1, [r0]
bl external_func
Fix the bug by checking for "Align > 4". Also simplify the code by
using OffsetToAlignment(), and update comments.
Differential Revision: http://llvm-reviews.chandlerc.com/D2677
llvm-svn: 201497
This adds a partial implementation of the .arch_extension directive to the
integrated ARM assembler. There are a number of limitations to this
implementation arising from the target backend support rather than the
implementation itself. Namely, iWMMXT (v1 and v2), Maverick, and XScale support
is not present in the ARM backend. Currently, there is no check for A-class
only (needed for virt), and no ARMv6k detection (needed for os and sec). The
remainder of the extensions are fully supported.
llvm-svn: 201471
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201333
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201237
* CPRCs may be allocated to co-processor registers or the stack – they may never be allocated to core registers
* When a CPRC is allocated to the stack, all other VFP registers should be marked as unavailable
The difference is only noticeable in rare cases where there are a large number of floating point arguments (e.g.
7 doubles + additional float, double arguments). Although it's probably still better to avoid vmov as it can cause
stalls in some older ARM cores. The other, more subtle benefit, is to minimize difference between the various
calling conventions.
rdar://16039676
llvm-svn: 201193
Similarly to the vshrn instructions, these are simple zext/sext + trunc
operations. Using normal LLVM IR should allow for better code, and more sharing
with the AArch64 backend.
llvm-svn: 201093
For A- and R-class processors, r12 is not normally callee-saved, but is for
interrupt handlers. See AAPCS, 5.3.1.1, "Use of IP by the linker".
llvm-svn: 201089
vshrn is just the combination of a right shift and a truncate (and the limits
on the immediate value actually mean the signedness of the shift doesn't
matter). Using that representation allows us to get rid of an ARM-specific
intrinsic, share more code with AArch64 and hopefully get better code out of
the mid-end optimisers.
llvm-svn: 201085
These methods normally call each other and it is really annoying if the
arguments are in different order. The more common rule was that the arguments
specific to call are first (GV, Encoding, Suffix) and the auxiliary objects
(Mang, TM) come after. This patch changes the exceptions.
llvm-svn: 201044
According to the AAPCS, when a CPRC is allocated to the stack, all other
VFP registers should be marked as unavailable.
I have also modified the rules for allocating non-CPRCs to the stack, to make
it more explicit that all GPRs must be made unavailable. I cannot think of a
case where the old version would produce incorrect answers, so there is no test
for this.
llvm-svn: 200970
In a previous commit (r199818) we added a const_cast to an existing
subtarget info instead of creating a new one so that we could reuse
it when creating the TargetAsmParser for parsing inline assembly.
This cast was necessary because we needed to reuse the existing STI
to avoid generating incorrect code when the inline asm contained
mode-switching directives (e.g. .code 16).
The root cause of the failure was that there was an implicit sharing
of the STI between the parser and the MCCodeEmitter. To fix a
different but related issue, we now explicitly pass the STI to the
MCCodeEmitter (see commits r200345-r200351).
The const_cast is no longer necessary and we can now create a fresh
STI for the inline asm parser to use.
Differential Revision: http://llvm-reviews.chandlerc.com/D2709
llvm-svn: 200929
In Thumb1 mode, bl instruction might be selected for branches between
basic blocks in the function if the offset is greater than 2KB.
However, this might cause SEGV because the destination symbol
is not marked as thumb function and the execution mode will be reset
to ARM mode.
Since we are sure that these symbols are in the same data fragment, we
can simply resolve these local symbols, and don't emit any relocation
information for this bl instruction.
llvm-svn: 200842
This patch fixes the ldr-pseudo implementation to work when used in
inline assembly. The fix is to move arm assembler constant pools
from the ARMAsmParser class to the ARMTargetStreamer class.
Previously we kept the assembler generated constant pools in the
ARMAsmParser object. This does not work for inline assembly because
a new parser object is created for each blob of inline assembly.
This patch moves the constant pools to the ARMTargetStreamer class
so that the constant pool will remain alive for the entire code
generation process.
An ARMTargetStreamer class is now required for the arm backend.
There was no existing implementation for MachO, only Asm and ELF.
Instead of creating an empty MachO subclass, we decided to make the
ARMTargetStreamer a non-abstract class and provide default
(llvm_unreachable) implementations for the non constant-pool related
methods.
Differential Revision: http://llvm-reviews.chandlerc.com/D2638
llvm-svn: 200777
There was an extremely confusing proliferation of LLVM intrinsics to implement
the vacge & vacgt instructions. This combines them all into two polymorphic
intrinsics, shared across both backends.
llvm-svn: 200768
Some of the SHA instructions take a scalar i32 as one argument (largely because
they work on 160-bit hash fragments). This wasn't reflected in the IR
previously, with ARM and AArch64 choosing different types (<4 x i32> and <1 x
i32> respectively) which was ugly.
This makes all the affected intrinsics take a uniform "i32", allowing them to
become non-polymorphic at the same time.
llvm-svn: 200706
The .object_arch directive indicates an alternative architecture to be specified
in the object file. The directive does *not* effect the enabled feature bits
for the object file generation. This is particularly useful when the code
performs runtime detection and would like to indicate a lower architecture as
the requirements than the actual instructions used.
llvm-svn: 200451
.movsp is an ARM unwinding directive that indicates to the unwinder that a
register contains an offset from the current stack pointer. If the offset is
unspecified, it defaults to zero.
llvm-svn: 200449
This enhances the ARMAsmParser to handle .tlsdescseq directives. This is a
slightly special relocation. We must be able to generate them, but not consume
them in assembly. The relocation is meant to assist the linker in generating a
TLS descriptor sequence. The ELF target streamer is enhanced to append
additional fixups into the current segment and that is used to emit the new
R_ARM_TLS_DESCSEQ relocations.
llvm-svn: 200448
Add support for tlsdesc relocations which are part of the ABI, marked as
experimental. These relocations permit the linker to perform TLS reference
optimizations.
llvm-svn: 200447
This adds support for TLS CALL relocations. TLS CALL relocations are used to
indicate to the linker to generate appropriate entries to resolve TLS references
via an appropriate function invocation (e.g. __tls_get_addr(PLT)).
In order to accomodate the linker relaxation of the TLS access model for the
references (GD/LD -> IE, IE -> LE), the relocation addend must be incomplete.
This requires that the partial inplace value is also incomplete (i.e. 0). We
simply avoid the offset value calculation at the time of the fixup adjustment in
the ARM assembler backend.
llvm-svn: 200446
After all hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.
This commit also remove the -arm-enable-ehabi-descriptors, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.
Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and update some tests to avoid using EH tables
when none are needed.
The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.
llvm-svn: 200388
The subtarget info is explicitly passed to the EncodeInstruction
method and we should use that subtarget info to influence any
encoding decisions.
llvm-svn: 200350
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.
Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.
llvm-svn: 200271
This brings MC into line with GNU 'as' on ARM, and it brings the ARM
target into line with most other LLVM targets, which declare the
initial CFI state with addInitialFrameState().
Without this, functions generated with .cfi_startproc/endproc on ARM
will tend to cause GDB to abort with:
gdb/dwarf2-frame.c:1132: internal-error: Unknown CFA rule.
I've also tested this by comparing the output of "readelf -w" on the
object files produced by llvm-mc and gas when given the .s file added
here.
This change is part of addressing PR18636.
Differential Revision: http://llvm-reviews.chandlerc.com/D2597
llvm-svn: 200255
Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.
This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.
Differential Revision: http://llvm-reviews.chandlerc.com/D2622
llvm-svn: 200248
If a complex expression was passed to the .word directive and the first part of
the directive failed to parse, a secondary diagnostic would be produced that
would clutter the error diagnostics. Improve the diagnostics by consuming the
remainder of the statement.
llvm-svn: 200160
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.
llvm-svn: 200129
There is no inline asm in a .s file. Therefore, there should be no logic to
handle it in the streamer. Inline asm only exists in bitcode files, so the
logic can live in the (long misnamed) AsmPrinter class.
llvm-svn: 200011
Originally, BLX was passed as operand #0 in MachineInstr and as operand
#2 in MCInst. But now, it's operand #2 in both cases.
This patch also removes unnecessary FileCheck in the test case added by r199127.
llvm-svn: 199928
With constant-sharing, litpool loads consume 4 + N*2 bytes of code, but
movw/movt pairs consume 8*N. This means litpools are better than movw/movt even
with just one use. Other materialisation strategies can still be better though,
so the logic is a little odd.
llvm-svn: 199891
This patch restores the ARM mode if the user's inline assembly
does not. In the object streamer, it ensures that instructions
following the inline assembly are encoded correctly and that
correct mapping symbols are emitted. For the asm streamer, it
emits a .arm or .thumb directive.
This patch does not ensure that the inline assembly contains
the ADR instruction to switch modes at runtime.
The problem we need to solve is code like this:
int foo(int a, int b) {
int r = a + b;
asm volatile(
".align 2 \n"
".arm \n"
"add r0,r0,r0 \n"
: : "r"(r));
return r+1;
}
If we compile this function in thumb mode then the inline assembly
will switch to arm mode. We need to make sure that we switch back to
thumb mode after emitting the inline assembly or we will incorrectly
encode the instructions that follow (i.e. the assembly instructions
for return r+1).
Based on patch by David Peixotto
Change-Id: Ib57f6d2d78a22afad5de8693fba6230ff56ba48b
llvm-svn: 199818
This implements the unwind_raw directive for the ARM IAS. The unwind_raw
directive takes the form of a stack offset value followed by one or more bytes
representing the opcodes to be emitted. The opcode emitted will interpreted as
if it were assembled by the opcode assembler via the standard unwinding
directives.
Thanks to Logan Chien for an extra test!
llvm-svn: 199707
The .personalityindex directive is equivalent to the .personality directive with
the ARM EABI personality with the specific index (0, 1, 2). Both of these
directives indicate personality routines, so enhance the personality directive
handling to take into account personalityindex.
Bonus fix: flush the UnwindContext at the beginning of a new function.
Thanks to Logan Chien for additional tests!
llvm-svn: 199706
optional DWARF sections, so compiling with -g does not result in
different code being generated for PC-relative loads.
This is reapplying a diet r197922 (__TEXT-only).
llvm-svn: 199681
Ensure that the tag types are reflected on a replacement. This is particularly
important for the compatibility tag which has multiple representations where the
last definition wins.
llvm-svn: 199577
This moves the ARM build attributes definitions and support routines into the
Support library. The support routines simply permit the conversion of the value
to and from a string representation.
The movement is prompted in order to permit access to the constants and string
representations from readobj in order to facilitate decoding of the attributes
section.
llvm-svn: 199575
Fix MLA defs to use register class GPRnopc.
Add encoding tests for multiply instructions.
(Alias for MUL/SMLAL/UMLAL added by r199026.)
Patch by Zhaoshi.
llvm-svn: 199491
When expanding neon pseudo stores, it may miss the implicit uses of sub
regs, which may cause post RA scheduler reorder instructions that
breakes anti dependency.
For example:
VST1d64QPseudo %R0<kill>, 16, %Q9_Q10, pred:14, pred:%noreg
will be expanded to
VST1d64Q %R0<kill>, 16, %D18, pred:14, pred:%noreg;
An instruction that defines %D20 may be scheduled before the store by
mistake.
This patches adds implicit uses for such case. For the example above, it
emits:
VST1d64Q %R0<kill>, 8, %D18, pred:14, pred:%noreg, %Q9_Q10<imp-use>
llvm-svn: 199282
The changes caused by folding an sp-adjustment into a "pop" previously
disrupted the forward search for the final real instruction in a
terminating block. This switches to a backward search (skipping debug
instrs).
This fixes PR18399.
Patch by Zhaoshi.
llvm-svn: 199266
The already allocatable DPair superclass contains odd-even D register
pair in addition to the even-odd pairs in the QPR register class. There
is no reason to constrain the set of D register pairs that can be used
for NEON values. Any NEON instructions that require a Q register will
automatically constrain the register class to QPR.
The allocation order for DPair begins with the QPR registers, so
register allocation is unlikely to change much.
llvm-svn: 199186
This will allow it to be called from target independent parts of the main
streamer that don't know if there is a registered target streamer or not. This
in turn will allow targets to perform extra actions at specified points in the
interface: add extra flags for some labels, extra work during finalization, etc.
llvm-svn: 199174
The issue is caused when Post-RA scheduler reorders a bundle instruction
(IT block). However, it only flips the CPSR liveness of the bundle instruction,
leaves the instructions inside the bundle unchanged, which causes inconstancy and crashes
Thumb2SizeReduction.cpp::ReduceMBB().
llvm-svn: 199127
Previously we only used GPR for the destination placeholder in "ldr rD, [pc,
incorrect codegen under the integrated assembler.
This should fix both issues (which probably only affect MachO targets at the
moment).
rdar://problem/15800156
llvm-svn: 199108
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics. This change simply changes the target specific
routines to conform to the semantis of the ParseDirective correctly.
Conformance to the semantics improves diagnostics emitted for the invalid
directives. X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.
llvm-svn: 199068
An improper qualifier would result in a superfluous error due to the parser not
consuming the remainder of the statement. Simply consume the remainder of the
statement to avoid the error.
llvm-svn: 199035
The implicit immediate 0 forms are assembly aliases, not distinct instruction
encodings. Fix the initial implementation introduced in r198914 to an alias to
avoid two separate instruction definitions for the same encoding.
An InstAlias is insufficient in this case as the necessary due to the need to
add a new additional operand for the implicit zero. By using the AsmPsuedoInst,
fall back to the C++ code to transform the instruction to the equivalent
_POST_IMM form, inserting the additional implicit immediate 0.
llvm-svn: 199032
A 32-bit immediate value can be formed from a constant expression and loaded
into a register. Add support to emit this into an object file. Because this
value is a constant, a relocation must *not* be produced for it.
llvm-svn: 199023
The disassembler would no longer be able to disambiguage between the two
variants (explicit immediate #0 vs implicit, omitted #0) for the ldrt, strt,
ldrbt, strbt mnemonics as both versions indicated the disassembler routine.
llvm-svn: 198944
The GNU assembler supports prefixing the expression with a '#' to indiciate that
the value that is being moved is infact a constant. This improves the
compatibility of the integrated assembler's parser for this.
llvm-svn: 198916
The GNU assembler has an extension that allows for the elision of the paired
register (dt2) for the LDRD and STRD mnemonics. Add support for this in the
assembly parser. Canonicalise the usage during the instruction parsing from
the specified version.
llvm-svn: 198915
The ARM ARM indicates the mnemonics as follows:
ldrbt{<c>}{<q>} <Rt>, [<Rn>], {, #+/-<imm>}
ldrt{<c>}{<q>} <Rt>, [<Rn>] {, #+/-<imm>}
strbt{<c>}{<q>} <Rt>, [<Rn>] {, #<imm>}
strt{<c>}{<q>} <Rt>, [<Rn>] {, #+/-<imm>}
This improves the parser to deal with the implicit immediate 0 for the mnemonics
as per the specification.
Thanks to Joerg Sonnenberger for the tests!
llvm-svn: 198914
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.
This removes the 'Writer.h' header which contained only a single function
declaration.
llvm-svn: 198836
Operands which involved label arithemetic would previously fail to parse. This
corrects that by adding the additional case for the shift operand validation.
llvm-svn: 198735
This commit adds the pre-UAL aliases of fconsts and fconstd for
vmov.f32 and vmov.f64. They use an InstAlias rather than a
MnemonicAlias to properly support the predicate operand.
We need to support encoded 8-bit constants in order to implement the
pre-UAL fconsts/fconstd aliases for vmov.f32/vmov.f64, so this
commit also fixes parsing of encoded floating point constants used
in vmov.f32/vmov.f64 instructions. Now we can support assembly code
like this:
fconsts s0, #0x70
which is equivalent to vmov.f32 s0, #1.0.
Most of the code was already in place to support this feature.
Previously the code was trying to accept encoded 8-bit float
constants for the vmov.f32/vmov.f64 instructions. It looks like the
support for parsing encoded floats was lost in a refactoring in
commit r148556 and we did not have any tests in place to catch it.
The change in this commit is to keep the parsed value as a 32-bit
float instead of a 64-bit double because that is what the isFPImm()
function expects to find. There is no loss of precision by using a
32-bit float here because we are still limited to an 8-bit encoded
value in the end.
Additionally, we explicitly reject encoded 8-bit floats for
vmovf.32/64. This is the same as the current behavior, but we now do
it explicitly rather than accidently.
llvm-svn: 198697
are part of the core IR library in order to support dumping and other
basic functionality.
Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.
Update all of the #includes to match.
All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.
llvm-svn: 198688
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.
Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.
llvm-svn: 198685
Move the unwinding context for the ARM IAS into a helper class. This is purely
a structural refactoring. A follow up change allows for recording additional
depth to improve diagnostics.
llvm-svn: 198664
Parse tag names as well as expressions. The former is part of the
specification, the latter is for improved compatibility with the GNU assembler.
Fix attribute value handling to be comformant to the specification.
llvm-svn: 198662
Introduce a new virtual method Note into the AsmParser. This completements the
existing Warning and Error methods. Use the new method to clean up the output
of the unwind routines in the ARM AsmParser.
llvm-svn: 198661
The ARM backend has been using most of the MachO related subtarget
checks almost interchangeably, and since the only target it's had to
run on has been IOS (which is all three of MachO, Darwin and IOS) it's
worked out OK so far.
But we'd like to support embedded targets under the "*-*-none-macho"
triple, which means everything starts falling apart and inconsistent
behaviours emerge.
This patch should pick a reasonably sensible set of behaviours for the
new triple (and any others that come along, with luck). Some choices
were debatable (notably FP == r7 or r11), but we can revisit those
later when deficiencies become apparent.
llvm-svn: 198617
Longer term, we want to move users to "*-*-*-macho" for embedded work, but for
now people are relying on the last thing we told them, which is unfortunately
"*-*-darwin-eabi".
rdar://problem/15703934
llvm-svn: 198602
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.
llvm-svn: 198579
Move the ARM EHABI unwind opcode definitions from the ARM MCTargetDesc into LLVM
Support. This enables sharing of the definitions across the ARM target code as
well as llvm-readobj. This will allow implementation of the unwind decoding in
llvm-readobj.
llvm-svn: 198576
__builtin_returnaddress requires that the value passed into is be a constant.
However, at -O0 even a constant expression may not be converted to a constant.
Emit an error message intead of crashing.
llvm-svn: 198531
Before this patch any program that wanted to know the final symbol name of a
GlobalValue had to link with Target.
This patch implements a compromise solution where the mangler uses DataLayout.
This way, any tool that already links with Target (llc, clang) gets the exact
behavior as before and new IR files can be mangled without linking with Target.
With this patch the mangler is constructed with just a DataLayout and DataLayout
is extended to include the information the Mangler needs.
llvm-svn: 198438
Checking the trailing letter of the mnemonic is insufficient. Be more thorough
in the scanning of the instruction to ensure that we correctly work with the
predicated mnemonics.
llvm-svn: 198235
The DPR and SPR register lists are also register lists. Furthermore, the
registers need not be checked individually since the register type can be
checked via the list kind. Use that to simplify the logic and fix the incorrect
assertion.
llvm-svn: 198174
Directive parsers must return false if the target assembler is interested in
handling the directive. The Error member function returns true always. Using
the 'return Error()' pattern would incorrectly indicate to the general parser
that the target was not interested in the directive, when in reality it simply
encountered a badly formed directive or some other error. This corrects the
behaviour to ensure that the parser behaves appropriately.
llvm-svn: 198132
Schedule more conservatively to account for stalls on floating point
resources and latency. Use the AGU resource to model latency stalls
since it's shared between FP and LD/ST instructions. This might not be
completely accurate but should work well in practice.
llvm-svn: 198125
Many vector operations never had itineraries. Since the new machine
model was a mapping from existing itinerary classes, we don't have a
model for these. We still want to migrate A9 even though no one has
invested in a complete model, so mark it incomplete to avoid the
scheduler asserting.
llvm-svn: 198123
The bkpt mnemonic has an implicit immediate constant of 0 unless otherwise
specified. Add an instruction alias for the unvalued breakpoint mnemonic to
treat it as a 0. This improves compatibility with GNU AS.
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 197913
We dump any non-empty assembler constant pools after a successful
parse of an assembly file that uses the ldr pseudo opcode. These
per-section constant pools should be output in a deterministic order
to ensure that we always generate the same output when printing the
output with an AsmStreamer.
This patch changes the map data struture used to associate a section
with its constant pool to a MapVector to ensure deterministic
output. Because this map type does not support deletion, we now
check that the constant pool is not empty before dumping its entries
and clear the entries after emitting them with the streamer.
llvm-svn: 197735
This directive will write out the assembler-maintained constant
pool for the current section. These constant pools are created to
support the ldr-pseudo instruction (e.g. ldr r0, =val).
The directive can be used by the programmer to place the constant
pool in a location that can be reached by a pc-relative offset in
the ldr instruction.
llvm-svn: 197711
The ldr-pseudo opcode is a convenience for loading 32-bit constants.
It is converted into a pc-relative load from a constant pool. For
example,
ldr r0, =0x10001
ldr r1, =bar
will generate this output in the final assembly
ldr r0, .Ltmp0
ldr r1, .Ltmp1
...
.Ltmp0: .long 0x10001
.Ltmp1: .long bar
Sketch of the LDR pseudo implementation:
Keep a map from Section => ConstantPool
When parsing ldr r0, =val
parse val as an MCExpr
get ConstantPool for current Section
Label = CreateTempSymbol()
remember val in ConstantPool at next free slot
add operand to ldr that is MCSymbolRef of Label
On finishParse() callback
Write out all non-empty constant pools
for each Entry in ConstantPool
Emit Entry.Label
Emit Entry.Value
Possible improvements to be added in a later patch:
1. Does not convert load of small constants to mov
(e.g. ldr r0, =0x1 => mov r0, 0x1)
2. Does reuse constant pool entries for same constant
The implementation was tested for ARM, Thumb1, and Thumb2 targets on
linux and darwin.
llvm-svn: 197708
This adds support for the .inst directive. This is an ARM specific directive to
indicate an instruction encoded as a constant expression. The major difference
between .word, .short, or .byte and .inst is that the latter will be
disassembled as an instruction since it does not get flagged as data.
llvm-svn: 197657
Given vsel_cc, op1, op2, since vsel has no LE/LT, to generate vsel for
such selection, it needs to inverse cc and swap op1 and op2. To inverse
cc, both L/G and E bits should be flipped.
llvm-svn: 197615
According to "Addenda to ABI for ARM architecture", Tag_FP_arch is the
new name for the equivalent Tag_VFP_arch. This commit renames
Tag_VFP_arch to Tag_FP_arch.
llvm-svn: 197587
Clang sets the float-abi target option manually, but no longer
annotates each function with its ABI. This can lead to confusing
mistmatch between "clang -emit-llvm | llc" and normal clang
invocations.
Besides which, gnueabihf actually *is* hard-float. Defaulting to soft
was just perverse.
llvm-svn: 197554
were falling into the cases for 24-bit branch kinds which are not 24-bit
branches. The routine is to return false for fixups are expected to always
be resolvable at assembly time. Which these three fixups are as they have
limited displacement and are for local references within a function.
rdar://15586725
llvm-svn: 197282
The tests were no longer using fast-isel at all (MachO needs an "ios" rather
than "darwin" triple at the moment and Linux needs ARM mode). Once that was
corrected, the verifier complained about a t2ADDri created for the alloca.
llvm-svn: 197046
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.
This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.
llvm-svn: 196934
When trying to eliminate an "sub sp, sp, #N" instruction by folding
it into an existing push/pop using dummy registers, we need to account
for the fact that this might affect precisely how "fp" gets set in the
prologue.
We were attempting this, but assuming that *whenever* we performed a
fold it would make a difference. This is false, for example, in:
push {r4, r7, lr}
add fp, sp, #4
vpush {d8}
sub sp, sp, #8
we can fold the "sub" into the "vpush", forming "vpush {d7, d8}".
However, in that case the "add fp" instruction mustn't change, which
we were getting wrong before.
Should fix PR18160.
llvm-svn: 196725
- krait processor currently modeled with the same features as A9.
- Krait processor additionally has VFP4 (fused multiply add/sub)
and hardware division features enabled.
- krait has currently the same Schedule model as A9
- krait cpu flag is not recognized by the GNU assembler yet,
it is replaced with march=armv7-a to avoid a lower march
from being used.
llvm-svn: 196619
The current peephole optimizing for compare inst assumes an instr that
uses CPSR has an MO for ARM Cond code.However, for VSEL instructions
(vseqeq, vselgt, vselgt, vselvs), there is no such operand nor do
they support the modification of Cond Code.
llvm-svn: 196588
We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic-block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.
Should fix PR18136.
llvm-svn: 196493
ARM symbol variants are written with parens instead of @ like this:
.word __GLOBAL_I_a(target1)
This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.
The MCAsmInfo parens flag is enabled only for ARM on ELF.
By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.
To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.
Updated Tests:
Changed case of symbol variant to match the generic kind.
test/CodeGen/ARM/tls-models.ll
test/CodeGen/ARM/tls1.ll
test/CodeGen/ARM/tls2.ll
test/CodeGen/Thumb2/tls1.ll
test/CodeGen/Thumb2/tls2.ll
PR18080
llvm-svn: 196424
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.
Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.
llvm-svn: 196111
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.
With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.
llvm-svn: 196090
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.
This should fix PR18081.
llvm-svn: 196046
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.
Explicit add_dependencies() have been pruned under lib/Target.
llvm-svn: 195929
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.
llvm-svn: 195927
These are handled almost identically to static mode (and ELF's global address
materialisation), except that a symbol may have "$non_lazy_ptr" appended. This
can be handled by passing appropriate flags along with the instruction instead
of using entirely separate pseudo-instructions.
llvm-svn: 195655
There is no sane way for an LEApcrel (= single ADR) instruction to generate a
global address on any ARM target I know of. Fortunately, no-one was trying to
any more, but there were vestigial patterns.
llvm-svn: 195644
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
llvm-svn: 195064
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
Base *foo = new Child();
delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.
llvm-svn: 194997
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
llvm-svn: 194865
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
<rdar://problem/15292280>
Patch by Duncan Exon Smith!
llvm-svn: 194840
By default, the behavior of IT block generation will be determinated
dynamically base on the arch (armv8 vs armv7). This patch adds backend
options: -arm-restrict-it and -arm-no-restrict-it. The former one
restricts the generation of IT blocks (the same behavior as thumbv8) for
both arches. The later one allows the generation of legacy IT block (the
same behavior as ARMv7 Thumb2) for both arches.
Clang will support -mrestrict-it and -mno-restrict-it, which is
compatible with GCC.
llvm-svn: 194592
The system LDM and STM instructions can't usually writeback to the base
register. The one exception is when an LDM is actually an exception-return
(i.e. contains PC in the register list).
(There's already a test that "ldm sp!, {r0-r3, pc}^" works, which is why there
is no positive test).
rdar://problem/15223374
llvm-svn: 194512
This commit cleans up some comments in ARMBuildAttrs.h.
Besides, this commit fixes an error related to AllowWMMXv1
and AllowWMMXv2 (although they are not used currently.)
llvm-svn: 194327
ARM prologues usually look like:
push {r7, lr}
sub sp, sp, #4
If code size is extremely important, this can be optimised to the single
instruction:
push {r6, r7, lr}
where we don't actually care about the contents of r6, but pushing it subtracts
4 from sp as a side effect.
This should implement such a conversion, predicated on the "minsize" function
attribute (-Oz) since I've yet to find any code it actually makes faster.
llvm-svn: 194264
Cortex-M0 supports these 32-bit instructions despite being Thumb1 only
(mostly). We knew about that but not that the aliases without the default "sy"
operand were also permitted.
llvm-svn: 194094
ResolveFrameIndex had what appeared to be a very nasty hack for when the
frame-index referred to a callee-saved register. In this case it "adjusted" the
offset so that the address was correct if (and only if) the MachineInstr
immediately followed the respective push.
This "worked" for all forms of GPR & DPR but was only ever used to set the
frame pointer itself, and once this was put in a more sensible location the
entire state-tracking machinery it relied on became redundant. So I stripped
it.
The only wrinkle is that "add r7, sp, #0" might theoretically be slower (need
an actual ALU slot) compared to "mov r7, sp" so I added a micro-optimisation
that also makes emitARMRegUpdate and emitT2RegUpdate also work when NumBytes ==
0.
No test changes since there shouldn't be any functionality change.
llvm-svn: 194025
Add a Virtualization ARM subtarget feature along with adding proper build
attribute emission for Tag_Virtualization_use (encodes Virtualization and
TrustZone) and Tag_MPextension_use.
Also rework test/CodeGen/ARM/2010-10-19-mc-elf-objheader.ll testcase to
something that is more maintainable. This changes the focus of this
testcase away from testing CPU defaults (which is tested elsewhere), onto
specifically testing that attributes are encoded correctly.
llvm-svn: 193859
Fix Tag_ABI_HardFP_use build attribute to handle single precision FP,
replace deprecated Tag_ABI_HardFP_use value of 3 with 0 and also add
some tests for Tag_ABI_VFP_args.
llvm-svn: 193856
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
%tmp = zext <16 x i8> %a to <16 x i32>
store <16 x i32> %tmp, <16 x i32>*%p
ret void
}
Generates:
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__const
.align 5
LCPI0_0:
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.section __TEXT,__text,regular,pure_instructions
.globl _bar
.align 4, 0x90
_bar:
vpunpckhbw %xmm0, %xmm0, %xmm1
vpunpckhwd %xmm0, %xmm1, %xmm2
vpmovzxwd %xmm1, %xmm1
vinsertf128 $1, %xmm2, %ymm1, %ymm1
vmovaps LCPI0_0(%rip), %ymm2
vandps %ymm2, %ymm1, %ymm1
vpmovzxbw %xmm0, %xmm3
vpunpckhwd %xmm0, %xmm3, %xmm3
vpmovzxbd %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vandps %ymm2, %ymm0, %ymm0
vmovaps %ymm0, (%rdi)
vmovaps %ymm1, 32(%rdi)
vzeroupper
ret
So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
- the number of vector elements is even,
- the source type is legal,
- the type of a split source is illegal,
- the type of an extended (by doubling element size) source is legal, and
- the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."
With this change, the above example now compiles to:
_bar:
vpxor %xmm1, %xmm1, %xmm1
vpunpcklbw %xmm1, %xmm0, %xmm2
vpunpckhwd %xmm1, %xmm2, %xmm3
vpunpcklwd %xmm1, %xmm2, %xmm2
vinsertf128 $1, %xmm3, %ymm2, %ymm2
vpunpckhbw %xmm1, %xmm0, %xmm0
vpunpckhwd %xmm1, %xmm0, %xmm3
vpunpcklwd %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vmovaps %ymm0, 32(%rdi)
vmovaps %ymm2, (%rdi)
vzeroupper
ret
This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.
rdar://14735100
llvm-svn: 193727
Helper functions are added:
emitPostLd: emit a post-increment load operation with given size.
emitPostSt: emit a post-increment store operation with given size.
No functionality change.
llvm-svn: 193656
Adds a subtarget feature for the CRC instructions (optional in v8-A) to the ARM (32-bit) backend.
Differential Revision: http://llvm-reviews.chandlerc.com/D2036
llvm-svn: 193599
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.
radar://15336950
llvm-svn: 193573
an MCExpr, in order to avoid writing an encoded zero value in the immediate
field.
When getUnconditionalBranchTargetOpValue is called with an MCExpr target, we
don't know what the final immediate field value should be. We shouldn't
explicitly set the immediate field to an encoded zero value as zero is encoded
with a non-zero bit pattern. This leads to bits being set that pollute the
final immediate value. The nature of the encoding is such that the polluted
bits only affect very large immediate values, explaining why this hasn't
caused problems earlier.
Fixes <rdar://problem/15155975>.
llvm-svn: 193535
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
Besides, this commit also change the Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already cover the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has be changed from 1 to 2,
which is more precise.
llvm-svn: 193524
When assembling, a .thumb_func directive is supposed to be applicable to the
next symbol definition, even if there are intervening directives. We were
racing ahead to try and find it, and this commit should fix the issue.
Patch by Gabor Ballabas
llvm-svn: 193403
There's a barrier instruction so that should still be used, but most actual
atomic operations are going to need a platform decision on the correct
behaviour (either nop if single-threaded or OS-support otherwise).
rdar://problem/15287210
llvm-svn: 193399
This commit changes the struct byval lowering for arm to use inline
checks for the subtarget instead of a class abstraction to represent
the differences. The class abstraction was judged to be too much
code for this task.
No intended functionality change.
llvm-svn: 193357
This prevents us from silently accepting invalid instructions on (for example)
Cortex-M4 with just single-precision VFP support.
No tests for the extra Pat Requires because they're essentially assertions: the
affected code should have been lowered to libcalls before ISel.
rdar://problem/15302004
llvm-svn: 193354
The fused multiply instructions were added in VFPv4 but are still NEON
instructions, in particular they shouldn't be available on a Cortex-M4 not
matter how floaty it is.
llvm-svn: 193342
If an alias inherits directly from InstAlias then it doesn't get any default
"Requires" values, so llvm-mc will allow it even on architectures that don't
support the underlying instruction.
This tidies up the obvious VFP and NEON cases I found.
llvm-svn: 193340
The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1
code to make use of VFP instructions by switching back to ARM mode, they make
no sense for M-class processors which don't even have an ARM mode.
Given that justification, in practice this is a platform ABI decision so the
actual check is based on that rather than CPU features.
rdar://problem/15302004
llvm-svn: 193327
POP instructions are aliased to the ARM LDM variants but have different syntax.
This caused two problems: we tried to access a non-existent operand to annotate
the '!', and the error message didn't make much sense.
With some vigorous hand-waving in the error message both problems can be
fixed.
llvm-svn: 193322
The set of circumstances where the writeback register is allowed to be in the
list of registers is rather baroque, but I think this implements them all on
the assembly parsing side.
For disassembly, we still warn about an ARM-mode LDM even if the architecture
revision is < v7 (the required architecture information isn't available). It's
a silly instruction anyway, so hopefully no-one will mind.
rdar://problem/15223374
llvm-svn: 193185
This commit implements the correct lowering of the
COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets.
Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the
post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not
have the post-increment form of these instructions so the generated
assembly contained invalid instructions.
Passing the generated assembly to gcc caused it to complain with an
error like this:
Error: cannot honor width suffix -- `ldrb r3,[r0],#1'
and the integrated assembler would generate an object file with an
invalid instruction encoding.
This commit contains a small test case that demonstrates the problem
with thumb1 targets as well as an expanded test case that more
throughly tests the lowering of byval struct passing for arm,
thumb1, and thumb2 targets.
llvm-svn: 192916
This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
pseudo-instruction in the ARM backend. We introduce a new helper
class that encapsulates all of the operations needed during the
lowering. The operations are implemented for each subtarget in
different subclasses. Currently only arm and thumb2 subtargets are
supported.
This refactoring was done to easily implement support for thumb1
subtargets. This initial patch does not add support for thumb1, but
is only a refactoring. A follow on patch will implement the support
for thumb1 subtargets.
No intended functionality change.
llvm-svn: 192915
When we had a sequence like:
s1 = VLDRS [r0, 1], Q0<imp-def>
s3 = VLDRS [r0, 2], Q0<imp-use,kill>, Q0<imp-def>
s0 = VLDRS [r0, 0], Q0<imp-use,kill>, Q0<imp-def>
s2 = VLDRS [r0, 4], Q0<imp-use,kill>, Q0<imp-def>
we were gathering the {s0, s1} loads below the s3 load. This is fine,
but confused the verifier since now the s3 load had Q0<imp-use> with
no definition above it.
This should mark such uses <undef> as well. The liveness structure at
the beginning and end of the block is unaffected, and the true sN
definitions should prevent any dodgy reorderings being introduced
elsewhere.
rdar://problem/15124449
llvm-svn: 192344
This patch fixes an old FIXME by creating a MCTargetStreamer interface
and moving the target specific functions for ARM, Mips and PPC to it.
The ARM streamer is still declared in a common place because it is
used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are
completely hidden in the corresponding Target directories.
I will send an email to llvmdev with instructions on how to use this.
llvm-svn: 192181
from struct byval to registers.
We used to pass 0 which means the alignment of PtrVT. Even when the alignment
of the struct is smaller than 4, the LOADs would have alignment of 4, and
further optimizations could combine the LOADs into a ldm, which would
cause crash.
The fix is to pass the alignment of the struct byval.
rdar://problem/15144402
llvm-svn: 192126
The hint instructions ("nop", "yield", etc) are mostly Thumb2-only, but have
been ported across to the v6M architecture. Fortunately, v6M seems to sit
nicely between v6 (thumb-1 only) and v6T2, so we can add a feature for it
fairly easily.
rdar://problem/15144406
llvm-svn: 192097
When MC was first added, targets could use hasRawTextSupport to keep features
working before they were added to the MC interface.
The design goal of MC is to provide an uniform api for printing assembly and
object files. Short of relaxations and other corner cases, a object file is
just another representation of the assembly.
It was never the intention that targets would keep doing things like
if (hasRawTextSupport())
Set flags in one way.
else
Set flags in another way.
When they do that they create two code paths and the object file is no longer
just another representation of the assembly. This also then requires testing
with llc -filetype=obj, which is extremelly brittle.
This patch removes some of these hacks by replacing them with smaller ones.
The ARM flag setting is trivial, so I just moved it to the constructor. For
Mips, the patch adds two temporary hack directives that allow the assembly
to represent the same things as the object file was already able to.
The hope is that the mips developers will replace the hack directives with
the same ones that gas uses and drop the -print-hack-directives flag.
I will also try to implement a target streamer interface, so that we can
move this out of the common code.
In summary, for any new work, two rules of the thumb are
* Don't use "llc -filetype=obj" in tests.
* Don't add calls to hasRawTextSupport.
llvm-svn: 192035
optimizeSelect folds (predicated) copy instructions, it must not ignore
the original register class of the operand when replacing the register
with the copies dest register.
llvm-svn: 191963
The jump doesn't really kill the registers, the following call does but
we never get back anyway.
This avoids some verify-machineinstrs problems when TAILJUMPs are
if-converted.
llvm-svn: 191962
This function-attribute modifies the callee-saved register list and function
epilogue (specifically the return instruction) so that a routine is suitable
for use as an interrupt-handler of the specified type without disrupting
user-mode applications.
rdar://problem/14207019
llvm-svn: 191766
For targets that have instruction itineraries this means no change. Targets
that move over to the new schedule model will use be able the new schedule
module for instruction latencies in the if-converter (the logic is such that if
there is no itineary we will use the new sched model for the latencies).
Before, we queried "TTI->getInstructionLatency()" for the instruction latency
and the extra prediction cost. Now, we query the TargetSchedule abstraction for
the instruction latency and TargetInstrInfo for the extra predictation cost. The
TargetSchedule abstraction will internally call "TTI->getInstructionLatency" if
an itinerary exists, otherwise it will use the new schedule model.
ATTENTION: Out of tree targets!
(I will also send out an email later to LLVMDev)
This means, if your target implements
unsigned getInstrLatency(const InstrItineraryData *ItinData,
const MachineInstr *MI,
unsigned *PredCost);
and returns a value for "PredCost", you now also need to implement
unsigned getPredictationCost(const MachineInstr *MI);
(if your target uses the IfConversion.cpp pass)
radar://15077010
llvm-svn: 191671
As specified in A8.8.72/A8.8.73/A8.8.74 in the ARM ARM, all variants of the ARM LDRD instruction have the following two constraints:
LDRD<c> <Rt>, <Rt2>, ...
(a) Rt must be even-numbered and not r14
(b) Rt2 must be R(t+1)
If those two constraints are not met the result of executing the instruction will be unpredictable.
Constraint (b) was already enforced, this commit adds support for constraint (a).
Fixes rdar://14479793.
llvm-svn: 191520
LDRD<c> <Rt>, <Rt2>, <label>
LDRD<c> <Rt>, <Rt2>, [<Rn>{, #+/-<imm>}]
LDRD<c> <Rt>, <Rt2>, [<Rn>], #+/-<imm>
LDRD<c> <Rt>, <Rt2>, [<Rn>, #+/-<imm>]!
As specified in A8.8.72/A8.8.73 in the ARM ARM, the T1 encoding has a constraint which enforces that Rt != Rt2.
If this constraint is not met the result of executing the instruction will be unpredictable.
Fixes rdar://14479780.
llvm-svn: 191504
Generally, it is desirable to distribute (a + b) * c to a*c + b*c for
ARM with VMLx forwarding, where a, b and c are vectors.
However, for (a + b)*(a + b), distribution will result in one extra
instruction.
With distribution:
x = a + b (add)
y = a * x (mul)
z = y + b * y (mla)
Without distribution:
x = a + b (add)
z = x * x (mul)
This patch checks if a mul is a square of add/sub. If yes, skip
distribution.
llvm-svn: 191410
This is being disabled because it is no longer needed for
performance. It is only used by postRAscheduler which is also planned
for removal, and it is implemented with an out-dated view of register
liveness. It consideres aliases instead of register units, assumes
valid kill flags, and assumes implicit uses on partial register
defs. Kill flags and implicit operands are error prone and impossible
to verify. We should gradually eliminate dependence on them in the
postRA phases.
Targets that still benefit from this should move to the MI
scheduler. If that doesn't solve the problem, then we should add a
hook to regalloc to optimize reload placement.
llvm-svn: 191348
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
The 'Deprecated' class allows you to specify a SubtargetFeature that the
instruction is deprecated on.
The 'ComplexDeprecationPredicate' class allows you to define a custom
predicate that is called to check for deprecation.
For example:
ComplexDeprecationPredicate<"MCR">
would mean you would have to define the following function:
bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI,
std::string &Info)
Which returns 'false' for not deprecated, and 'true' for deprecated
and store the warning message in 'Info'.
The MCTargetAsmParser constructor was chaned to take an extra argument of
the MCInstrInfo class, so out-of-tree targets will need to be changed.
llvm-svn: 190598
We were figuring out whether to use tPICADD or PICADD, then just using
tPICADD unconditionally anyway. Oops.
A testcase from someone familiar enough with ELF to produce one would
be appreciated. The existing PIC testcase correctly verifies the .s
generated, but that doesn't catch this bug, which only showed up in
direct-to-object mode.
http://llvm.org/bugs/show_bug.cgi?id=17180
llvm-svn: 190417
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.
Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.
<rdar://problem/13623355>
llvm-svn: 190290
These were pretty straightforward instructions, with some assembly support
required for HLT.
The ARM assembler is keen to split the instruction mnemonic into a
(non-existent) 'H' instruction with the LT condition code. An exception for
HLT is needed.
HLT follows the same rules as BKPT when in IT blocks, so the special BKPT
hadling code has been adapted to handle HLT also.
Regression tests added including diagnostic tests for out of range immediates
and illegal condition codes, as well as negative tests for pre-ARMv8.
llvm-svn: 190053
Solution is not sufficient to prevent 'mov pc, lr' being emitted for jump table code.
Test case doesn't trigger the added functionality.
llvm-svn: 190047
This improves code generation for jump tables by avoiding the emission of "mov pc, lr" which could fool the processor into believing this is a return from a function causing mispredicts. The code generation logic for jump tables uses ADR to materialize the address of the jump target.
Patch by Daniel Stewart!
llvm-svn: 190043
These instructions, such as vmul.f32, require the second source operand to
be in D0-D15 rather than the full D0-D31. When optimizing, make sure to
account for that by constraining the register class of a replacement virtual
register to be compatible with the virtual register(s) it's replacing.
I've been unsuccessful in creating a non-fragile regression test. This issue
was detected by the LLVM nightly test suite running on an A15 (Bullet).
PR17093: http://llvm.org/bugs/show_bug.cgi?id=17093
llvm-svn: 189972
This reverts commit r189648.
Fixes for the previously failing clang-side arm_neon_intrinsics test
cases will be checked in separately.
llvm-svn: 189841
What we really want is to enable Swift by default for *v7s triples (and there already seems to be some logic which attempts to do that). In that case the iOS version doesn't matter.
llvm-svn: 189763
In addition to recognizing when the multiply's second argument is
coming from an explicit VDUPLANE, also look for a plain scalar
f32 reference and reference it via the corresponding vector
lane.
rdar://14870054
llvm-svn: 189619
Fix a few things in one swoop.
# Add some negative tests.
# Fix some formatting issues.
# Add some missing IsThumb / ARMv8
# Fix some outs / ins mistakes.
llvm-svn: 189490
The usual default of "dmb ish" (inner-shareable) isn't even a valid instruction
on v6M or v7M (well, it does the same thing but software is strongly
discouraged from using it) so we should emit a full-system barrier there.
llvm-svn: 189483
The vqdmlal and vqdmlls instructions are really just a fused pair consisting of
a vqdmull.sN and a vqadd.sN. This adds patterns to LLVM so that we can switch
Clang's CodeGen over to generating these instead of the special vqdmlal
intrinsics.
llvm-svn: 189480
These instructions aren't particularly complicated and it's well worth having
patterns for some reasonably useful LLVM IR that will match them. Soon we
should be able to switch Clang over to producing this natural version.
llvm-svn: 189335
Get the register class right for the TST instruction. This keeps the
machine verifier happy, enabling us to turn it on for another test.
rdar://12594152
llvm-svn: 189274
The create machine code wasn't properly in SSA, which the machine verifier
properly complains about. Now that fast-isel is closer to verifier clean,
errors like this show up more clearly.
Additionally, the Thumb pseudo tPICADD was used for both ARM and Thumb
mode functions, which is obviously wrong. Fix that along the way.
Test case is part of the following commit which will finish making an
additional fast-isel test verifier clean an enable it for the
regression test suite. This commit is separate since its not just
a verifier cleanup, but an actual correctness issue.
rdar://12594152 (for the fast-isel verifier aspects)
llvm-svn: 189269
I'd forgotten that "Requires" blocks override rather than add to the
constraints, so my pseudo-instruction was being selected in Thumb mode leading
to nonsense instructions.
rdar://problem/14817358
llvm-svn: 189096
The instruction to convert between floating point and fixed point representations
takes an immediate operand for the number of fractional bits of the fixed point
value. ARMARM specifies that when that number of bits is zero, the assembler
should encode floating point/integer conversion instructions.
This patch adds the necessary instruction aliases to achieve this behaviour.
llvm-svn: 189009
Back in the mists of time (2008), it seems TableGen couldn't handle the
patterns necessary to match ARM's CMOV node that we convert select operations
to, so we wrote a lot of fairly hairy C++ to do it for us.
TableGen can deal with it now: there were a few minor differences to CodeGen
(see tests), but nothing obviously worse that I could see, so we should
probably address anything that *does* come up in a localised manner.
llvm-svn: 188995
According to the ARM specification, "mov" is a valid mnemonic for all Thumb2 MOV encodings.
To achieve this, the patch adds one instruction alias with a special range condition to avoid collision with the Thumb1 MOV.
llvm-svn: 188901
Update testcase to be more careful about checking register
values. While regexes are general goodness for these sorts of
testcases, in this example, the registers are constrained by
the calling convention, so we can and should check their
explicit values.
rdar://14779513
llvm-svn: 188819
Previously we used a const-pool load for virtually all 64-bit floating values.
Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
instructions of one stripe or another.
llvm-svn: 188773
The Thumb2 add immediate is in fact defined for SP. The manual is misleading as it points to a different section for add immediate with SP, however the encoding is the same as for add immediate with register only with the SP operand hard coded. As such add immediate with SP and add immediate with register can safely be treated as the same instruction.
All the patch does is adjust a register constraint on an instruction alias.
llvm-svn: 188676
When patching inlineasm nodes to use GPRPair for 64-bit values, we
were dropping the information that two operands were tied, which
effectively broke the live-interval of vregs affected.
llvm-svn: 188643
Properly constrain the operand register class for instructions used
in [sz]ext expansion. Update more tests to use the verifier now that
we're getting the register classes correct.
rdar://12594152
llvm-svn: 188594
Teach the generic instruction selection helper functions to constrain
the register classes of their input operands. For non-physical register
references, the generic code needs to be careful not to mess that up
when replacing references to result registers. As the comment indicates
for MachineRegisterInfo::replaceRegWith(), it's important to call
constrainRegClass() first.
rdar://12594152
llvm-svn: 188593
Lots of machine verifier errors result from using a plain GPR regclass
for incoming argument copies. A more restrictive rGPR class is more
appropriate since it more accurately represents what's happening, plus
it lines up better with isel later on so the verifier is happier.
Reduces the number of ARM fast-isel tests not running with the verifier
enabled by over half.
rdar://12594152
llvm-svn: 188592
This unbreaks PIC with fast isel on ELF targets (PR16717). The output matches
what GCC and SDag do for PIC but may not cover all of the many flavors of PIC
that exist.
llvm-svn: 188551
Thumb2 literal loads use an offset encoding which allows for
negative zero. This fixes parsing and encoding so that #-0
is correctly processed. The parser represents #-0 as INT32_MIN.
llvm-svn: 188549
There are many Thumb instructions which take 12-bit immediates encoded in a special
8-byte value + 4-byte rotator form. Not all numbers are represented, and it's legal
to transform an assembly instruction to be able to encode the immediate.
For example: AND and BIC are complementary instructions; one can switch the AND
to a BIC as long as the immediate is complemented.
The intent is to switch one instruction into its complementary one when the immediate
cannot be encoded in the form requested in the original assembly and when the
complementary immediate is encodable.
The patch addresses two issues:
1. definition of t2SOImmNot immediate - it has to check that the orignal value is
not encoded naturally
2. t2AND and t2BIC instruction aliases which should use the Thumb2 SOImm operand
rather than the ARM one.
llvm-svn: 188548
Before this patch this flag is IOS specific, but is also
useful for bare project like bootloaders / kernels etc,
since movw / movt prevents simple relocation. Therefore
make this flag more commonly available.
note: this patch depends on a similiar rename in clang
Patch by Jeroen Hofstee.
llvm-svn: 188487
r9 is defined as a platform-specific register in the ARM EABI.
It can be reserved for a special purpose or be used as a general
purpose register. Add support for reserving r9 for all ARM, while
leaving the IOS usage unchanged.
Patch by Jeroen Hofstee.
llvm-svn: 188485
1. The offset range for Thumb1 PC relative loads is [0..1020] and not [-1024..1020]
2. Thumb2 PC relative loads may define the PC, so the restriction placed on target register is removed
3. Removes unneeded alias between "ldr.n" and t1LDRpci. ".n" is actually stripped by both tablegen
and the ASM parser, so this alias rule really does nothing
llvm-svn: 188466
When determining if two different loads are from the same base address,
this patch allows one load to use a t2LDRi8 address mode and another to
use a t2LDRi12 address mode. The current implementation is very
conservative and this allows the case of differing Thumb2 byte loads to
be considered. Allowing these differing modes instead of forcing the exact
same opcode is useful for situations where one opcodes loads from a base
address+1 and a second opcode loads for a base address-1.
Patch by Daniel Stewart.
llvm-svn: 188385
Use it to avoid repeating ourselves too often. Also store MVT::SimpleValueType
in the TTI tables so they can be statically initialized, MVT's constructors
create bloated initialization code otherwise.
llvm-svn: 188095
In Thumb1, only one variant is supported: CPS{effect} {flags}
Thumb2 supports three:
CPS{effect}.W {flags}
CPS{effect} {flags} {mode}
CPS {mode}
Canonically, .W should be used only when ambiguity is present between encodings of different width.
The wide suffix is still accepted for the latter two forms via aliases.
llvm-svn: 188071
The long encoding for Thumb2 unconditional branches is broken.
Additionally, there is no range checking for target operands; as such
for instructions originating in assembly code, only short Thumb encodings
are generated, regardless of the bitsize needed for the offset.
Adding range checking is non trivial due to the representation of Thumb
branch instructions. There is no true difference between conditional and
unconditional branches in terms of operands and syntax - even unconditional
branches have a predicate which is expected to match that of the IT block
they are in. Yet, the encodings and the permitted size of the offset differ.
Due to this, for any mnemonic there are really 4 encodings to choose for.
The problem cannot be handled in the parser alone or by manipulating td files.
Because the parser builds first a set of match candidates and then checks them
one by one, whatever tablegen-only solution might be found will ultimately be
dependent of the parser's evaluation order. What's worse is that due to the fact
that all branches have the same syntax and the same kinds of operands, that
order is governed by the lexicographical ordering of the names of operand
classes...
To circumvent all this, any necessary disambiguation is added to the instruction
validation pass.
llvm-svn: 188067
This is the only Thumb2 instruction defined with "t" prefix; all other Thumb2 instructions have "t2" prefix (e.g. "t2CDP2" which is defined immediately afterwards).
Patch by Artyom Skrobov.
llvm-svn: 187973
Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel.
It races to emit *.inc files simultaneously.
llvm-svn: 187780
This patch fixes the multiple breakages on ARM test-suite after the SLP
vectorizer was introduced by default on O3. The problem was an illegal
vector type on ARMTTI::getCmpSelInstrCost() <3 x i1> which is not simple.
The guard protects this code from breaking (cause of the problems) but
doesn't fix the issue that is generating the odd vector in the first
place, which also needs to be investigated.
llvm-svn: 187658
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.
llvm-svn: 187618
While the .td entry is nice and all, it takes a pretty gross hack in
ARMAsmParser::ParseInstruction() because of handling of other "subs"
instructions to get it to match. Ran it by Jim Grosbach and he said it was
about what he expected to make this work given the existing code.
rdar://14214063
llvm-svn: 187530
When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
bitwidth of the second operands to both ands match before comparing the negation
of the values.
Split the check of the value of the second operands to the ands. Move the cast
and variable declaration slightly higher to make it slightly easier to follow.
Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 187404
do in the SDag when lowering references to the GOT: use
ARMConstantPoolSymbol rather than creating a dummy global variable. The
computation of the alignment still feels weird (it uses IR types and
datalayout) but it preserves the exact previous behavior. This change
fixes the memory leak of the global variable detected on the valgrind
leak checking bot.
Thanks to Benjamin Kramer for pointing me at ARMConstantPoolSymbol to
handle this use case.
llvm-svn: 187303
me) should start watching this bot more as its catching lots of bugs.
The fix here is to not construct the global if we aren't going to need
it. That's cheaper anyways, and globals have highly predictable types in
practice. I've added an assert to catch skew between our manual testing
of the type and the actual type just for paranoia's sake.
Note that this pattern is actually fine in most globals because when you
build a global with a module it automatically is moved to be owned by
that module. But here, we're in isel and don't really want to do that.
The solution of not creating a global is simpler anyways.
llvm-svn: 187302
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.
This patch changes the lowering of insert_vector_elt to expose a vector
friendly pattern in this situation.
This is a step toward fixing <rdar://problem/14170854>.
llvm-svn: 186999
instructions. With this patch:
1. ldr.n is recognized as mnemonic for the short encoding
2. ldr.w is recognized as menmonic for the long encoding
3. ldr will map to either short or long encodings depending on the size of the offset
llvm-svn: 186831
After Ulrich's r180677 (thanks!) TableGen is intelligent enough to
handle tied constraints involving complex operands properly, so
virtually all of the ARM custom converters are now unnecessary.
llvm-svn: 186810
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.
This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.
For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll).
<rdar://problem/14464830>
llvm-svn: 186735
My patch 'r183551 - ARM FastISel integer sext/zext improvements' was incorrect when emitting ARM register-immediate ASR, LSL, LSR instructions: they are pseudo-instructions in ARMInstrInfo.td and I should have used MOVsi instead.
This is not an issue when code is generated through a .s file, but is an issue when generated straight to a .o (-filetype=obj).
llvm-svn: 186489
block. Blocks that have an indirect branch terminator, even if it's not the
last terminator, should still be treated as unanalyzable.
<rdar://problem/14437274>
Reducing a useful regression test case is proving difficult - I hope to have
one soon.
llvm-svn: 186461
This adds an instruction alias to make the assembler recognize the alternate literal form: pli [PC, #+/-<imm>]
See A8.8.129 in the ARM ARM (DDI 0406C.b).
Fixes <rdar://problem/14403733>.
llvm-svn: 186459