1) Changed gather and scatter intrinsics. Now they are aligned with GCC built-ins. There is no more non-masked form. Masked intrinsic receives -1 if all lanes are executed.
2) I changed the function that works with intrinsics inside X86ISelLowering.cpp. I put all intrinsics in one table. I did it for INTRINSICS_W_CHAIN and plan to put all intrinsics from WO_CHAIN set to the same table in order to avoid the long-long "switch". (I wanted to use static map initialization that allowed by C++11 but I wasn't able to compile it on VS2012).
3) I added gather/scatter prefetch intrinsics.
4) I fixed MRMm encoding for masked instructions.
llvm-svn: 208522
Support for the intrinsics that read from and write to global named registers
is added for r1, r2 and r13 (depending on the subtarget).
llvm-svn: 208509
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.
No functionality change.
llvm-svn: 208508
The counter-loops formation pass needs to know what operations might be
function calls (because they can't appear in counter-based loops). On PPC32,
128-bit shifts might be runtime calls (even though you can't use __int128 on
PPC32, it seems that SROA might form them).
Fixes PR19709.
llvm-svn: 208501
When lowering build_vector to an insertps, we would still lower it, even
if the source vectors weren't v4x32. This would break on avx if the source
was a v8x32. We now check the type of the source vectors.
llvm-svn: 208487
We were swapping the true & false results while testing for FMAX/FMIN,
but not putting them back to the original state if the later checks
failed.
Should fix PR19700.
llvm-svn: 208469
This reverts commit r200561.
This calling convention was an attempt to match the MSVC C++ ABI for
methods that return structures by value. This solution didn't scale,
because it would have required splitting every CC available on Windows
into two: one for methods and one for free functions.
Now that we can put sret on the second arg (r208453), and Clang does
that (r208458), revert this hack.
llvm-svn: 208459
MSVC always places the implicit sret parameter after the implicit this
parameter of instance methods. We used to handle this for
x86_thiscallcc by allocating the sret parameter on the stack and leaving
the this pointer in ecx, but that doesn't handle alternative calling
conventions like cdecl, stdcall, fastcall, or the win64 convention.
Instead, change the verifier to allow sret on the second parameter.
This also requires changing the Mips and X86 backends to return the
argument with the sret parameter, instead of assuming that the sret
parameter comes first.
The Sparc backend also returns sret parameters in a register, but I
wasn't able to update it to handle secondary sret parameters. It
currently calls report_fatal_error if you feed it an sret in the second
parameter.
Reviewers: rafael.espindola, majnemer
Differential Revision: http://reviews.llvm.org/D3617
llvm-svn: 208453
This patch adds support to ARM for custom lowering of the
llvm.{u|s}add.with.overflow.i32 intrinsics for i32/i64. This is particularly useful
for handling idiomatic saturating math functions as generated by
InstCombineCompare.
Test cases included.
rdar://14853450
llvm-svn: 208435
Summary:
This required a new instruction group representing the 32-bit subset of
MIPS-IV that was available in MIPS32
A small number of instructions are correctly rejected but with the wrong error
message. These have been placed in a separate test for now.
Depends on D3676
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3677
llvm-svn: 208414
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA, however this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083) prevents this.
llvm-svn: 208413
Summary:
This required a new instruction group representing the 32-bit subset of
MIPS-III that was available in MIPS32
A small number of instructions are correctly rejected but with the wrong error
message. These have been placed in a separate test for now.
There's some obvious InstAlias's that ought to be marked MIPS-III but arent.
This is because they are not currently tested. I intend to catch these with
a final pass through the tablegen records to find tablegen records without
ISA annotations.
Depends on D3674
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3675
llvm-svn: 208408
Summary:
Adds MIPS32r6/MIPS64r6 and checks the compatibility requirements for these
processors.
I've also included comments to describe removed and re-encoded instructions,
along with placeholder def's for the new instructions but there are no
functional changes to codegen at this point.
Reviewers: jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3622
llvm-svn: 208399
Handle lowering of global addresses for PIC mode compilation on Windows. Always
use the movw/movt load to load the address as Windows on ARM requires ARMv7+ and
is a pure Thumb environment.
llvm-svn: 208385
Summary:
Also ran clang-format on the function. The code added is the last else
if block.
Reviewers: nadav, craig.topper, delena
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3518
llvm-svn: 208372
This patch teaches the backend how to combine packed SSE2/AVX2 arithmetic shift
intrinsics.
The rules are:
- Always fold a packed arithmetic shift by zero to its first operand;
- Convert a packed arithmetic shift intrinsic dag node into a ISD::SRA only if
the shift count is known to be smaller than the vector element size.
This patch also teaches to function 'getTargetVShiftByConstNode' how fold
target specific vector shifts by zero.
Added two new tests to verify that the DAGCombiner is able to fold
sequences of SSE2/AVX2 packed arithmetic shift calls.
llvm-svn: 208342
The parsing of ADD/SUB shifted immediates needs to be done explicitly so
that better diagnostics can be emitted, as a side effect this also
removes some of the hacks in the current method of handling this operand
type.
Additionally remove manual CMP aliasing to ADD/SUB and use InstAlias
instead.
llvm-svn: 208329
Summary:
These instructions were added in MIPS-I, and MIPS-II but were removed in
MIPS-III. Interestingly, GAS continues to accept them when assembling for
MIPS-III.
For the moment, these instructions will follow GAS and accept them for
MIPS-III and newer but this will be tightened up when the invalid-*.s
tests are added.
Depends on D3647
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3648
llvm-svn: 208311
Summary:
A small number of instructions are rejected with the wrong error message.
These have been placed in a separate test for now. There seems to be some
parsing quirk that triggers when these instructions are disabled.
Depends on D3571
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3647
llvm-svn: 208305
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).
This does represent a small functionality change:
- For generic x86-64 (which uses the SB model and, thus, will get some
unrolling).
- For AMD cores (because they still currently use the SB scheduling model)
- For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.
llvm-svn: 208289
This adds FK_SecRel_2 relocation support to ARM. This enables the building of
object files for armv7-windows-msvc which enables CodeView line tables for
debugging as opposed to armv7-windows-itanium which currently uses DWARF.
llvm-svn: 208273
Summary:
Vectors built with zeros and elements in the same order as another
(source) vector are optimized to be built using a single insertps
instruction.
Also optimize when we move one element in a vector to a different place
in that vector while zeroing out some of the other elements.
Further optimizations are possible, described in TODO comments.
I will be implementing at least some of them in the near future.
Added some tests for different cases where this optimization triggers.
Reviewers: nadav, delena, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3521
llvm-svn: 208271
The loop stream detector (LSD) on modern Intel cores, which optimizes the
execution of small loops, has limits on the number of taken branches in
addition to uop-count limits (modern AMD cores have similar limits).
Unfortunately, at the IR level, estimating the number of branches that will be
taken is difficult. For one thing, it strongly depends on later passes (block
placement, etc.). The original implementation took a conservative approach and
limited the maximal BB DFS depth of the loop. However, fairly-extensive
benchmarking by several of us has revealed that this is the wrong approach. In
fact, there are zero known cases where the branch limit prevents a detrimental
unrolling (but plenty of cases where it does prevent beneficial unrolling).
While we could improve the current branch counting logic by incorporating
branch probabilities, this further complication seems unjustified without a
motivating regression. Instead, unless and until a regression appears, the
branch counting will be removed.
llvm-svn: 208255
Given a FMA family (e.g., 213, 231), not all the variants (i.e., register or
memory) are commutable.
E.g., for the 213 family (with the syntax src1, src2, src3):
fmaXXX213 A, B, reg3/mem3 == fmaXXX213 B, A, reg3/mem3
Now consider the 231 family:
fmaXXX231 A, B, reg3 == fmaXXX231 A, reg3, B
But
fmaXXX231 A, B, mem3 != fmaXXX231 A, mem3, B
Indeed, mem3 cannot be the second argument of the memory variant of fmaXXX231.
Working on a reduced test case!
<rdar://problem/16800495>
llvm-svn: 208252
default architecture for reasonable modern x86 processors, actually be
modern. This processor model should essentially be "tuned" for modern
x86 chips as much as possible without undue penalties on any specific
architecture. Previously we weren't even using the nice scheduling
models. There are a few other tweaks needed here, but this change at
least I have benchmarked across a decent swatch of chips (intel's
clovertown, westmere, and sandybridge; amd's istanbul) and seen no
significant regressions.
If anyone has suggested ways to test this, just let me know. Somewhat
alarmingly, no existing tests failed.
llvm-svn: 208230
this patch disables the dead register elimination pass and the load/store pair
optimization pass at -O0. The ILP optimizations don't require the optimization
level to be checked because the call to addILPOpts is predicated with the
necessary check. The AdvSIMDScalar pass is disabled by default at all
optimization levels. This patch leaves that pass disabled by default.
Also, move command-line options into ARM64TargetMachine.cpp and add a few
additional flags to aid in debugging. This fixes an issue with the
-debug-pass=Structure flag where passes were printed, but not actually run
(i.e., AdvSIMDScalar pass).
llvm-svn: 208223
Summary:
These processors will only be available for the integrated assembler at
first (CodeGen will emit a fatal error saying they are not implemented).
The intention is to work through the existing instructions and correctly
annotate the ISA they were added in so that we have a sufficiently good
base to start MIPS64r6 development. MIPS64r6 removes/re-encodes certain
instructions and I believe it is best to define ISA's using set-union's
as far as possible rather than using set-subtraction.
Reviewers: vmedic
Subscribers: emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D3569
llvm-svn: 208221
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.
llvm-svn: 208210
Summary:
One small functional change. The recently added PAUSE instruction now has
the HasStdEnc predicate which was accidentally removed by a Requires<>.
Depends on D3640
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3641
llvm-svn: 208209
Summary:
Move IsGP64bit into GPRPredicates, and IsFP64bit/NotFP64bit into FGRPredicates
No functional change (confirmed by diffing tablegen-erated files).
Depends on D3639
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3640
llvm-svn: 208201
The AAPCS states that values passed in registers must have a value as though
they had been loaded with "LDR". LDR is equivalent to "LD1.64 vX.1D" - that is,
loading scalars to vector registers and loading 1-element vectors is equivalent.
The logic implemented here is to ensure that at all call boundaries and during
formal argument lowering all vectors are treated as their bitwidth-based floating
point scalar counterpart, which is always one of f64 or f128 (v2i32 -> f64,
v4i32 -> f128 etc). A BITCAST is inserted so that the appropriate REV will be
generated during code generation.
llvm-svn: 208198
Summary:
This makes it easier to prove a more complicated change in the next commit
is non-functional.
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3639
llvm-svn: 208197
Because we've canonicalised on using LD1/ST1, every time we do a bitcast
between vector types we must do an equivalent lane reversal.
Consider a simple memory load followed by a bitconvert then a store.
v0 = load v2i32
v1 = BITCAST v2i32 v0 to v4i16
store v4i16 v2
In big endian mode every memory access has an implicit byte swap. LDR and
STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that
is, they treat the vector as a sequence of elements to be byte-swapped.
The two pairs of instructions are fundamentally incompatible. We've decided
to use LD1/ST1 only to simplify compiler implementation.
LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV. This makes
the original code sequence: v0 = load v2i32
v1 = REV v2i32 (implicit)
v2 = BITCAST v2i32 v1 to v4i16
v3 = REV v4i16 v2 (implicit)
store v4i16 v3
But this is now broken - the value stored is different to the value loaded
due to lane reordering. To fix this, on every BITCAST we must perform two
other REVs:
v0 = load v2i32
v1 = REV v2i32 (implicit)
v2 = REV v2i32
v3 = BITCAST v2i32 v2 to v4i16
v4 = REV v4i16
v5 = REV v4i16 v4 (implicit)
store v4i16 v5
This means an extra two instructions, but actually in most cases the two REV
instructions can be combined into one. For example:
(REV64_2s (REV64_4h X)) === (REV32_4h X)
There is also no 128-bit REV instruction. This must be synthesized with an
EXT instruction.
Most bitconverts require some sort of conversion. The only exceptions are:
a) Identity conversions - vNfX <-> vNiX
b) Single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX
Even though there are hundreds of changed lines, I have a fairly high confidence
that they are somewhat correct. The changes to add two REV instructions per
bitcast were pretty mechanical, and once I'd done that I threw the resulting
.td at a script I wrote which combined the two REVs together (and added
an EXT instruction, for f128) based on an instruction description I gave it.
This was much less prone to error than doing it all manually, plus my brain
would not just have melted but would have vapourised.
llvm-svn: 208194
This completes the port of r204814 (cpirker "AArch64_BE function argument
passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
found during later development along the way. The biggest of these was
that the alignment fixup logic wasn't replicated into all the places it
should have been.
llvm-svn: 208192
Summary:
The overall idea is to chop the Predicates list into subsets that are
usually overridden independently. This allows subclasses to partially
override the predicates of their superclasses without having to re-add all
the existing predicates.
This patch starts the process by moving HasStdEnc into a new
EncodingPredicates list and almost everything else into
AdditionalPredicates.
It has revealed a couple likely bugs where 'let Predicates' has removed
the HasStdEnc predicate.
No functional change (confirmed by diffing tablegen-erated files).
Depends on D3549, D3506
Reviewers: vmedic
Differential Revision: http://reviews.llvm.org/D3550
llvm-svn: 208184
Summary:
This will make it easier to prove that a more complicated change in the
following commit is non-functional.
No functional change.
Depends on D3506
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3549
llvm-svn: 208179
Mark up additional instructions which are part of the function prologue as
MachineFrameSetup. These instructions are part of the function prologue,
emitted by the PEI pass to setup the stack for use in the activating frame.
llvm-svn: 208153
The ARM::BLX instruction is an ARM mode instruction. The Windows on ARM target
is limited to Thumb instructions. Correctly use the thumb mode tBLXr
instruction. This would manifest as an errant write into the object file as the
instruction is 4-bytes in length rather than 2. The result would be a corrupted
object file that would eventually result in an executable that would crash at
runtime.
llvm-svn: 208152
remove it from the list of unspilled registers. Otherwise the following
attempt to keep the stack aligned by picking an extra GPR register to
spill will not work as it picks up r11.
llvm-svn: 208129
Before this patch, the backend always emitted a store+load sequence to
bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that
performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting
i64 node was then used to build a v2i32 vector.
With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from
MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free"
bitcast to type MVT::v4i32. The elements of the resulting
v4i32 are then extracted to build a v2i32 vector (which is illegal and
therefore promoted to MVT::v2i64).
This is in general cheaper than emitting a stack store+load sequence
to bitconvert the operand from type f64 to type i64.
llvm-svn: 208107
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
llvm-svn: 208104
The Win64 docs are very clear that anything larger than 8 bytes is
passed by reference, and GCC MinGW64 honors that for __modti3 and
friends.
Patch by Jameson Nash!
llvm-svn: 208029
Summary:
Also ran clang-format on the function. The code added is the last else
if block.
Reviewers: nadav, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3518
llvm-svn: 207992
Windows on ARM does not conform to AEABI. However, memset would be emitted
using the AEABI signature, resulting in inverted parameters. Handle this
special case appropriately.
llvm-svn: 207943
Add handling for FK_SecRel_4 (4-byte section relative relocations). These are
used by the generation of DWARF debug information (the abbrevations use section
relative relocations). This will also be used in generation of CodeView line
tables.
llvm-svn: 207941
Both MinGW and cygwin (i686) construct export directives without the global
leader prefix. This is mostly due to the fact that they use GNU ld which does
not correctly handle the export directive. This apparently has been been broken
for a while. However, this was recently reported as being broken by
mingwandroid and diorcety of the msys2 project.
Remove the global leader prefix if targeting MinGW or cygwin, otherwise, retain
the global leader prefix. Add an explicit test for cygwin's behaviour of export
directives.
llvm-svn: 207926
Create a helper function to generate the export directive. This was previously
duplicated inline to handle export directives for variables and functions. This
also enables the use of range-based iterators for the generation of the
directive rather than the traditional loops. NFC.
llvm-svn: 207925
The fix itself is fairly simple: move getAccessVariant to MCValue so that we
replace the old weak expression evaluation with the far more general
EvaluateAsRelocatable.
This then requires that EvaluateAsRelocatable stop when it finds a non
trivial reference kind. And that in turn requires the ELF writer to look
harder for weak references.
Last but not least, this found a case where we were being bug by bug
compatible with gas and accepting an invalid input. I reported pr19647
to track it.
llvm-svn: 207920
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
llvm-svn: 207846
The register spiller assumes that only one new instruction is created
when spilling and restoring registers, so we need to emit pseudo
instructions for vector register spills and lower them after
register allocation.
v2:
- Fix calculation of lane index
- Extend VGPR liveness to end of program.
v3:
- Use SIMM16 field of S_NOP to specify multiple NOPs.
https://bugs.freedesktop.org/show_bug.cgi?id=75005
llvm-svn: 207843
Previously, LLVM had no knowledge that these instructions actually
modified their address register: fine if they never end up in CodeGen,
but when I'd rather like to write some patterns for them it becomes a
disaster.
The change is mostly straightforward, I think the most significant
design decision was to *always* put the address write-back first. This
allows loads and stores to be accessed more uniformly, for example
permitting the continued sharing of the InstAlias definitions.
I also discovered that the custom Decode logic is no longer needed, so
I removed it.
No tests, because there should be no functionality change.
llvm-svn: 207839
While post-indexed LD1/ST1 instructions do exist for vector loads,
this patch makes use of the more flexible addressing-modes in LDR/STR
instructions.
llvm-svn: 207838
This creates a lot of core infrastructure in which to add, with little
effort, quite a bit more to mips fast-isel
Test Plan: simplestore.ll
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3527
llvm-svn: 207790
This optimization merges the common part of a group of GEPs, so we can compute
each pointer address by adding a simple offset to the common part.
The optimization is currently only enabled for the NVPTX backend, where it has
a large payoff on some benchmarks.
Review: http://reviews.llvm.org/D3462
Patch by Jingyue Wu.
llvm-svn: 207783
We currently force symbols to be globals in .thumb_set. The intent
seems to be that given
.thumb_set foo, bar
we emit an undefined symbol to bar if it is never defined. The side
effect is that we mark bar as global, even if it is defined, which gas
does not.
Producing an undefined reference to bar is a general difference from MC and gas.
For example, given
a = b
gas will produce an undefined reference to b, MC will not. I would be surprised
if any code depends on this, but it it does, we should fix the general
difference, not special case .thumb_set.
llvm-svn: 207757
The canonical form of the BFM instruction is always one of the more explicit
extract or insert operations, which makes reading output much easier.
llvm-svn: 207752
Summary:
There are two functional changes:
1) The directive is not expanded for the ASM->ASM code path.
2) If PIC is not set, there's no expansion for the ASM->OBJ code path (same behaviour as GAS).
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3482
llvm-svn: 207741
This fixes the memory leak introduced with the initial addition of support for
WoA stack probing. Now that the pseudo-instruction expansion can handle an
external symbol, use that to generate the load which simplifies the logic as
well as avoids the memory leak.
llvm-svn: 207737
This enhances the expansion of the mov32imm pseudo-instruction to support an
external symbol reference. This is motivated by a simplification of the stack
probe emission for Windows on ARM (and fixing a leak).
llvm-svn: 207736
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
%shr = lshr i64 %x, 4
%and = and i64 %shr, 15
%arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and
%0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
lsr x8, x0, #1
and x8, x8, #0x78
add x8, x9, x8
If using ubfx, it only needs 2 instrs:
ubfx x8, x0, #4, #4
add x8, x9, x8, lsl #3
This fixes bug 19589
llvm-svn: 207702
There is no need to check if we want to hoist the immediate value of an
shift instruction. Simply return TCC_Free right away.
This change is like r206101, but for X86.
rdar://problem/16190769
llvm-svn: 207692
Summary: negu $reg is equivalent to negu $reg, $reg.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3510
llvm-svn: 207673
Summary:
The pattern sltu $r1, $r2, $imm is found in handwritten assembly which
is just a shorthand version of sltui $r1, $r2, $imm.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3508
llvm-svn: 207671
It's been decided that in the future, the floating-point immediate in
instructions like "fcmeq v0.2s, v1.2s, #0.0" will be canonically "0.0", which
has been implemented on AArch64 already but not ARM64.
This fixes that issue.
llvm-svn: 207666
Summary:
The pattern dsll/dsrl $rd, $rt, $rs is found in handwritten assembly which
is just a shorthand version of dsllv/dsrlv $rd, $rt, $rs.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3486
llvm-svn: 207664
Summary:
The pattern sll/srl $rd, $rt, $rs is found in handwritten assembly which
is just a shorthand version of sllv/srlv $rd, $rt, $rs.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3483
llvm-svn: 207657
target cannot be determined accurately. This is the case for NaCl where the
sandboxing instructions are added in MC layer, after the MipsLongBranch pass.
It is also the case when the code has inline assembly. Instead of calculating
offset in the MipsLongBranch pass, use %hi(sym1 - sym2) and %lo(sym1 - sym2)
expressions that are resolved during the fixup.
This patch also deletes microMIPS test file test/CodeGen/Mips/micromips-long-branch.ll
and implements microMIPS CHECKs in a much simpler way in a file
test/CodeGen/Mips/longbranch.ll, together with MIPS32 and MIPS64.
llvm-svn: 207656
Only emit calls to compiler-rt asm routines on platforms where they are
present (currently limited to linux i386/x86_64).
Patch by Yuri Gorshenin.
llvm-svn: 207651
The canonical syntax for shifts by a variable amount does not end with 'v', but
that syntax should be supported as an alias (presumably for legacy reasons).
llvm-svn: 207649
AArch64 does not have a CPSR register in the same way that AArch32 does. Most
of its compiler-relevant roles have been taken over by the more specific NZCV
register (representing just the flags set by normal instructions).
Its system control functions still remain, but are now under the
pseudo-register referred to as "PSTATE". They're accessed via various MRS & MSR
instructions described in the reference manual.
llvm-svn: 207645
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.
llvm-svn: 207644
Summary:
This isn't supported directly so we rotate the vector by the desired number of
elements, insert to element zero, then rotate back.
The i64 case generates rather poor code on MIPS32. There is an obvious
optimisation to be made in future (do both insert.w's inside a shared
rotate/unrotate sequence) but for now it's sufficient to select valid code
instead of aborting.
Depends on D3536
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3537
llvm-svn: 207640
Summary:
This directive is used for setting up $gp in the beginning of a function.
It expands to three instructions if PIC is enabled:
lui $gp, %hi(_gp_disp)
addui $gp, $gp, %lo(_gp_disp)
addu $gp, $gp, $reg
_gp_disp is a special symbol that the linker sets to the distance between
the lui instruction and the context pointer (_gp).
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3480
llvm-svn: 207637
Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to
piece together an immediate in 16-bit chunks, hex is probably the most
appropriate format.
llvm-svn: 207635
This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they
accept weird shifts which are more naturally understandable in hex notation).
Also changes BRK/HINT etc, which is probably a neutral change, but easier than
the alternative.
llvm-svn: 207634
Since these instructions only accept a 12-bit immediate, possibly shifted left
by 12, the canonical syntax used by the architecture reference manual is "#N {,
lsl #12 }". We should accept an immediate that has already been shifted, (e.g.
Also, print a comment giving the full addend since it can be helpful.
llvm-svn: 207633
This introduces the stack lowering emission of the stack probe function for
Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where
any page allocation which crosses a page boundary of the following guard page
will cause a page fault. This page fault must be handled by the kernel to
ensure that the page is faulted in. If this does not occur and a write access
any memory beyond that, the page fault will go unserviced, resulting in an
abnormal program termination.
The watermark for the stack probe appears to be at 4080 bytes (for
accommodating the stack guard canaries and stack alignment) when SSP is
enabled. Otherwise, the stack probe is emitted on the page size boundary of
4096 bytes.
llvm-svn: 207615
Emit the COFF header when printing out the function. This is important as the
header contains two important pieces of information: the storage class for the
symbol and the symbol type information. This bit of information is required for
the linker to correctly identify the type of symbol that it is dealing with.
llvm-svn: 207613
When building with -Werror=covered-switch-default (as on the buildbots), the
build would fail since all cases are covered by the switch. Move the
llvm_unreachable to the end of the function as an annotation.
llvm-svn: 207609
IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise
relocation is not split up and reordered. When expanding the mov32imm
pseudo-instruction, create a bundle if the machine operand is referencing an
address. This helps ensure that the relocatable address load is not reordered
by subsequent passes.
Unfortunately, this only partially handles the case as the Constant Island Pass
occurs after the instructions are unbundled and does not properly handle
bundles. That is a more fundamental issue with the pass itself and beyond the
scope of this change.
llvm-svn: 207608
Currently, musttail codegen is relying on sibcall optimization, and
reporting a fatal error if fails. Sibcall optimization fails when stack
arguments need to be modified, which is insufficient for musttail.
The logic for moving arguments in memory safely is already implemented
for GuaranteedTailCallOpt. This change merely arranges for musttail
calls to use it.
No functional change for GuaranteedTailCallOpt.
Reviewers: espindola
Differential Revision: http://reviews.llvm.org/D3493
llvm-svn: 207598
It's already set in AMDGPUISelLowering for all GPUs
Patch By: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207592
SI_IF and SI_ELSE are terminators which also produce a value. For
these instructions ISel always inserts a COPY to move their value
to another basic block. This COPY ends up between SI_(IF|ELSE)
and the S_BRANCH* instruction at the end of the block.
This breaks MachineBasicBlock::getFirstTerminator() and also the
machine verifier which assumes that terminators are grouped together at
the end of blocks.
To solve this we coalesce the copy away right after ISel to make sure
there are no instructions in between terminators at the end of blocks.
llvm-svn: 207591
SALU instructions ignore control flow, so it is not always safe to use
them within branches. This is a partial solution to this problem
until we can come up with something better.
llvm-svn: 207590
This is a squash of several optimization commits:
- calculate DIV_Lo and DIV_Hi separately
- use BFE_U32 if we are operating on 32bit values
- use precomputed constants instead of shifting in UDVIREM
- skip the first 32 iterations of udivrem
v2: Check whether BFE is supported before using it
Patch by: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207589
Initial implementation, rather slow
Patch by: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207588
When legalizing ops, with UDIV/UREM set to expand, they automatically
expand to UDIVREM (if legal or custom).
We need to do this manually for legalize types.
v2:
SI should be set to Expand because the type is legal, and it is
automatically lowered to UDIVREM if UDIVREM is Legal/Custom
R600 should set to UDIV/UREM to Custom because it needs to lower them
during type legalization
Patch by: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207587
Summary:
The InstSE class already initializes Predicates to [HasStdEnc].
No functional change (confirmed by diffing tablegen-erated files before and
after)
Differential Revision: http://reviews.llvm.org/D3548
llvm-svn: 207558
Summary:
The InstSE class already initializes Predicates to [HasStdEnc].
No functional change (confirmed by diffing tablegen-erated files before and
after)
Differential Revision: http://reviews.llvm.org/D3547
llvm-svn: 207551
Summary:
The MipsPat class already initializes Predicates to [HasStdEnc].
No functional change (confirmed by diffing tablegen-erated files before and
after)
Differential Revision: http://reviews.llvm.org/D3546
llvm-svn: 207548
Summary:
This isn't supported directly so we splat the vector element and extract
the most convenient copy.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3530
llvm-svn: 207524
This patch centralizes the handling of the thumb bit around
MCStreamer::isThumbFunc and makes isThumbFunc handle aliases.
This fixes a corner case, but the main advantage is having just one
way to check if a MCSymbol is thumb or not. This should still be
refactored to be ARM only, but at least now it is just one predicate
that has to be refactored instead of 3 (isThumbFunc,
ELF_Other_ThumbFunc, and SF_ThumbFunc).
llvm-svn: 207522
It's bad enough that I have to look up 5 different levels of TableGen class
definitions to work out what bits go where in a simple NEON instruction anyway,
without having to keep track of umpteen unused parameters.
llvm-svn: 207420
X86_MAX_OPERANDS is changed to unsigned.
Also, add range-based for loops for affected loops. This in turn
needed an ArrayRef instead of a pointer-to-array in
InternalInstruction.
llvm-svn: 207413
Someone couldn't bear to have a completely orthogonal set of floating-point
registers, so we've got some instructions that only accept v0-v15 (coming in
ARMv9, V128_prime: you're allowed v2, v3, v5, v7, ...).
Anyway, we were permitting even the out of range registers during assembly
(CodeGen handled it correctly). This adds a diagnostic.
llvm-svn: 207412
Only the object streamers need to track if a symbol should be marked thumb or
not. This ports the ELF case. The COFF case is not ported since it is currently
not working for some other reason (I will report a bug).
llvm-svn: 207366
Introduce support for WoA PE/COFF object file emission from LLVM. Add the new
target specific PE/COFF Streamer (ARMWinCOFFStreamer) that handles the ARM
specific behaviour of PE/COFF object emission. ARM exception information is not
yet emitted and is a TODO item.
The ARM specific object writer (ARMWinCOFFObjectWriter) handles the ARM specific
relocation handling in conjunction with the WinCOFFObjectWriter in the MC layer.
The MC layer needs to be updated to deal with the relocation adjustments.
Branch relocations are adjusted by 4 bytes (unlikely their ELF counterparts).
Minor tweaks to switch multiple conditional checks into equivalent switch
statements. The ObjectFileInfo is updated to relax the object file setup for
Windows COFF. Move the architecture checks into an assertion. Windows COFF is
currently only supported on x86, x86_64, and ARM (thumb). Rather than
defaulting to ELF, we will refuse to generate an object file. This is better
though as you do not get an (arbitrary) object file which is different from the
request.
llvm-svn: 207345
This introduces a target specific streamer, X86WinCOFFStreamer, which handles
the target specific behaviour (e.g. WinEH). This is mostly to ensure that
differences between ARM and X86 remain disjoint and do not accidentally cross
boundaries. This is the final staging change for enabling object emission for
Windows on ARM.
llvm-svn: 207344
Currently, the integrated assembler is the only choice for assembling Windows on
ARM binaries. IAS supports the .file <filename> directive which emits the file
symbol into the resulting object binary. Mark the GNU COFF information to
indicate support for this feature.
llvm-svn: 207341
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.
I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.
llvm-svn: 207315
Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.
<rdar://problem/16730541>
llvm-svn: 207301
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.
Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.
Reviewers: nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3475
llvm-svn: 207291
It's fishy to be changing the `std::vector<>` owned by the iterator, and
no one actual does it, so I'm going to remove the ability in a
subsequent commit. First, update the users.
<rdar://problem/14292693>
llvm-svn: 207252
This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic
which provides a generic, extensible manner for adding hint instructions. This
functionality can now be represented as @llvm.arm.hint(i32 5).
llvm-svn: 207246
Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into
the instruction stream. This is particularly useful for generating IR from a
compiler where the user may inject an intrinsic (e.g. __yield). These are then
pattern substituted into the correct instruction which already existed.
llvm-svn: 207242
There's no need for local symbols to go through the GOT, in fact it seems GNU ld is not even emitting GOT entries for local symbols and will error out when trying to resolve a GOT relocation for a local symbol.
This bug triggers when bootstrapping clang on AArch64 Linux with -fPIC and the ARM64 backend. The AArch64 backend is not affected.
With this commit it's now possible to bootstrap clang on AArch64 Linux with the ARM64 backend (-fPIC, -O3).
llvm-svn: 207226
This patch is a supplement of implementing predicate of FP, enabling aarch64 backend
no-fp tests on arm64 target for verification. During this, one bug is exposed and
fixed by this patch.
llvm-svn: 207215
Change the object streamer selection to a switch from a series of if conditions.
Rather than defaulting to ELF, require that an ELF format is requested. The
Windows/!ELF is maintained as MachO would have been selected first and will
still provide a MachO format. Add an assertion that if COFF is requested that
the target platform is Windows as only WinCOFF object emission is currently
supported.
llvm-svn: 207200
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur. It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
llvm-svn: 207143
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
correctly verifies that both READCYCLECOUNTER and the two new intrinsics
work fine for both 64bit and 32bit Subtargets.
llvm-svn: 207127
I discovered this const-hole while attempting to coalesnce the Symbol
and SymbolMap data structures. There's some pending issues with that,
but I figured this change was easy to flush early.
llvm-svn: 207124
This matches ARM64 behaviour, which I think is clearer. It also puts all the
churn from that difference into one easily ignored commit.
llvm-svn: 207116
These can have different relocations in ELF. In particular both:
b.eq global
ldr x0, global
are valid, giving different relocations. The only possible way to distinguish
them is via a different fixup, so the operands had to be separated throughout
the backend.
llvm-svn: 207105
ARM64 was not producing pure BFI instructions for bitfield insertion
operations, unlike AArch64. The approach had to be a little different (in
ISelDAGToDAG rather than ISelLowering), and the outcomes aren't identical but
hopefully this gives it similar power.
This should address PR19424.
llvm-svn: 207102
This allows us to compile
return (mask & 0x8 ? a : b);
into
testb $8, %dil
cmovnel %edx, %esi
instead of
andl $8, %edi
shrl $3, %edi
cmovnel %edx, %esi
which we formed previously because dag combiner canonicalizes setcc of and into shift.
llvm-svn: 207088
Added support for bytes replication feature, so it could be GAS compatible.
E.g. instructions below:
"vmov.i32 d0, 0xffffffff"
"vmvn.i32 d0, 0xabababab"
"vmov.i32 d0, 0xabababab"
"vmov.i16 d0, 0xabab"
are incorrect, but we could deal with such cases.
For first one we should emit:
"vmov.i8 d0, 0xff"
For second one ("vmvn"):
"vmov.i8 d0, 0x54"
For last two instructions it should emit:
"vmov.i8 d0, 0xab"
P.S.: In ARMAsmParser.cpp I have also fixed few nearby style issues in old code.
Just for keeping method bodies in harmony with themselves.
llvm-svn: 207080
ANDS does not use the same encoding scheme as other xxxS instructions (e.g.,
ADDS). Take that into account to avoid wrong peephole optimization.
<rdar://problem/16693089>
llvm-svn: 207020
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.
Patch by Yuri Gorshenin.
llvm-svn: 206971
This model is not final and work is still in progress.
However there are substantial improvements on integer tests mainly because of better RAL with new scheduler.
Differential Revision: http://reviews.llvm.org/D3451
llvm-svn: 206957
AArch64 has feature predicates for NEON, FP and CRYPTO instructions.
This allows the compiler to generate code without using FP, NEON
or CRYPTO instructions.
llvm-svn: 206949
diagnostic that includes location information.
Currently if one has this assembly:
.quad (0x1234 + (4 * SOME_VALUE))
where SOME_VALUE is undefined ones gets the less than
useful error message with no location information:
% clang -c x.s
clang -cc1as: fatal error: error in backend: expected relocatable expression
With this fix one now gets a more useful error message
with location information:
% clang -c x.s
x.s:5:8: error: expected relocatable expression
.quad (0x1234 + (4 * SOME_VALUE))
^
To do this I plumbed the SMLoc through the MCObjectStreamer
EmitValue() and EmitValueImpl() interfaces so it could be used
when creating the MCFixup.
rdar://12391022
llvm-svn: 206906
The point of these calls is to allow Thumb-1 code to make use of the VFP unit
to perform its operations. This is not desirable with -msoft-float, since most
of the reasons you'd want that apply equally to the runtime library.
rdar://problem/13766161
llvm-svn: 206874
of a '.inc' file before including actual headers. In this case we had
both duplicated a header's include and were including a standard header.
llvm-svn: 206840
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...
llvm-svn: 206838
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.
This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:
- Header files that need to provide a DEBUG_TYPE for some inline code
can do so by defining the macro before their inline code and undef-ing
it afterward so the macro does not escape.
- We no longer have rampant ODR violations due to including headers with
different DEBUG_TYPE definitions. This may be mostly an academic
violation today, but with modules these types of violations are easy
to check for and potentially very relevant.
Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.
The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.
llvm-svn: 206822
The comment claimed that the register class information wasn't available
in the assembly parser, but that's not really true. It's just annoying to
get to. Replace the helper functions with references to the auto-generated
information.
llvm-svn: 206802
The canonical form for the extended addressing mode (e.g.,
"[x1, w2, uxtw #3]" is for the MCInst to have the second register be the
full 64-bit GPR64 register class. The instruction printer cleans up
the output for display to show the 32-bit register instead, per the
specification.
This simplifies 205893 now that the aliasing is handled in the printer
in 206495 so that the codegen path and the disassembler path give the
same MCInst form.
llvm-svn: 206797
With this MC is able to handle _GLOBAL_OFFSET_TABLE_ in 64 bit mode, which is
needed for medium and large code models.
This fixes pr19470.
llvm-svn: 206793
Summary:
The INSERTPS pattern fragment was called insrtps (mising 'e'), which
would make it harder to grep for the patterns related to this instruction.
Renaming it to use the proper instruction name.
Reviewers: nadav
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3443
llvm-svn: 206779
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.
<rdar://problem/15480077>
llvm-svn: 206738
reason to expose a global symbol 'decodeInstruction' nor to pollute the global
scope with a bunch of external linkage entities (some of which conflict with
others elsewhere in LLVM).
This is just the initial transition to C++; more cleanups to follow.
llvm-svn: 206717
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.
A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.
llvm-svn: 206684
expressions for mov instructions instead of silently truncating by default.
For the ARM assembler, we want to avoid misleadingly allowing something
like "mov r0, <symbol>" especially when we turn it into a movw and the
expression <symbol> does not have a :lower16: or :upper16" as part of the
expression. We don't want the behavior of silently truncating, which can be
unexpected and lead to bugs that are difficult to find since this is an easy
mistake to make.
This does change the previous behavior of llvm but actually matches an
older gnu assembler that would not allow this but print less useful errors
of like “invalid constant (0x927c0) after fixup” and “unsupported relocation on
symbol foo”. The error for llvm is "immediate expression for mov requires
:lower16: or :upper16" with correct location information on the operand
as shown in the added test cases.
rdar://12342160
llvm-svn: 206669
Summary:
This port includes the rudimentary latencies that were provided for
the Cortex-A53 Machine Model in the AArch64 backend. It also changes
the SchedAlias for COPY in the Cyclone model to an explicit
WriteRes mapping to avoid conflicts in other subtargets.
Differential Revision: http://reviews.llvm.org/D3427
Patch by Dave Estes <cestes@codeaurora.org>!
llvm-svn: 206652
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps. Consider:
(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>),
(extract_vector_elt %vreg0, Constant<2>),
(extract_vector_elt %vreg0, Constant<3>),
(extract_vector_elt %vreg0, Constant<4>),
(extract_vector_elt %vreg0, Constant<5>),
(extract_vector_elt %vreg0, Constant<6>),
(extract_vector_elt %vreg0, Constant<7>),
%vreg1))
a. We can't build a 256-bit vector efficiently so, we need to split it into
two 128-bit vecs and combine them with VINSERTX128.
b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.
c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.
Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:
(v4f32 (BUILD_VECTOR (extract_vector_elt
(vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
(extract_vector_elt
(vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
(extract_vector_elt
(vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
%vreg1))
In order to figure out the underlying vector and their identity we need to see
through the shuffles.
[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.
Fixes <rdar://problem/16296956>
llvm-svn: 206634
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.
This also enables the AArch64 tests which inspired the previous
untested commits.
llvm-svn: 206574
A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.
llvm-svn: 206573
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.
More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.
llvm-svn: 206570
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.
Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.
llvm-svn: 206569
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.
llvm-svn: 206568
Use scalar BFE with constant shift and offset when possible.
This is complicated by the fact that the scalar version packs
the two operands of the vector version into one.
llvm-svn: 206558