Commit Graph

35772 Commits

Author SHA1 Message Date
Matthew Simpson 11c4de6054 [AArch64] Add additional extract-extend patterns for smov
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.

This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.

llvm-svn: 256176
2015-12-21 18:31:25 +00:00
Chad Rosier 353d71914a Remove extra whitespace. NFC.
llvm-svn: 256173
2015-12-21 18:08:05 +00:00
Dan Gohman d544e0c100 [WebAssembly] Convert a regular for loop to a range-based for loop.
llvm-svn: 256169
2015-12-21 17:22:02 +00:00
Dan Gohman d9b4cdb68d [WebAssembly] Clean up comments and fix a missing #include dependency.
llvm-svn: 256168
2015-12-21 17:19:31 +00:00
Dan Gohman 979b766fef [WebAssembly] Remove an unneeded empty destructor.
llvm-svn: 256167
2015-12-21 17:12:40 +00:00
Dan Gohman d587aa5917 [WebAssembly] Enclose the operand variables for load and store instructions in braces.
This allows the AsmMatcherEmitter to properly tokenize the AsmStrings for
load and store instructions. This is a step towards asm parsing.

llvm-svn: 256166
2015-12-21 16:58:49 +00:00
Dan Gohman a783f10c16 [WebAssembly] Mark the ARGUMENT pseudo-instructions as CodeGenOnly.
llvm-svn: 256165
2015-12-21 16:53:29 +00:00
Dan Gohman dd20c70b61 [WebAssembly] Add some comments and make some minor source cleanups.
llvm-svn: 256164
2015-12-21 16:50:41 +00:00
Jun Bum Lim 4bb171c8da Revert "[AArch64] Promote loads from stores"
This reverts commit r256004 due to a failure in cortex-a53.

llvm-svn: 256160
2015-12-21 15:36:49 +00:00
Chad Rosier d016574df8 [AArch64] Enable PostRAScheduler for AArch64 generic build.
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.

http://reviews.llvm.org/D15652

llvm-svn: 256158
2015-12-21 14:43:45 +00:00
Igor Breger 44b60a3687 AVX512BW: Enable AND/OR/XOR vector byte/word paked operation by promoting to qword that natively suppored.
llvm-svn: 256157
2015-12-21 14:40:36 +00:00
Amjad Aboud 60b5e1b6c0 Implemented Support of IA interrupt and exception handlers:
http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html

Differential Revision: http://reviews.llvm.org/D15567

llvm-svn: 256155
2015-12-21 14:07:14 +00:00
Zlatko Buljan 5da2f6cd03 [mips][microMIPS] Implement DERET and DI instructions and check size operand for EXT and DEXT* instructions
Differential Revision: http://reviews.llvm.org/D15570

llvm-svn: 256152
2015-12-21 13:08:58 +00:00
NAKAMURA Takumi 9ec6a826dd [Cygwin] Enable TLS as emutls.
It resolves clang selfhosting with std::once() for Cygwin.

FIXME: It may be EmulatedTLS-generic also for X86-Android.
FIXME: Pass EmulatedTLS to LLVM CodeGen from Clang with -femulated-tls.
llvm-svn: 256134
2015-12-21 02:37:23 +00:00
Dylan McKay f061e9b7b2 [AVR] Added AVRCallingConv.td
llvm-svn: 256130
2015-12-20 23:17:44 +00:00
Craig Topper ca66fc5473 [X86] Use range-based for loop. NFC
llvm-svn: 256127
2015-12-20 18:41:57 +00:00
Craig Topper 074e845260 [X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift.
This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.

Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases.

llvm-svn: 256126
2015-12-20 18:41:54 +00:00
Dylan McKay 029346f438 Add AVR.td and AVRRegisterInfo.td
Summary:
This adds the core AVR TableGen file, along with the register descriptions.

Lines in AVR.td which require other TableGen files which haven't been committed
yet are commented out.

This is a fairly trivial patch, and should only require a quick review.

I kept the line width smaller than 80 columns, but there are a few exceptions
because I'm not sure how to split a string over several lines.

Reviewers: stoklund

Subscribers: dylanmckay, agnat

Differential Revision: http://reviews.llvm.org/D14684

llvm-svn: 256120
2015-12-20 12:16:20 +00:00
Weiming Zhao 613c6862fa Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions for Thumb2
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb.

r250697: http://reviews.llvm.org/rL250697

Reviewers: rmaprath

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15653

llvm-svn: 256115
2015-12-20 06:41:44 +00:00
Tom Stellard ffc1a5aef7 AMDGPU/SI: Fix implemenation of isSourceOfDivergence() for graphics shaders
Summary:
The analysis of shader inputs was completely wrong.  We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15608

llvm-svn: 256082
2015-12-19 02:54:15 +00:00
Matt Arsenault 2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Nicolai Haehnle 6bcf8b2890 AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysReg
Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15629

llvm-svn: 256073
2015-12-19 01:36:26 +00:00
Nicolai Haehnle dd58705af6 AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

llvm-svn: 256072
2015-12-19 01:16:06 +00:00
Krzysztof Parzyszek 21dc8bdd9e [Hexagon] Add PIC support
llvm-svn: 256025
2015-12-18 20:19:30 +00:00
Changpeng Fang c9963936e7 AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

    Reviewers: none

    Subscribers: none

    Differential Revision: none

llvm-svn: 256022
2015-12-18 20:04:28 +00:00
Changpeng Fang ef735b74c1 Revert "AMDGPU/SI: Test commit"
This reverts commit a493cb636e0152ad28210934a47c6c44b1437193.

llvm-svn: 256021
2015-12-18 20:04:26 +00:00
Changpeng Fang 7fdf674c2e AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

Reviewers: none

Subscribers: none

Differential Revision: none

llvm-svn: 256020
2015-12-18 19:57:41 +00:00
Jun Bum Lim 3509d64c24 [AArch64] Promote loads from stores
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256004
2015-12-18 18:08:30 +00:00
Zlatko Buljan 252cca555f [mips][microMIPS][DSP] Implement PACKRL.PH, PICK.PH, PICK.QB, SHILO, SHILOV and WRDSP instructions
Differential Revision: http://reviews.llvm.org/D14429

llvm-svn: 255991
2015-12-18 08:59:37 +00:00
Eric Christopher 8c2adf6b49 Remove unused class variables.
llvm-svn: 255939
2015-12-17 23:43:40 +00:00
Hans Wennborg a6a2e512cf [X86] Use push-pop for materializing small constants under 'minsize'
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.

This is a follow-up to r255656.

Differential Revision: http://reviews.llvm.org/D15549

llvm-svn: 255936
2015-12-17 23:18:39 +00:00
Matthew Simpson 13dddb0799 Revert "[AArch64] Add DAG combine for extract extend pattern"
This reverts commit r255895. The patch breaks internal tests. Reverting until a
fix is ready.

llvm-svn: 255928
2015-12-17 21:29:47 +00:00
Dan Gohman 670a60ed52 [WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF.
llvm-svn: 255925
2015-12-17 20:50:45 +00:00
Tom Stellard caaa3aa07c AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.
Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15583

Patch by: Changpeng Fang

llvm-svn: 255908
2015-12-17 17:05:09 +00:00
Nicolai Haehnle 87323da6eb AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.

I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15542

llvm-svn: 255906
2015-12-17 16:46:42 +00:00
Rafael Espindola f44db24e1f Avoid explicit relocation sorting most of the time.
These days relocations are created and stored in a deterministic way.
The order they are created is also suitable for the .o file, so we don't
need an explicit sort.

The last remaining exception is MIPS.

llvm-svn: 255902
2015-12-17 16:22:06 +00:00
Rafael Espindola 9e1cae510f Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build"
This reverts commit r255896. It broke the tests.

llvm-svn: 255899
2015-12-17 15:12:26 +00:00
Rafael Espindola d0e16522c7 Always sort by offset first. NFC.
Every target changing sortRelocs was first calling the parent
implementation. Just run that first.

llvm-svn: 255898
2015-12-17 15:08:24 +00:00
Diego Novillo 8561841875 Fix unused variable warning in release builds. NFC.
llvm-svn: 255897
2015-12-17 14:58:34 +00:00
MinSeong Kim d05e9fd194 [AArch64] Enable PostRAScheduler for AArch64 generic build
This patch enables PostRAScheduler specifically for AArch64 generic build,
which is beneficial from the performance perspective.
Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed.
Also benchmarks from LLVM test-suite did not regress.

Differential Revision: http://reviews.llvm.org/D15557

llvm-svn: 255896
2015-12-17 14:51:22 +00:00
Matthew Simpson 4355e404d5 [AArch64] Add DAG combine for extract extend pattern
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.

Differential Revision: http://reviews.llvm.org/D15515

llvm-svn: 255895
2015-12-17 14:30:55 +00:00
Rafael Espindola 850ba46dd6 Simplify. NFC.
llvm-svn: 255894
2015-12-17 14:19:52 +00:00
Alexey Bataev 7b72b658cc [X86] Add option for enabling LEA optimization pass, by Andrey Turetsky
Add option to enable/disable LEA optimization pass. By default the pass is disabled.
Differential Revision: http://reviews.llvm.org/D15573

llvm-svn: 255881
2015-12-17 07:34:39 +00:00
Dan Gohman 5bf22fc84a [WebAssembly] Convert WebAssemblyTargetObjectFile to TargetLoweringObjectFileELF
llvm-svn: 255877
2015-12-17 04:55:44 +00:00
Matthias Braun 454192917b AArch64: Simplify emitEpilogue() and related code; NFC
This is in preparation to an upcoming patch.

llvm-svn: 255872
2015-12-17 03:18:47 +00:00
Dan Gohman 05ac43fec3 [WebAssembly] Experimental ELF writer support
This creates the initial infrastructure for writing ELF output files. It
doesn't yet have any implementation for encoding instructions.

Differential Revision: http://reviews.llvm.org/D15555

llvm-svn: 255869
2015-12-17 01:39:00 +00:00
JF Bastien eefff9ccc5 WebAssembly: update expected torture test failures
We now have 240 expected failures.

llvm-svn: 255858
2015-12-17 00:12:06 +00:00
Dan Gohman 4172953813 [WebAssembly] Fix legalization of shift operators on large integer types.
llvm-svn: 255847
2015-12-16 23:25:51 +00:00
Derek Schuff 8bb5f2927a [WebAssembly] Implement eliminateCallFramePseudo
Summary:
Implement eliminateCallFramePsuedo to handle ADJCALLSTACKUP/DOWN
pseudo-instructions. Add a test calling a vararg function which causes non-0
adjustments. This revealed an issue with RegisterCoalescer wherein it
eliminates a COPY from SP32 to a vreg but failes to update the live ranges
of EXPR_STACK, causing a machineinstr verifier failure (so this test
is commented out).

Also add a dynamic alloca test, which causes a callseq_end dag node with
a 0 (instead of undef) second argument to be generated. We currently fail to
select that, so adjust the ADJCALLSTACKUP tablegen code to handle it.

Differential Revision: http://reviews.llvm.org/D15587

llvm-svn: 255844
2015-12-16 23:21:30 +00:00
Ahmed Bougacha 66834ec6e1 [AArch64] Simplify some TRI/TII getters. NFC.
We don't need static_casts when we use the right Subtarget.

llvm-svn: 255836
2015-12-16 22:54:06 +00:00
Ahmed Bougacha cecb6b0865 [CodeGen] Make MachineInstrBuilder::copyImplicitOps const. NFC.
This matches the other MIB methods, none of which modify the builder.
Without this, we can't chain copyImplicitOps.
Also reformat the few users, in PPCEarlyReturn.

llvm-svn: 255828
2015-12-16 22:15:30 +00:00
Manman Ren cbe4f9417d CXX_FAST_TLS calling convention: performance improvement for AArch64.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

The target independent portion was committed as r255353.
rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15341

llvm-svn: 255821
2015-12-16 21:04:19 +00:00
Derek Schuff 993d35b4aa Remove now-unused include
llvm-svn: 255817
2015-12-16 20:43:10 +00:00
Derek Schuff 83717cc297 Iterate over phys regs instead
llvm-svn: 255816
2015-12-16 20:43:08 +00:00
Derek Schuff 45cd5a79b2 [WebAssembly] Print an extra local decl when the user stack pointer is used
Differential Revision: http://reviews.llvm.org/D15546

llvm-svn: 255815
2015-12-16 20:43:06 +00:00
Krzysztof Parzyszek 4f9164d9b3 [Hexagon] Misc fixes to r255807
llvm-svn: 255811
2015-12-16 20:07:04 +00:00
Krzysztof Parzyszek 56bbf54b43 [Hexagon] Update the Hexagon packetizer
llvm-svn: 255807
2015-12-16 19:36:12 +00:00
Reid Kleckner 187d33ee74 Revert "[ARM] Add ARMv8.2-A FP16 scalar instructions"
This reverts commit r255762.

llvm-svn: 255806
2015-12-16 19:21:03 +00:00
Dan Gohman b3aa1ecab0 [WebAssembly] Fix the CFG Stackifier to handle unoptimized branches
If a branch both branches to and falls through to the same block, treat it as
an explicit branch.

llvm-svn: 255803
2015-12-16 19:06:41 +00:00
Matt Arsenault e05ff15186 AMDGPU: Override getCFInstrCost
The default cost was 0 with the assumption that it is predictable.

llvm-svn: 255796
2015-12-16 18:37:19 +00:00
Dan Gohman e2831b4e27 [WebAssembly] Use the new offset syntax for memory operands in inline asm.
llvm-svn: 255788
2015-12-16 18:14:49 +00:00
Ulrich Weigand 88a7a2eac7 [SystemZ] Sort relocs to avoid code corruption by linker optimization
The SystemZ linkers provide an optimization to transform a general-
or local-dynamic TLS sequence into an initial-exec sequence if possible.
Do do that, the compiler generates a function call to __tls_get_offset,
which is a brasl instruction annotated with *two* relocations:

- a R_390_PLT32DBL to install __tls_get_offset as branch target
- a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that
  the TLS optimization should be performed if possible

If the optimization is performed, the brasl is replaced by an ld load
instruction.

However, *both* relocs are processed independently by the linker.
Therefore it is crucial that the R_390_PLT32DBL is processed *first*
(installing the branch target for the brasl) and the R_390_TLS_GDCALL
is processed *second* (replacing the whole brasl with an ld).

If the relocs are swapped, the linker will first replace the brasl
with an ld, and *then* install the __tls_get_offset branch target
offset.  Since ld has a different layout than brasl, this may even
result in a completely different (or invalid) instruction; in any
case, the resulting code is corrupted.

Unfortunately, the way the MC common code sorts relocations causes
these two to *always* end up the wrong way around, resulting in
wrong code generation by the linker and crashes.

This patch overrides the sortRelocs routine to detect this particular
pair of relocs and enforce the required order.

llvm-svn: 255787
2015-12-16 18:12:40 +00:00
Ulrich Weigand 47f3649374 [SystemZ] Fix assertion failure in adjustSubwordCmp
When comparing a zero-extended value against a constant small enough to
be in range of the inner type, it doesn't matter whether a signed or
unsigned compare operation (for the outer type) is being used.  This is
why the code in adjustSubwordCmp had this assertion:

    assert(C.ICmpType == SystemZICMP::Any &&
           "Signedness shouldn't matter here.");

assuming the the caller had already detected that fact.  However, it
turns out that there cases, in particular with always-true or always-
false conditions that have not been eliminated when compiling at -O0,
where this is not true.

Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any
here, we can simply *set* it safely to SystemZICMP::Any, however.

llvm-svn: 255786
2015-12-16 18:04:06 +00:00
Tobias Edler von Koch b51460cf86 [Hexagon] Make memcpy lowering thread-safe
This removes an unpleasant hack involving a global variable for special
lowering of certain memcpy calls. These are now lowered as intended in
EmitTargetCodeForMemcpy in the same way that other targets do it.

llvm-svn: 255785
2015-12-16 17:29:37 +00:00
Dan Gohman 30a42bf585 [WebAssembly] Support more kinds of inline asm operands
llvm-svn: 255782
2015-12-16 17:15:17 +00:00
Oliver Stannard 2de8c16913 [ARM] Add ARMv8.2-A FP16 vector instructions
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.

Differential Revision: http://reviews.llvm.org/D15039

llvm-svn: 255764
2015-12-16 12:37:39 +00:00
Oliver Stannard 48568cbe18 [ARM] Add ARMv8.2-A FP16 scalar instructions
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.

These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.

Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.

New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.

Differential Revision: http://reviews.llvm.org/D15038

llvm-svn: 255762
2015-12-16 11:35:44 +00:00
Michael Kuperstein e75e6e2a23 [X86] Improve shift combining
This folds (ashr (shl a, [56,48,32,24,16]), SarConst)
into       (shl, (sext (a), [56,48,32,24,16] - SarConst))
or into    (lshr, (sext (a), SarConst - [56,48,32,24,16]))
depending on sign of (SarConst - [56,48,32,24,16])

sexts in X86 are MOVs.
The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size).
However the MOVs have 2 advantages to SHIFTs on x86:
1. MOVs can write to a register that differs from source.
2. MOVs accept memory operands.

This fixes PR24373.

Patch by: evgeny.v.stupachenko@intel.com
Differential Revision: http://reviews.llvm.org/D13161

llvm-svn: 255761
2015-12-16 11:22:37 +00:00
Reid Kleckner 7850c9f5ca [WinEH] Make llvm.x86.seh.recoverfp work on x64
It adjusts from RSP-after-prologue to RBP, which is what SEH filters
need to do before they can use llvm.localrecover.

Fixes SEH filter captures, which were broken in r250088.

Issue reported by Alex Crichton.

llvm-svn: 255707
2015-12-15 23:40:58 +00:00
Hans Wennborg 7036e503d7 Fix "Not having LAHF/SAHF" assert.
It wants to assert that the subtarget is 64-bit, not the register.

llvm-svn: 255703
2015-12-15 23:21:46 +00:00
Tom Stellard 7750f4ed9e AMDGPU/SI: Set the code object work group segment size when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15493

llvm-svn: 255702
2015-12-15 23:15:25 +00:00
Sanjay Patel 271efcdf20 [x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475)
This patch improves on the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475

but only for the fmaxf() case to start, so we can sort out any bugs before
extending to fmin, f64, and vectors.

The fmax / maxnum definitions provide us flexibility for signed zeros, so the
only thing we have to worry about in this replacement sequence is NaN handling.

Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes
a problem: SelectionDAGBuilder::visitSelect() transforms compare/select
instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps
that should be checking for NaN inputs or global unsafe-math before transforming?
As it stands, that bypasses a big set of optimizations that the x86 backend 
already has in PerformSELECTCombine().

Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so
we have completely unnecessary operations happening on undef elements of the 
vector.

Differential Revision: http://reviews.llvm.org/D15294

llvm-svn: 255700
2015-12-15 23:11:43 +00:00
James Y Knight 99fcb721b2 [Sparc] Tweak r255668: Use llvm_unreachable.
llvm-svn: 255698
2015-12-15 23:07:16 +00:00
Tom Stellard a495307e5e AMDGPU/SI: Set the code objects private segment size when targeting HSA.
Summary: I'm not sure how things worked before without this.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15492

llvm-svn: 255692
2015-12-15 22:55:30 +00:00
Tom Stellard 29dd05e92f AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15426

llvm-svn: 255689
2015-12-15 22:39:36 +00:00
Dan Gohman 4b9d7916ee [WebAssembly] Implement instruction selection for constant offsets in addresses.
Add instruction patterns for matching load and store instructions with constant
offsets in addresses. The code is fairly redundant due to the need to replicate
everything between imm, tglobaldadr, and texternalsym, but this appears to be
common tablegen practice. The main alternative appears to be to introduce
matching functions with C++ code, but sticking with purely generated matchers
seems better for now.

Also note that this doesn't yet support offsets from getelementptr, which will
be the most common case; that will depend on a change in target-independent code
in order to set the NoUnsignedWrap flag, which I'll submit separately. Until
then, the testcase uses ptrtoint+add+inttoptr with a nuw on the add.

Also implement isLegalAddressingMode with an approximation of this.

Differential Revision: http://reviews.llvm.org/D15538

llvm-svn: 255681
2015-12-15 22:01:29 +00:00
Reid Kleckner d7045faa10 [WinEH] Remove unused intrinsic llvm.x86.seh.restoreframe
We can clean this up now that we have the X86 CATCHRET instruction to
restore the FP, SP, and BP.

llvm-svn: 255677
2015-12-15 21:41:34 +00:00
Tom Stellard a6f24c6565 AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions
Summary:
We were previously selecting all constant loads to SMRD instructions and legalizing
the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass.

This new solution is more simple and also generates much better code, because
the instruction selector is able to take advantage of all the MUBUF addressing
modes that are legalization pass wasn't able to.

We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15425

llvm-svn: 255672
2015-12-15 20:55:55 +00:00
Justin Bogner 843fb204b7 LPM: Stop threading `Pass *` through all of the loop utility APIs. NFC
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:

- The APIs have access to pretty well any Pass state they want, so
  it's hard to tell what they may or may not do.

- Other APIs have copied these and pass around a `Pass *` even though
  they don't even use it. Some of these just hand a nullptr to the API
  since the callers don't even have a pass available.

- Passes in the new pass manager don't work like the current ones, so
  the APIs can't be used as is there.

Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.

llvm-svn: 255669
2015-12-15 19:40:57 +00:00
James Y Knight 33beb24318 [Sparc] Fix handling of double incoming arguments on sparc little-endian.
On SparcV8, doubles get passed in two 32-bit integer registers. The call
code was already handling endianness correctly, but the incoming
argument code was not -- it got the two halves in opposite order.

Also remove some dead code in LowerFormalArguments_32 to handle
less-than-32bit values, which can't actually happen.

Finally, add some test cases for the 32-bit calling convention, cribbed
from the 64abi.ll test, and run for both big and little-endian.

llvm-svn: 255668
2015-12-15 19:23:12 +00:00
Michael Kuperstein 53946bf8c6 [X86] MOVPC32r should only emit CFI adjustments when needed
We only want to emit CFI adjustments when actually using DWARF.
This fixes PR25828.

Differential Revision: http://reviews.llvm.org/D15522

llvm-svn: 255664
2015-12-15 18:50:32 +00:00
Tom Stellard dbe374b2c5 AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence()
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15476

llvm-svn: 255661
2015-12-15 18:04:38 +00:00
Tom Stellard 8f307217c3 AMDGPU/SI: Fix bitcast between v2f32 and f64
The radeonsi fp64 support can hit these now that some redundant bitcasts
are folded.

Patch by: Michel Dänzer

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 255657
2015-12-15 17:11:17 +00:00
Hans Wennborg 08d5905bac [X86] Smaller code for materializing 32-bit 1 and -1 constants
"movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes.
This commit makes LLVM use the latter when optimizing for size.

Differential Revision: http://reviews.llvm.org/D14971

llvm-svn: 255656
2015-12-15 17:10:28 +00:00
JF Bastien dac806c783 WebAssembly: update expected torture test failures
We now have 252 expected failures.

llvm-svn: 255654
2015-12-15 17:07:07 +00:00
Krzysztof Parzyszek 372bd80834 [Hexagon] Preprocess mapped instructions before lowering to MC
llvm-svn: 255653
2015-12-15 17:05:45 +00:00
Tom Stellard 43f52df0b5 AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsics
Summary:
These are meant to be used instead of the llvm.SI.tid intrinsic which will
be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15475

llvm-svn: 255652
2015-12-15 17:02:52 +00:00
Tom Stellard ad7d03daa6 AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsics
Summary:
These are meant to be used instead of the llvm.SI.fs.interp intrinsic which
will be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15474

llvm-svn: 255651
2015-12-15 17:02:49 +00:00
Tom Stellard ac00eb5470 AMDGPU/SI: Add getShaderType() function to Utils/
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15424

llvm-svn: 255650
2015-12-15 16:26:16 +00:00
Nemanja Ivanovic 8922476bcb Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

This patch was meant to land in revision 255246, but I accidentally uploaded
the patch that corresponds to http://reviews.llvm.org/D15372 in that revision
accidentally.

Thereby, this patch is the actual Bitcasts using direct moves patch, whereas
http://reviews.llvm.org/rL255246 actually corresponds to
http://reviews.llvm.org/D15372.

llvm-svn: 255649
2015-12-15 14:50:34 +00:00
Asaf Badouh 5acf66ff97 [x86] adding PKU feature flag
the feature flag is essential for RDPKRU and WRPKRU instruction 
more about the instruction can be found in the SDM rev 56, vol 2 from http://www.intel.com/sdm

Differential Revision: http://reviews.llvm.org/D15491

llvm-svn: 255644
2015-12-15 13:35:29 +00:00
Nemanja Ivanovic b033f67df0 Define a feature for __float128 support in the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D15117

In preparation for supporting IEEE Quad precision floating point,
this patch simply defines a feature to specify the target supports this.
For now, nothing is done with the target feature, we just don't want
warnings from the Clang FE when a user specifies -mfloat128.
Calling convention and other related work will add to this patch in
the near future.

llvm-svn: 255642
2015-12-15 12:19:34 +00:00
Elena Demikhovsky 6015f5c823 Type legalizer for masked gather and scatter intrinsics.
Full type legalizer that works with all vectors length - from 2 to 16, (i32, i64, float, double).

This intrinsic, for example
void @llvm.masked.scatter.v2f32(<2 x float>%data , <2 x float*>%ptrs , i32 align , <2 x i1>%mask )
requires type widening for data and type promotion for mask.

Differential Revision: http://reviews.llvm.org/D13633

llvm-svn: 255629
2015-12-15 08:40:41 +00:00
Dan Gohman e3d7b58ba6 [WebAssembly] Use an immediate OperandType for offset operands.
llvm-svn: 255612
2015-12-15 03:21:48 +00:00
Dan Gohman dcba338188 [WebAssembly] Remove .import printing.
For now, LLVM doesn't know about wasm module imports, so it shouldn't
emit .import directives.

llvm-svn: 255602
2015-12-15 02:20:44 +00:00
Quentin Colombet 25b43f3624 [X86] Add relaxtion logic for SBB instructions.
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.

rdar://problem/23785506

llvm-svn: 255583
2015-12-15 00:09:23 +00:00
Quentin Colombet 2cb8a51c1f [X86] Add relaxtion logic for ADC instructions.
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.

rdar://problem/23785506

llvm-svn: 255570
2015-12-14 23:12:40 +00:00
Dan Gohman c7c0445443 [WebAssembly] Add type prefixes to call instructions
Add return type information to call and call_indirect instructions. This
allows them to be disambiguated without knowledge of the callee.

Differential Revision: http://reviews.llvm.org/D15484

llvm-svn: 255565
2015-12-14 22:56:51 +00:00
Dan Gohman 8fe7e86bf5 [WebAssembly] Implement a new algorithm for placing BLOCK markers
Implement a new BLOCK scope placement algorithm which better handles
early-return blocks and early exists from nested scopes.

Differential Revision: http://reviews.llvm.org/D15368

llvm-svn: 255564
2015-12-14 22:51:54 +00:00
Dan Gohman a712a6c4ce [WebAssembly] Avoid adding redundant EXPR_STACK uses.
llvm-svn: 255563
2015-12-14 22:37:23 +00:00
Chih-Hung Hsieh 7993e18e80 [X86] Part 2 to fix x86-64 fp128 calling convention.
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
  Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
  When target has MMX registers, configure MVT::f128 in FR128RegClass,
  with TypeSoftenFloat action, and custom actions for some opcodes.
  Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
  Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
  Fix infinite loop when f128 type can have
  VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.

Differential Revision: http://reviews.llvm.org/D11438

llvm-svn: 255558
2015-12-14 22:08:36 +00:00
Dan Gohman 87b4aa8914 [WebAssembly] Add an assert to sanity-check dead flags.
The WebAssemblyStoreResults pass runs before LiveVariables, so it doesn't
expect to have to keep dead flags up to date; check this with an assert.

llvm-svn: 255551
2015-12-14 21:53:54 +00:00
Krzysztof Parzyszek 5e6f2bd0cb [Hexagon] Add "const" to function parameters in HexagonInstrInfo
llvm-svn: 255544
2015-12-14 21:32:25 +00:00
Krzysztof Parzyszek dac7102874 [Packetizer] Add AliasAnalysis as a parameter to the packetizer
This will make the depedence graph more accurate if an alias analysis
is provided. If nullptr is specified in its place, the behavior will
remain as it is currently.

llvm-svn: 255540
2015-12-14 20:35:13 +00:00
Yaron Keren 45ea8fa1f4 Save several std::string constructions using llvm::Twine.
llvm-svn: 255535
2015-12-14 19:28:40 +00:00
Krzysztof Parzyszek d44a1fd506 Add "const" to function arguments in DFAPacketizer
llvm-svn: 255526
2015-12-14 18:54:44 +00:00
Petar Jovanovic 280f7101e8 [Power PC] llvm soft float support for ppc32
This is the second in a set of patches for soft float support for ppc32,
it enables soft float operations.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D13700

llvm-svn: 255516
2015-12-14 17:57:33 +00:00
Matt Arsenault d079285e05 AMDGPU: Use generic bitreverse intrinsic
Also fix bug in vector legalization for bitreverse.

llvm-svn: 255512
2015-12-14 17:25:38 +00:00
Geoff Berry 8f5acb1bd1 Remove dead function AArch64TargetLowering::getFunctionAlignment. NFC.
Reviewers: t.p.northover, jmolloy, mcrosier

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15458

llvm-svn: 255509
2015-12-14 17:01:10 +00:00
Matt Arsenault 52a52a564b AMDGPU: Fix splitting vector loads with existing offsets
If the original MMO had an offset, it was dropped.
Also use the correct alignment after adding the new offset.

llvm-svn: 255508
2015-12-14 16:59:40 +00:00
Krzysztof Parzyszek 759a7d0ed7 [Hexagon] Subtarget features/default CPU corrections
llvm-svn: 255501
2015-12-14 15:03:54 +00:00
Chad Rosier bc9d4f9947 [PPC] Early exit loop. NFC.
llvm-svn: 255497
2015-12-14 14:44:06 +00:00
Michael Zuckerman 02ecd43c63 [X86][inline asm] support even directive
The .even directive aligns content to an evan-numbered address.

In at&t syntax .even 
In Microsoft syntax even (without the dot).

Differential Revision: http://reviews.llvm.org/D15413

llvm-svn: 255462
2015-12-13 17:07:23 +00:00
Simon Pilgrim 3e0c022aed Fix line endings
llvm-svn: 255459
2015-12-13 12:49:48 +00:00
Cong Hou c106989fd5 Normalize MBB's successors' probabilities in several locations.
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).


Differential revision: http://reviews.llvm.org/D15259

llvm-svn: 255455
2015-12-13 09:26:17 +00:00
Saleem Abdulrasool 778c268594 ARM: only emit EABI attributes on EABI targets
EABI attributes should only be emitted on EABI targets.  This prevents the
emission of the optimization goals EABI attribute on Windows ARM.

llvm-svn: 255448
2015-12-13 05:27:45 +00:00
Simon Pilgrim 052191dd82 [X86][AVX512] Added support for VMOVQ shuffle comments
llvm-svn: 255442
2015-12-12 21:46:23 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Hal Finkel 98347d3f2c [PowerPC] OutStreamer cleanup in PPCAsmPrinter
We don't need to pass OutStreamer as a parameter to LowerSTACKMAP and
LowerPATCHPOINT. It is a member variable of PPCAsmPrinter, and thus, is already
available. NFC.

llvm-svn: 255418
2015-12-12 01:47:08 +00:00
Chen Li 1b26b9ec9d [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255415
2015-12-12 01:04:15 +00:00
Hal Finkel 65539e3c94 [PowerPC] Add Branch Hints for Highly-Biased Branches
This branch adds hints for highly biased branches on the PPC architecture. Even
in absence of profiling information, LLVM will mark code reaching unreachable
terminators and other exceptional control flow constructs as highly unlikely to
be reached.

Patch by Tom Jablin!

llvm-svn: 255398
2015-12-12 00:32:00 +00:00
Derek Schuff 8f55497264 [WebAssembly] Update test expectations
Many tests are now passing due to eliminateFrameIndex implementation and
the list needs to be re-triaged because it unblocks other failures, and
some previous failures are different. However I'm about to churn it more
by implementing more lowering, so will wait on that.

llvm-svn: 255396
2015-12-12 00:18:40 +00:00
Chen Li 02ef2e1385 Revert rL255391: [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
because it broke buildbot.

llvm-svn: 255395
2015-12-12 00:08:37 +00:00
Derek Schuff 9769debf88 [WebAssembly] Implement prolog/epilog insertion and FrameIndex elimination
Summary:
Use the SP32 physical register as the base for FrameIndex
lowering. Update it and the __stack_pointer global var in the prolog and
epilog. Extend the mapping of virtual registers to wasm locals to
include the physical registers.

Rather than modify the target-independent PrologEpilogInserter (which
asserts that there are no virtual registers left) include a
slightly-modified copy for Wasm that does not have this assertion and
only clears the virtual registers if scavenging was needed (which of
course it isn't for wasm).

Differential Revision: http://reviews.llvm.org/D15344

llvm-svn: 255392
2015-12-11 23:49:46 +00:00
Chen Li e8f9387e0c [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255391
2015-12-11 23:39:32 +00:00
Hal Finkel cd8664c3c2 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Matthias Braun 60d69e2865 CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.

This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).

Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.

This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033

Differential Revision: http://reviews.llvm.org/D15320

llvm-svn: 255362
2015-12-11 19:42:09 +00:00
Matt Arsenault fbd9bbfda3 Start replacing vector_extract/vector_insert with extractelt/insertelt
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359
2015-12-11 19:20:16 +00:00
Derek Schuff 5a14306323 [WebAssembly] Fix ADJCALLSTACKDOWN/UP use/defs
Summary:
ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use
and def the stack pointer. Since they do not, all the nodes are being
eliminated by DeadMachineInstructionElim, so they aren't in the IR when
PrologEpilogInserter/eliminateCallFramePseudo needs them.

This change fixes that, but since RegStackify will not stackify across
them (and it runs early, before PEI), change LowerCall to only emit them
when the call frame size is > 0. That makes the current code work the
same way and makes code handled by D15344 also work the same way. We can
expand the condition beyond NumBytes > 0 in the future if needed.

Reviewers: sunfish, jfb

Subscribers: jfb, dschuff, llvm-commits

Differential Revision: http://reviews.llvm.org/D15459

llvm-svn: 255356
2015-12-11 18:55:34 +00:00
Hans Wennborg a8e6b3ecb7 Fix build after r255319.
llvm-svn: 255322
2015-12-11 00:58:32 +00:00
Kyle Butt 1452b76f1f [PPC]: Peephole optimize small accesss to aligned globals.
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
        addis 3, 2, b4v@toc@ha
        addi 4, 3, b4v@toc@l
        lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
        lbz 6, 1(4)         ; optimizer
        lbz 7, 2(4)
        lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
        addis 3, 2, b4v@toc@ha
        lbz 4, b4v@toc@l(3)
        lbz 5, b4v@toc@l+1(3)
        lbz 6, b4v@toc@l+2(3)
        lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.

llvm-svn: 255319
2015-12-11 00:47:36 +00:00
Cong Hou 59898d8c68 [X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1.
Previously in the conversion cost table there are no entries for integer-integer
conversions on SSE2. This will result in imprecise costs for certain vectorized
operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers
are counted from the result of running llc on the new test case in this patch.


Differential revision: http://reviews.llvm.org/D15132

llvm-svn: 255315
2015-12-11 00:31:39 +00:00
Kyle Butt 28b01a51b3 PPC: Teach FMA mutate to respect register classes.
This was causing bad code gen and assembly that won't assemble, as
mixed altivec and vsx code would end up with a vsx high register
assigned to an altivec instruction, which won't work. Constraining the
classes allows the optimization to proceed.

llvm-svn: 255299
2015-12-10 21:28:40 +00:00
Pirama Arumuga Nainar 1317d5f311 Fix fptosi, fptoui from f16 vectors to i8, i16 vectors
Summary:
Convert f16 vectors to corresponding f32 vectors before doing the
conversion to int.

Add tests for v4f16, v8f16.

Reviewers: ab, jmolloy

Subscribers: llvm-commits, srhines

Differential Revision: http://reviews.llvm.org/D14936

llvm-svn: 255263
2015-12-10 17:16:49 +00:00
Dan Gohman b949b9c01b [WebAssembly] Make WebAssemblyStoreResults only return true when it has a change.
llvm-svn: 255253
2015-12-10 14:17:36 +00:00
Dan Gohman a87629d6d7 [WebAssembly] Fix WebAssemblyPeephole to set Changed to true when making changes.
llvm-svn: 255252
2015-12-10 14:16:34 +00:00
Dan Gohman acc0941bd1 [WebAssembly] Declare that WebAssemblyPeephole does not modify the CFG.
llvm-svn: 255251
2015-12-10 14:12:04 +00:00
Dan Gohman 6d63f96749 [WebAssembly] Remove an unneeded getAnalysisUsage override.
llvm-svn: 255250
2015-12-10 14:10:04 +00:00
Nemanja Ivanovic ac8d01add0 Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.

llvm-svn: 255246
2015-12-10 13:35:28 +00:00
Jonas Paulsson e451eeff5c [PostRA scheduling] Allow a target to do scheduling when it wants post RA.
SystemZ needs to do its scheduling after branch relaxation, which can
only happen after block placement, and therefore the standard
PostRAScheduler point in the pass sequence is too early.

TargetMachine::targetSchedulesPostRAScheduling() is a new method that
signals on returning true that target will insert the final scheduling
pass on its own.

Reviewed by Hal Finkel

llvm-svn: 255234
2015-12-10 09:10:07 +00:00
Craig Topper 8e44b9a4d1 [X86] Fix a couple cases were bitwise and logical operations were being mixed. NFC
llvm-svn: 255224
2015-12-10 06:09:41 +00:00
Dan Gohman f170ba08af [WebAssembly] Implement mixed-type ISD::FCOPYSIGN.
ISD::FCOPYSIGN permits its operands to have differing types, and DAGCombiner
uses this. Add some def : Pat rules to expand this out into an explicit
conversion and a normal copysign operation.

llvm-svn: 255220
2015-12-10 04:55:31 +00:00
Dan Gohman 9341c1d4b3 [WebAssembly] Implement fma.
It is lowered to a libcall for now, but this is expected to change in the future.

llvm-svn: 255219
2015-12-10 04:52:33 +00:00
Tom Stellard c2d654322b AMDGPU/SI: Fix warning introduced by r255204
llvm-svn: 255205
2015-12-10 03:10:46 +00:00
Tom Stellard c93fc11f36 AMDGPU/SI: Emit constant arrays in the .text section
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

llvm-svn: 255204
2015-12-10 02:13:01 +00:00
Tom Stellard b3c3bda512 AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraints
Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs.

Reviewers: arsenm, echristo

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15342

llvm-svn: 255203
2015-12-10 02:12:53 +00:00
Dan Gohman 60bddf17c5 [WebAssembly] Fix legalization of f32->f64 EXTLOAD.
llvm-svn: 255202
2015-12-10 02:07:53 +00:00
Derek Schuff 6fd28dfe5d [WebAssembly] Update known test failures
We can now select sign_extend_inreg

llvm-svn: 255197
2015-12-10 01:09:40 +00:00
Dan Gohman a5603b835b [WebAssembly] Also legalize sign_extend_inreg of i32->i64.
llvm-svn: 255191
2015-12-10 01:00:19 +00:00
Derek Schuff 71d0eae609 [WebAssembly] Update test failure expectations
llvm-svn: 255190
2015-12-10 00:56:18 +00:00
Dan Gohman a8483755d3 [WebAssembly] Fix legalization of shift operators with illegal types.
llvm-svn: 255181
2015-12-10 00:26:26 +00:00
Dan Gohman 7935fa3d1b [WebAssembly] Fix copy+pastos.
llvm-svn: 255180
2015-12-10 00:22:40 +00:00
Dan Gohman df00a9ebc2 [WebAssembly] Implement anyext.
llvm-svn: 255179
2015-12-10 00:17:35 +00:00
Quentin Colombet 5d2f7cfd44 [X86] Enable shrink-wrapping by default, but keep it disabled for stack frames
without a frame pointer when unwind may happen.
This is a workaround for a bug in the way we emit the CFI directives for
frameless unwind information. See PR25614.

llvm-svn: 255175
2015-12-09 23:08:18 +00:00
Dan Gohman 1cf96c0c34 [WebAssembly] Reintroduce ARGUMENT moving logic
Reinteroduce the code for moving ARGUMENTS back to the top of the basic block.
While the ARGUMENTS physical register prevents sinking and scheduling from
moving them, it does not appear to be sufficient to prevent SelectionDAG from
moving them down in the initial schedule. This patch introduces a patch that
moves them back to the top immediately after SelectionDAG runs.

This is still hopefully a temporary solution. http://reviews.llvm.org/D14750 is
one alternative, though the review has not been favorable, and proposed
alternatives are longer-term and have other downsides.

This fixes the main outstanding -verify-machineinstrs failures, so it adds
-verify-machineinstrs to several tests.

Differential Revision: http://reviews.llvm.org/D15377

llvm-svn: 255125
2015-12-09 16:23:59 +00:00
Tim Northover d91d635b36 ARM: don't use a deleted node as the BaseReg in complex pattern.
We mutated the DAG, which invalidated the node we were trying to use
as a base register. Sometimes we got away with it, but other times the
node really did get deleted before it was finished with.

Should fix PR25733

llvm-svn: 255120
2015-12-09 15:54:50 +00:00
JF Bastien 88f8014e8e WebAssembly: add missing failure to the list.
llvm-svn: 255119
2015-12-09 15:52:57 +00:00
Oliver Stannard 86f729296a [AArch64] Fix FP16 vector instructions that should only accept low registers
llvm-svn: 255113
2015-12-09 14:32:11 +00:00
Daniel Sanders 3c7223133d [mips][ias] Range check uimm10 operands
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15229

llvm-svn: 255112
2015-12-09 13:48:05 +00:00
JF Bastien c2b30484ae WebAssembly: add known failures
The bots are now running the torture tests properly. Bin all failures from the GCC C torture tests so that we can tackle failures and make the tree go red on regressions.

llvm-svn: 255111
2015-12-09 13:29:32 +00:00
Vasileios Kalintiris ddf7e6885a [mips] Use multiclass patterns for f32/f64 comparisons and i32 selects.
Summary:
Although the multiclass for i32 selects might seem redundant as it has
only one instantiation, we will use it to replace the correspondent
patterns in Mips64r6InstrInfo.td in follow-up commits.

Reviewers: dsanders

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14612

llvm-svn: 255110
2015-12-09 13:24:22 +00:00
Zlatko Buljan 48f1f39bfe Revert r254897 "[mips][microMIPS] Implement LH, LHE, LHU and LHUE instructions"
Commited patch was intended to implement LH, LHE, LHU and LHUE instructions.
After commit test-suite failed with error message in the form of:
fatal error: error in backend: Cannot select: t124: i32,ch = load<LD2[%d](tbaa=<0x94acc48>), sext from i16> t0, t2, undef:i32
For that reason I decided to revert commit r254897 and make new patch which besides implementation and standard regression tests will also have dedicated tests (CodeGen) for the above error. 

llvm-svn: 255109
2015-12-09 13:07:45 +00:00
Ahmed Bougacha 97564c3a1b [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

llvm-svn: 255089
2015-12-09 01:19:50 +00:00
Vyacheslav Klochkov a3cd08b05c X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes.
Reviewer: Simon Pilgrim.
Differential Revision: http://reviews.llvm.org/D15317

llvm-svn: 255080
2015-12-09 00:12:13 +00:00
Pirama Arumuga Nainar e6ccd7b66a Define selection for v4f16, v8f16 scalar_to_vector
Summary:
This fixes failure when trying to select
    insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.

The accompanying v4 and v8 tests fail instruction selection without this
patch.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15322

llvm-svn: 255072
2015-12-08 23:07:06 +00:00
Simon Pilgrim 323e00d9c7 [X86][AVX] Fold loads + splats into broadcast instructions
On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.

This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element.

Fix for PR23022

Differential Revision: http://reviews.llvm.org/D15310

llvm-svn: 255061
2015-12-08 22:17:11 +00:00
Artyom Skrobov 0a37b80bcb Fix ARMv4T (Thumb1) epilogue generation
Summary:
Before ARMv5T, Thumb1 code could not pop PC, as described at D14357 and D14986;
so we need the special fixup in the epilogue.

Reviewers: jroelofs, qcolombet

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15126

llvm-svn: 255047
2015-12-08 19:59:01 +00:00
Tim Northover 614e8ff855 X86: produce more friendly errors during MachO relocation handling
llvm-svn: 255036
2015-12-08 18:31:35 +00:00
Renato Golin 412ee3d45d [ARM] Allowing SP/PC for AND/BIC mod_imm_not
AND/BIC instructions do accept SP/PC, so the register class should be
more generic (rGPR -> GPR) to cope with that case. Adding more tests.

llvm-svn: 255034
2015-12-08 18:10:58 +00:00
Ron Lieberman e6540e244a [Hexagon] Add NewValueJump support for C4_cmpneq, C4_cmplte, C4_cmplteu
llvm-svn: 255027
2015-12-08 16:28:32 +00:00
Daniel Sanders 106d2d4693 [mips][ias] Range check uimm8 operands
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D15226

llvm-svn: 255018
2015-12-08 14:42:10 +00:00
Daniel Sanders 59d092f883 [mips][ias] Range check uimm6 operands and fix a bug this revealed.
Summary:
We don't check the size operand on ext/dext*/ins/dins* yet because the
permitted range depends on the pos argument and we can't check that using
this mechanism.

The bug was that dextu/dinsu accepted 0..31 in the pos operand instead of 32..63.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D15190

llvm-svn: 255015
2015-12-08 13:49:19 +00:00
Oliver Stannard e4c3d21ea6 [AArch64] Add ARMv8.2-A FP16 vector instructions
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.

The ".2h" vector type specifier is now legal (for the scalar pairwise
reduction instructions), so some unrelated tests have been modified as
different error messages are emitted. This is not a problem as the
invalid operands are still caught.

llvm-svn: 255010
2015-12-08 12:16:10 +00:00
Dan Gohman 31448f16b6 [WebAssembly] Fix a typo in a comment.
llvm-svn: 254999
2015-12-08 03:43:03 +00:00
Dan Gohman fd98ea89d9 [WebAssembly] Remove an unneeded static_cast.
llvm-svn: 254998
2015-12-08 03:42:50 +00:00
Dan Gohman 7f970765ea [WebAssembly] Fix an emacs syntax highlighting comment.
llvm-svn: 254997
2015-12-08 03:36:00 +00:00
Dan Gohman ad664b3bda [WebAssembly] Convert a file-level comment to doxygen style.
llvm-svn: 254996
2015-12-08 03:33:51 +00:00
Dan Gohman d70e5907cd [WebAssembly] Assert MRI.isSSA() in passes that depend on SSA form.
llvm-svn: 254995
2015-12-08 03:30:42 +00:00
Dan Gohman a8551b4d7e [WebAssembly] Trim some unneeded #includes.
llvm-svn: 254994
2015-12-08 03:25:35 +00:00
Dan Gohman 4a84b7322f [WebAssembly] Remove the override of haveFastSqrt.
The default implementation in BasicTTI already checks TLI and does
the right thing.

llvm-svn: 254993
2015-12-08 03:22:33 +00:00
Manman Ren cb8470b4b5 [CXX TLS calling convention] Add support for AArch64.
rdar://9001553

llvm-svn: 254978
2015-12-08 00:14:38 +00:00
Kit Barton a1c712fae5 [PPC64] Convert bool literals to i32
Convert i1 values to i32 values if they should be allocated in GPRs instead of CRs.

Phabricator: http://reviews.llvm.org/D14064
llvm-svn: 254942
2015-12-07 20:50:29 +00:00
Sanjay Patel a6bdd70f4b don't repeat function names in comments; NFC
llvm-svn: 254930
2015-12-07 19:31:34 +00:00
Sanjay Patel e4b9f507cf fix 'the the '; NFC
llvm-svn: 254928
2015-12-07 19:21:39 +00:00
Sanjay Patel f9bdb872bd remove redundant check: optForSize() includes a check for the minsize attribute; NFCI
llvm-svn: 254925
2015-12-07 19:13:40 +00:00
Elena Demikhovsky 291fe0159f VX-512: Fixed a bug in FP logic operation lowering
FP logic instructions are supported in DQ extension on AVX-512 target.
I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions, that supported for 512-bit vector under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, FANDNPS/PD.

Differential Revision: http://reviews.llvm.org/D15110

llvm-svn: 254913
2015-12-07 14:33:34 +00:00
Artyom Skrobov e9b3fb8603 [ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM.
Summary: This reverts r254234, and adds a simple fix for the annoying case of use-after-free.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15236

llvm-svn: 254912
2015-12-07 14:22:39 +00:00
Elena Demikhovsky 33e61eceb4 AVX-512: Fixed masked load / store instruction selection for KNL.
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.

This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.

All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.

Differential Revision: http://reviews.llvm.org/D15265

llvm-svn: 254909
2015-12-07 13:39:24 +00:00
Igor Breger 3ab6f17530 AVX-512: implement kunpck intrinsics.
Differential Revision: http://reviews.llvm.org/D14821

llvm-svn: 254908
2015-12-07 13:25:18 +00:00
Marina Yatsina 497d44a081 [X86] Adding support for FWORD type for MS inline asm
Adding support for FWORD type for MS inline asm.

Differential Revision: http://reviews.llvm.org/D15268

llvm-svn: 254904
2015-12-07 13:09:20 +00:00
Bradley Smith d5a1f47a63 [ARM] Flag vcvt{t,b} with an f16 type specifier as part of the FP16 extension
Additionally correct the Cortex-R7 definition to allow the FP16 feature.

llvm-svn: 254900
2015-12-07 10:54:36 +00:00
Zlatko Buljan 1a01c15027 [mips][microMIPS] Implement LH, LHE, LHU and LHUE instructions
Differential Revision: http://reviews.llvm.org/D9824

llvm-svn: 254897
2015-12-07 08:29:31 +00:00
Dan Gohman 5e0886beb7 [WebAssembly] Factor out a TypeToString function, since we need it in multiple places.
llvm-svn: 254884
2015-12-06 19:42:29 +00:00
Dan Gohman 770f0d0a40 [WebAssembly] Make tableswitch's 'default' operand explicit. NFC.
llvm-svn: 254883
2015-12-06 19:34:57 +00:00
Dan Gohman a4b710a74f [WebAssembly] Enable folding of offsets into global variable addresses.
llvm-svn: 254882
2015-12-06 19:33:32 +00:00
Dan Gohman 753abf8de5 [WebAssembly] Add some more ideas to README.txt.
llvm-svn: 254880
2015-12-06 19:29:54 +00:00
Marina Yatsina 1d1aa0b0a8 [X86] Add support for loopz, loopnz for Intel syntax
According to x86 spec, loopz and loopnz should be supported for Intel syntax, where loopz is equivalent to loope and loopnz is equivalent to loopne.

Differential Revision: http://reviews.llvm.org/D15148

llvm-svn: 254877
2015-12-06 15:31:47 +00:00
Asaf Badouh 41ecf460fa [X86][AVX512] add vmovss/sd missing encoding
Differential Revision: http://reviews.llvm.org/D14701

llvm-svn: 254875
2015-12-06 13:26:56 +00:00
Michael Kuperstein 77ce9d3b1a [X86] Always generate precise CFA adjustments.
This removes the code path that generate "synchronous" (only correct at call site) CFA.
We will probably want to re-introduce it once we are capable of emitting different
.eh_frame and .debug_frame sections.

Differential Revision: http://reviews.llvm.org/D14948

llvm-svn: 254874
2015-12-06 13:06:20 +00:00
Igor Breger 076dfe5c12 AVX512: support AVX512BW Intrinsic in 32bit mode.
Differential Revision: http://reviews.llvm.org/D15076

llvm-svn: 254873
2015-12-06 11:35:18 +00:00
Craig Topper 15576e1c8f Use make_range to reduce mentions of iterator type. NFC
llvm-svn: 254872
2015-12-06 05:08:07 +00:00
Dan Gohman d85c3b1fbc [WebAssembly] Don't perform the returned-argument optimization on constants.
llvm-svn: 254866
2015-12-05 22:12:39 +00:00
Dan Gohman 3bb55e98e2 [WebAssembly] Replace the fake JUMP_TABLE instruction with a def : Pat. NFC.
llvm-svn: 254864
2015-12-05 20:46:53 +00:00
Dan Gohman e2a7a8278f [WebAssembly] Implement direct calls to external symbols.
llvm-svn: 254863
2015-12-05 20:41:36 +00:00
Dan Gohman 284384b640 [WebAssembly] Support inline asm constraints of type i16 and similar.
llvm-svn: 254861
2015-12-05 20:03:44 +00:00
Dan Gohman 905bef5cf9 [WebAssembly] Update a stale comment. NFC.
llvm-svn: 254859
2015-12-05 19:43:19 +00:00
JF Bastien f05f6fd1e2 WebAssembly: improve readme, add placeholder for tests.
llvm-svn: 254857
2015-12-05 19:36:33 +00:00
Dan Gohman 7615e46919 [WebAssembly] Move useAA() out of line to make it more convenient to experiment with.
llvm-svn: 254856
2015-12-05 19:27:18 +00:00
Dan Gohman b0921ca9e1 [WebAssembly] Call TargetPassConfig base class functions in overriding functions.
llvm-svn: 254855
2015-12-05 19:24:17 +00:00
Dan Gohman ebb23545de [WebAssembly] Expand frem as a floating point library function.
llvm-svn: 254854
2015-12-05 19:15:57 +00:00
Craig Topper 5c32279bee [Hexagon] Don't call getNumImplicitDefs and then iterate over the count. getNumImplicitDefs contains a loop so its better to just loop over the null terminated implicit def list. NFC
llvm-svn: 254852
2015-12-05 17:34:07 +00:00
Simon Pilgrim 4ba5969224 [X86][ADX] Added memory folding patterns and stack folding tests
llvm-svn: 254844
2015-12-05 07:27:50 +00:00
Craig Topper e5e035a3a8 Replace uint16_t with the MCPhysReg typedef in many places. A lot of physical register arrays already use this typedef.
llvm-svn: 254843
2015-12-05 07:13:35 +00:00
Simon Pilgrim 5a64d98303 [X86][FMA4] Explicitly set the domain of FMA4 float/double scalar instructions
Both were defaulting to the float domain - now matches the packed instructions.

llvm-svn: 254841
2015-12-05 07:07:42 +00:00
Dan Gohman f0b165a7f8 [WebAssembly] Implement ReverseBranchCondition, and re-enable MachineBlockPlacement
This patch introduces a codegen-only instruction currently named br_unless,
which makes it convenient to implement ReverseBranchCondition and re-enable
the MachineBlockPlacement pass. Then in a late pass, it lowers br_unless
back into br_if.

Differential Revision: http://reviews.llvm.org/D14995

llvm-svn: 254826
2015-12-05 03:03:35 +00:00
Dan Gohman 4da4abd87f [WebAssembly] Fix scheduling dependencies in register-stackified code
Add physical register defs to instructions used from stackified
instructions to prevent them from being scheduled into the middle of
a stack sequence. This is a conservative measure which may be loosened
in the future.

Differential Revision: http://reviews.llvm.org/D15252

llvm-svn: 254811
2015-12-05 00:51:40 +00:00
Derek Schuff 9d77952332 [WebAssembly] Support constant offsets on loads and stores
This is just prototype for load/store for i32 types. I'll add them to
the rest of the types if we like this direction.

Differential Revision: http://reviews.llvm.org/D15197

llvm-svn: 254807
2015-12-05 00:26:39 +00:00
Philip Reames 7c6692de16 [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)
When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access.  Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing.  Note that the actual implementation was always bailing if the load or store wasn't simple.  

Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared.  In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above.  These are normal loads and what most of the optimizer works with.

llvm-svn: 254805
2015-12-05 00:18:33 +00:00
Hans Wennborg fbf2822e6d Add FeatureLAHFSAHF to amdfam10 as well.
llvm-svn: 254801
2015-12-04 23:32:19 +00:00
Dan Gohman 35bfb24c28 [WebAssembly] Initial varargs support.
Full varargs support will depend on prologue/epilogue support, but this patch
gets us started with most of the basic infrastructure.

Differential Revision: http://reviews.llvm.org/D15231

llvm-svn: 254799
2015-12-04 23:22:35 +00:00
Hans Wennborg 5000ce8a63 X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.

(GCC puts these instructions behind -msahf on 64-bit for the same reason.)

This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.

Differential Revision: http://reviews.llvm.org/D15240

llvm-svn: 254793
2015-12-04 23:00:33 +00:00
Chad Rosier f3491496dc [AArch64] Expand vector SDIVREM/UDIVREM operations.
http://reviews.llvm.org/D15214
Patch by Ana Pazos <apazos@codeaurora.org>!

llvm-svn: 254773
2015-12-04 21:38:44 +00:00
Dan Gohman 1ce2b1afd6 [WebAssembly] Add several more calling conventions to the supported list.
llvm-svn: 254741
2015-12-04 18:27:03 +00:00
Sanjay Patel 1640c54593 fix formatting; NFC
llvm-svn: 254739
2015-12-04 17:51:55 +00:00
Manman Ren 19c7bbe3b7 [CXX TLS calling convention] Add CXX TLS calling convention.
This commit adds a new target-independent calling convention for C++ TLS
access functions. It aims to minimize overhead in the caller by perserving as
many registers as possible.

The target-specific implementation for X86-64 is defined as following:
  Arguments are passed as for the default C calling convention
  The same applies for the return value(s)
  The callee preserves all GPRs - except RAX and RDI

The access function makes C-style TLS function calls in the entry and exit
block, C-style TLS functions save a lot more registers than normal calls.
The added calling convention ties into the existing implementation of the
C-style TLS functions, so we can't simply use existing calling conventions
such as preserve_mostcc.

rdar://9001553

llvm-svn: 254737
2015-12-04 17:40:13 +00:00
Dan Gohman 541841e365 [WebAssembly] Give names to the callseq begin and end instructions.
llvm-svn: 254730
2015-12-04 17:19:44 +00:00
Dan Gohman a3f5ce5f1b [WebAssembly] clang-format CallingConvSupported. NFC.
llvm-svn: 254729
2015-12-04 17:18:32 +00:00
Dan Gohman 85dbdda1ed [WebAssembly] Factor out the list of supported calling conventions.
llvm-svn: 254728
2015-12-04 17:16:07 +00:00
Dan Gohman 2d822e73fa [WebAssembly] Check for more unsupported ABI flags.
llvm-svn: 254727
2015-12-04 17:12:52 +00:00
Dan Gohman cb7940f9f5 [WebAssembly] Use SelectionDAG::getUNDEF. NFC.
llvm-svn: 254726
2015-12-04 17:09:42 +00:00
Krzysztof Parzyszek f1b3e5e52e [Hexagon] Simplify LowerCONCAT_VECTORS, handle different types better
llvm-svn: 254724
2015-12-04 16:18:15 +00:00
Colin LeMahieu 4c606e66a7 [Hexagon] Using multiply instead of shift on signed number which can be UB
llvm-svn: 254719
2015-12-04 15:48:45 +00:00
Jonas Paulsson 7fa69cd5dd [SystemZ] Bugfix: Don't add CC twice to new three-address instruction.
Since BuildMI() automatically adds the implicit operands for a new instruction,
adding the old instructions CC operand resulted in that there were two CC imp-def
operands, where only one was marked as dead. This caused buildSchedGraph() to
miss dependencies on the CC reg.

Review by Ulrich Weigand

llvm-svn: 254714
2015-12-04 12:48:51 +00:00
Alexey Bataev 7cf324772f LEA code size optimization pass (Part 1): Remove redundant address recalculations, by Andrey Turetsky
Add new x86 pass which replaces address calculations in load or store instructions with def register of existing LEA (must be in the same basic block), if the LEA calculates address that differs only by a displacement. Works only with -Os or -Oz.
Differential Revision: http://reviews.llvm.org/D13294

llvm-svn: 254712
2015-12-04 10:53:15 +00:00
Quentin Colombet 901f036353 [ARM] When a bitcast is about to be turned into a VMOVDRR, try to combine it
with its source instead of forcing the values on GPRs.

This improves the lowering of vector code when such bitcasts happen in the
middle of vector computations.

rdar://problem/23691584 

llvm-svn: 254684
2015-12-04 01:53:14 +00:00
JF Bastien 580b6572b5 X86InstrInfo::copyPhysReg: workaround reg liveness
Summary:
computeRegisterLiveness and analyzePhysReg are currently getting
confused about liveness in some cases, breaking copyPhysReg's
calculation of whether AX is dead in some cases. Work around this issue
temporarily by assuming that AX is always live.

See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7
And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201.

This workaround makes the code correct but slightly inefficient, but it
seems to confuse the machine instr verifier which now things EAX was
undefined in some cases where it's being conservatively saved /
restored.

Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15198

llvm-svn: 254680
2015-12-04 01:18:17 +00:00
Dan Gohman 391a98afd5 [WebAssembly] Fix dominance check for PHIs in the StoreResult pass
When a block has no terminator instructions, getFirstTerminator() returns
end(), which can't be used in dominance checks. Check dominance for phi
operands separately.

Also, remove some bits from WebAssemblyRegStackify.cpp that were causing
trouble on the same testcase; they were left behind from an earlier
experiment.

Differential Revision: http://reviews.llvm.org/D15210

llvm-svn: 254662
2015-12-03 23:07:03 +00:00
Chih-Hung Hsieh ed7d81e5d4 [X86] Part 1 to fix x86-64 fp128 calling convention.
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.

* Relax type legalization checks to accept new f128 type configuration,
  whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
  TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
  which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
  to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
  to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
  to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
  to guarantee that returned common sub class will contain the specified
  simple value type.
  This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.

Differential Revision: http://reviews.llvm.org/D15134

llvm-svn: 254653
2015-12-03 22:02:40 +00:00
Colin LeMahieu 15ca65c253 [Hexagon] Adding shuffling resources for HVX instructions and tests for instruction encodings.
llvm-svn: 254652
2015-12-03 21:44:28 +00:00
Reid Kleckner 93fc520339 [X86] Put no-op ADJCALLSTACK markers around all dynamic lowerings
Summary:
These ADJCALLSTACK markers don't generate code, but they keep dynamic
alloca code that calls chkstk out of the prologue.

This slightly pessimizes inalloca calls by preventing some register copy
coalescing, but I can live with that.

Reviewers: qcolombet

Subscribers: hans, llvm-commits

Differential Revision: http://reviews.llvm.org/D15200

llvm-svn: 254645
2015-12-03 20:46:59 +00:00
Krzysztof Parzyszek 7709aa0e07 [Hexagon] Remove variable unused in NDEBUG build
llvm-svn: 254623
2015-12-03 17:53:34 +00:00
Matthias Braun 0d4505c067 AArch64FastISel: Use cbz/cbnz to branch on i1
In the case of a conditional branch without a preceding cmp we used to emit
a "and; cmp; b.eq/b.ne" sequence, use tbz/tbnz instead.

Differential Revision: http://reviews.llvm.org/D15122

llvm-svn: 254621
2015-12-03 17:19:58 +00:00
Krzysztof Parzyszek c168c0165c [Hexagon] Implement CONCAT_VECTORS for HVX using V6_vcombine
llvm-svn: 254617
2015-12-03 16:47:20 +00:00
Colin LeMahieu 7c572b2125 [Hexagon] NFC Using canonicalizePacket to compound/duplex/pad packets rather than doing it separately. This also ensures the integrated assembler path matches the assembly parser path.
llvm-svn: 254616
2015-12-03 16:37:21 +00:00
Krzysztof Parzyszek 25ddd2c9e8 [Hexagon] Fix instruction descriptor flags for memory access size
llvm-svn: 254613
2015-12-03 15:41:33 +00:00
Marina Yatsina 4b1aea0802 [X86] MS inline asm: produce error when encountering "<type> ptr <reg name>"
Currently "<type> ptr <reg name>" treated as <reg name> in MS inline asm, ignoring the "<type> ptr" completely and possibly ignoring the intention of the user.
Fixed llvm to produce an error when encountering "<type> ptr <reg name>" operands.

For example: andpd xmm1,xmmword ptr xmm1 --> andpd xmm1, xmm1 
though andpd has 2 possible matching formats - andpd xmm, xmm/m128

Patch by: ziv.izhar@intel.com
Differential Revision: http://reviews.llvm.org/D14607

llvm-svn: 254607
2015-12-03 12:17:03 +00:00
Marina Yatsina 90d9ffa7d6 [X86] Add support for fcomip, fucomip for Intel syntax
According to x86 spec, fcomip and fucomip should be supported for Intel syntax.

Differential Revision: http://reviews.llvm.org/D15104

llvm-svn: 254595
2015-12-03 08:55:33 +00:00
Tom Stellard 9760f03757 AMDGPU/SI: Emit constant arrays in the .hsrodata_readonly_agent section
Summary: This is done only when targeting HSA.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13807

llvm-svn: 254587
2015-12-03 03:34:32 +00:00
Joerg Sonnenberger 48eb197434 Add a TODO item that the nop handling before FP conditional branches is
not enough for SPARCv7.

llvm-svn: 254580
2015-12-03 02:35:24 +00:00
Derek Schuff 5268aaf7b6 [WebAssembly] Add a test for wasm-store-results pass
Differential Revision: http://reviews.llvm.org/D15167

llvm-svn: 254570
2015-12-03 00:50:30 +00:00
Dan Gohman ac132e9305 [WebAssembly] Assert that byval and nest are not used for return types.
llvm-svn: 254567
2015-12-02 23:40:03 +00:00
Krzysztof Parzyszek 8d8b229de9 [Hexagon] Improve lowering of instructions to the MC layer
- Add extenders when necessary.
- Handle some basic relocations.

This should fix the failure in tools/clang/test/CodeGenCXX/crash.cpp

llvm-svn: 254564
2015-12-02 23:08:29 +00:00
David Majnemer 70497c696a Move EH-specific helper functions to a more appropriate place
No functionality change is intended.

llvm-svn: 254562
2015-12-02 23:06:39 +00:00
Alexey Samsonov 44ff204fad Fixup for r254547: use format_hex() to simplify code.
llvm-svn: 254560
2015-12-02 22:59:22 +00:00
Alexey Samsonov 39b7d65d82 [PowerPC] Remove wild call to RegScavenger::initRegState().
This call should in fact be made by RegScavenger::enterBasicBlock()
called below. The first call does nothing except for triggering UB,
indicated by UBSan (passing nullptr to memset()).

llvm-svn: 254548
2015-12-02 21:25:28 +00:00
Alexey Samsonov bcfabaa05b [Hexagon] Remove std::hex in favor of format().
std::hex is not used anywhere in LLVM code base except for this place,
and it has a known undefined behavior (at least in libstdc++ 4.9.3):
https://llvm.org/bugs/show_bug.cgi?id=18156, which fires in UBSan
bootstrap of LLVM.

llvm-svn: 254547
2015-12-02 21:13:43 +00:00
Tom Stellard 00f2f91af4 AMDGPU/SI: Correctly emit agent global segment variables when targeting HSA
Differential Revision: http://reviews.llvm.org/D14508

llvm-svn: 254540
2015-12-02 19:47:57 +00:00
Krzysztof Parzyszek de25ecfa62 [Hexagon] Remove TFRI_V4 instruction, use existing A2_tfrsi instead
llvm-svn: 254539
2015-12-02 19:44:35 +00:00
Kyle Butt 015f4fc854 Test Commit: iteratee
Remove whitespace from blank lines. NFC

llvm-svn: 254531
2015-12-02 18:53:33 +00:00
Tom Stellard e928533dae AMDGPU: Fix msan test failure
llvm-svn: 254527
2015-12-02 18:35:23 +00:00
Tim Northover f520eff782 AArch64: use ldxp/stxp pair to implement 128-bit atomic loads.
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.

llvm-svn: 254524
2015-12-02 18:12:57 +00:00
Dan Gohman 53d1399792 [WebAssembly] Fix comments to say "LIFO" instead of "FIFO" when describing a stack.
llvm-svn: 254523
2015-12-02 18:08:49 +00:00
Tom Stellard e3b5aeaf83 AMDGPU/SI: Don't emit group segment global variables
Summary: Only global or readonly segment variables should appear in object files.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15111

llvm-svn: 254519
2015-12-02 17:00:42 +00:00
Michael Zuckerman 15152a5c41 By intel spec
|9B DD /7| FSTSW m2byte| Valid Valid Store FPU status word at m2byteafter checking for pending unmasked floating-point exceptions.|
|9B DF E0| FSTSW AX| Valid Valid Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.|
|DD /7 |FNSTSW *m2byte| Valid Valid Store FPU status word at m2bytewithout checking for pending unmasked floating-point exceptions.|
|DF E0 |FNSTSW *AX| Valid Valid Store FPU status word in AX register without checking for pending unmasked floating-point exceptions|

m2byte is word register, and therefor instruction operand need to be change from f32mem to i16mem.

Differential Revision: http://reviews.llvm.org/D14953

llvm-svn: 254512
2015-12-02 14:34:34 +00:00
Christof Douma 8b5dc2c94e [AArch64]: Add support for Cortex-A35
Adds support for the new Cortex-A35 ARMv8-A core.

llvm-svn: 254503
2015-12-02 11:53:44 +00:00
Nemanja Ivanovic 74e31bc929 Patch to fix a crash in the PowerPC back end due to ISD::ROTL and ISD::ROTR
not being expanded. Test case included.

llvm-svn: 254501
2015-12-02 10:36:24 +00:00
Hrvoje Varga 672b0f5582 [mips][microMIPS] Implement PREPEND, RADDU.W.QB, RDDSP, REPL.PH, REPL.QB, REPLV.PH, REPLV.QB and MTHLIP instructions
Differential Revision: http://reviews.llvm.org/D14527

llvm-svn: 254496
2015-12-02 09:31:24 +00:00
Simon Pilgrim 3fc3454a0c [X86][FMA] Optimize FNEG(FMUL) Patterns
On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using a FNMSUB (-(X*Y)-0)

Fix for PR24366

Differential Revision: http://reviews.llvm.org/D14909

llvm-svn: 254495
2015-12-02 09:07:55 +00:00
Elena Demikhovsky a1a40cce9f AVX-512: Updated cost of FP/SINT/UINT conversion operations
I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode.
Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets).

Differential Revision: http://reviews.llvm.org/D15074

llvm-svn: 254494
2015-12-02 08:59:47 +00:00
Asaf Badouh 2489f350c0 [X86][AVX512] add comi with Sae
add builtin_ia32_vcomisd and builtin_ia32_vcomisd

Differential Revision: http://reviews.llvm.org/D14331

llvm-svn: 254493
2015-12-02 08:17:51 +00:00
Craig Topper f419a1f69a [X86] Change getZeroVector to take an MVT instead of EVT. One minor change needed to only try to perform 256-it shuffle combines on legal vector types.
llvm-svn: 254490
2015-12-02 06:39:19 +00:00
Craig Topper 6164297f46 [X86] Fix weird identation. NFC
llvm-svn: 254487
2015-12-02 05:24:38 +00:00
Quentin Colombet bbdebefff6 [X86] Fix a think-o when checking if the eflags needs to be preserved.
llvm-svn: 254480
2015-12-02 02:07:00 +00:00
Quentin Colombet f1e91c8bf1 [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.
This is a superset of the fix done in r254448.

This fixes PR25607.

llvm-svn: 254478
2015-12-02 01:22:54 +00:00
Tim Northover f3be9d5c0b AArch64: fix 128-bit shifts
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).

The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.

rdar://22491037

llvm-svn: 254475
2015-12-02 00:33:54 +00:00
Matt Arsenault 592d068198 AMDGPU: Error on addrspacecasts that aren't actually implemented
llvm-svn: 254469
2015-12-01 23:04:05 +00:00
Matt Arsenault f9bfeafd00 AMDGPU: Implement isNoopAddrSpaceCast
llvm-svn: 254468
2015-12-01 23:04:00 +00:00
Matthias Braun b258d794dd ARM: Change ArchCheck field to uint64_t
The values in this field are compared against getAvailableFeatures()
which returns an uint64_t. This was causing problems in an internal
branch.

llvm-svn: 254462
2015-12-01 21:48:52 +00:00
Matt Arsenault 3b15967008 AMDGPU: Disallow flat_scr in SI assembler
llvm-svn: 254459
2015-12-01 20:31:08 +00:00
Matt Arsenault 856d1928a8 AMDGPU: Optimize VOP2 operand legalization
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.

This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.

llvm-svn: 254452
2015-12-01 19:57:17 +00:00
Quentin Colombet 9cb01aa30a [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.
This fixes PR25629.

llvm-svn: 254448
2015-12-01 19:49:31 +00:00
Artyom Skrobov 5d1f2524a0 Fix Thumb1 epilogue generation
Summary:
This had been broken for a very long time, but nobody noticed until
D14357 enabled shrink-wrapping by default.

Reviewers: jroelofs, qcolombet

Subscribers: tyomitch, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14986

llvm-svn: 254444
2015-12-01 19:25:11 +00:00
Weiming Zhao 56ab51870c [AArch64] Fix a corner case in BitFeild select
Summary:
When not useful bits, BitWidth becomes 0 and APInt will not be happy.

See https://llvm.org/bugs/show_bug.cgi?id=25571

We can just mark the operand as IMPLICIT_DEF is none bits of it is used.

Reviewers: t.p.northover, jmolloy

Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14803

llvm-svn: 254440
2015-12-01 19:17:49 +00:00
Matt Arsenault e830f5427b AMDGPU: Report extractelement as free in cost model
The cost for scalarized operations is computed as N * (scalar operation
cost + 1 extractelement + 1 insertelement). This partially fixes
inflating the cost of scalarized operations since every operation is
scalarized and free. I don't think we want any cost asociated with
scalarization, but for now insertelement is still counted. I'm not sure
if we should pretend that insertelement is also free, or add a way
to compute a custom scalarization cost.

llvm-svn: 254438
2015-12-01 19:08:39 +00:00
Tom Stellard 38b7cbe3e0 AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now dead
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15050

llvm-svn: 254427
2015-12-01 17:45:22 +00:00
Tom Stellard ff63c25753 AMDGPU: Use the default strings for data emission directives
Summary:
This makes the assembly output look nicer and there is no reason to
have custom strings for these.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D14671

llvm-svn: 254426
2015-12-01 17:45:17 +00:00
Sanjay Patel 60216f6943 [x86] add a convenience method to check for FMA capability; NFCI
llvm-svn: 254425
2015-12-01 17:27:55 +00:00
Elena Demikhovsky 0d0692d854 AVX-512: fixed asm string of vsqrtss
(vvsqrtss was generated before)

llvm-svn: 254411
2015-12-01 12:43:46 +00:00
Hrvoje Varga e51b0e13f3 [mips][microMIPS] Implement RECIP.fmt, RINT.fmt, ROUND.L.fmt, ROUND.W.fmt, SEL.fmt, SELEQZ.fmt, SELNEQZ.fmt and CLASS.fmt
Differential Revision: http://reviews.llvm.org/D13885

llvm-svn: 254405
2015-12-01 11:59:21 +00:00
Yury Gribov d7dbb66eb8 Introduce new @llvm.get.dynamic.area.offset.i{32, 64} intrinsics.
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.

Patch by Max Ostapenko.

Differential Revision: http://reviews.llvm.org/D14983

llvm-svn: 254404
2015-12-01 11:40:55 +00:00
Oliver Stannard a34e47066e [AArch64] Add ARMv8.2-A Statistical Profiling Extension
The Statistical Profiling Extension is an optional extension to
ARMv8.2-A. Since it is an optional extension, I have added the
FeatureSPE subtarget feature to control it. The assembler-visible parts
of this extension are the new "psb csync" instruction, which is
equivalent to "hint #17", and a number of system registers.

Differential Revision: http://reviews.llvm.org/D15021

llvm-svn: 254401
2015-12-01 10:48:51 +00:00
Oliver Stannard 4667071574 [ARM] Add ARMv8.2-A to TargetParser
Add ARMv8.2-A to TargetParser, so that it can be used by the clang
command-line options and the .arch directive.

Most testing of this will be done in clang, checking that the
command-line options that this enables work.

Differential Revision: http://reviews.llvm.org/D15037

llvm-svn: 254400
2015-12-01 10:33:56 +00:00
Oliver Stannard 8addbf4350 [ARM] Add subtarget features for ARMv8.2-A
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature.
There is also one large, optional feature, which adds 16-bit floating
point versions of all existing floating-point instructions (VFP and
SIMD), this is represented by the FeatureFullFP16 subtarget feature.

Differential Revision: http://reviews.llvm.org/D15036

llvm-svn: 254399
2015-12-01 10:23:06 +00:00
Craig Topper c458c7c6c9 [X86] Fix patterns for memory forms of FP FSUBR and FDIVR. They need to have memory on the left hand side of the fsub/fdiv operations in their patterns.
Not sure how to test this. I noticed by inspection in the isel tables where the same pattern tried to produce DIV and DIVR or SUB and SUBR.

llvm-svn: 254388
2015-12-01 06:13:16 +00:00
Craig Topper 271f9ded44 [X86] Use range-based for loops. NFC
llvm-svn: 254387
2015-12-01 06:13:15 +00:00
Craig Topper ba894c3c0d [X86] Use array_lengthof instead of calculating manually. Also change index types to size_t to match.
llvm-svn: 254386
2015-12-01 06:13:13 +00:00
Craig Topper ddc76f2bed [Hexagon] Use std::begin() and std::end() instead of doing the same manually. NFC
llvm-svn: 254385
2015-12-01 06:13:10 +00:00
Craig Topper d824f5f0d9 [Hexagon] Use array_lengthof and const correct and type correct the array and array size. NFC
llvm-svn: 254384
2015-12-01 06:13:08 +00:00
Craig Topper 6261e1b94d Use array_lengthof instead of manually calculating it. NFC
llvm-svn: 254383
2015-12-01 06:13:06 +00:00
Craig Topper 3da000c07f [Hexagon] Use ArrayRef to avoid needing to calculate an array size. Interestingly the original code may have had a bug because it was passing the byte size of a uint16_t array instead of the number of entries.
llvm-svn: 254382
2015-12-01 06:13:04 +00:00
Craig Topper 8072081b63 [ARM] Use range-based for loops to avoid the need for calculating an array size that I would have otherwise cconverted to array_lengthof. NFC
llvm-svn: 254381
2015-12-01 06:13:01 +00:00
Cong Hou d97c100dc4 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
(This is the second attempt to submit this patch. The first caused two assertion
 failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)

The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254377
2015-12-01 05:29:22 +00:00
Hans Wennborg 1dbaf67537 Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces."
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."

Asserts were firing in Chromium builds. See PR25687.

llvm-svn: 254366
2015-12-01 03:49:42 +00:00
Matt Arsenault 456fdfcdc2 Squelch unused variable warning in SIRegisterInfo.cpp.
Patch by Justin Lebar

llvm-svn: 254362
2015-12-01 02:14:33 +00:00
Cong Hou fa1917c673 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254348
2015-12-01 00:02:51 +00:00
Simon Pilgrim db26b3ddfa [X86][FMA4] Prefer FMA4 to FMA
We currently output FMA instructions on targets which support both FMA4 + FMA (i.e. later Bulldozer CPUS bdver2/bdver3/bdver4).

This patch flips this so FMA4 is preferred; this is for several reasons:

1 - FMA4 is non-destructive reducing the need for mov instructions.
2 - Its more straighforward to commute and fold inputs (although the recent work on FMA has reduced this difference).
3 - All supported targets have FMA4 performance equal or better to FMA - Piledriver (bdver2) in particular has half the throughput when executing FMA instructions.

Its looks like no future AMD processor lines will support FMA4 after the Bulldozer series so we're not causing problems for later CPUs.

Differential Revision: http://reviews.llvm.org/D14997

llvm-svn: 254339
2015-11-30 22:22:06 +00:00
Matt Arsenault ada6cf1b22 AMDGPU: Fix unused function
llvm-svn: 254333
2015-11-30 21:32:10 +00:00
Matt Arsenault 41003af292 AMDGPU: Error if too many user SGPRs used
llvm-svn: 254332
2015-11-30 21:16:07 +00:00
Matt Arsenault 26f8f3db39 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

llvm-svn: 254331
2015-11-30 21:16:03 +00:00
Matt Arsenault ac234b604d AMDGPU: Rename enums to be consistent with HSA code object terminology
llvm-svn: 254330
2015-11-30 21:15:57 +00:00
Matt Arsenault 0e3d38937e AMDGPU: Remove SIPrepareScratchRegs
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.

The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.

Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.

The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.

llvm-svn: 254329
2015-11-30 21:15:53 +00:00
Matt Arsenault ff6da2fe89 AMDGPU: Use assert zext for workgroup sizes
llvm-svn: 254328
2015-11-30 21:15:45 +00:00
Quentin Colombet cdad10f333 [ARM] For old thumb ISA like v4t, we cannot use PC directly in pop.
Fix the epilogue emission to account for that.

llvm-svn: 254325
2015-11-30 20:37:58 +00:00
David Majnemer bf4119faf6 [X86] Add RIP to GR64_TCW64
The MachineVerifier wants to check that the register operands of an
instruction belong to the instruction's register class.  RIP-relative
control flow instructions violated this by referencing RIP.  While this
was fixed for SysV, it was never fixed for Win64.

llvm-svn: 254315
2015-11-30 19:04:19 +00:00
Kit Barton f4ce2f3a9e Enable shrink wrapping for PPC64
Re-enable shrink wrapping for PPC64 Little Endian.

One minor modification to PPCFrameLowering::findScratchRegister was necessary to handle fall-thru blocks (blocks with no terminator) correctly.

Tested with all LLVM test, clang tests, and the self-hosting build, with no problems found.

PHabricator: http://reviews.llvm.org/D14778
llvm-svn: 254314
2015-11-30 18:59:41 +00:00
Dan Gohman 96029f7880 [WebAssembly] Fix a few minor compiler warnings. NFC.
llvm-svn: 254311
2015-11-30 18:42:08 +00:00
Sanjay Patel 239be1fb0d fix formatting; NFC
llvm-svn: 254310
2015-11-30 17:52:02 +00:00
Colin LeMahieu e6241798c9 [Hexagon] NFC Reordering headers.
llvm-svn: 254307
2015-11-30 17:32:34 +00:00
Matt Arsenault ea03cf2fa1 AMDGPU: Don't reserve SCRATCH_PTR input register
This hasn't been doing anything since using relocations was added.

llvm-svn: 254304
2015-11-30 15:46:47 +00:00
Aaron Ballman 33c95f08b0 Silencing a 32-bit to 64-bit implicit conversion warning; NFC.
llvm-svn: 254302
2015-11-30 14:52:33 +00:00
Hrvoje Varga c03957f049 [mips][microMIPS] Implement LBUX, LHX, LWX, MAQ_S[A].W.PHL, MAQ_S[A].W.PHR, MFHI, MFLO, MTHI and MTLO instructions
Differential Revision: http://reviews.llvm.org/D14436

llvm-svn: 254297
2015-11-30 12:58:39 +00:00
Zoran Jovanovic a887b36167 [mips][microMIPS] Fix issue with offset operand of BALC and BC instructions
Value of offset operand for microMIPS BALC and BC instructions is currently shifted 2 bits, but it should be 1 bit.
Differential Revision: http://reviews.llvm.org/D14770

llvm-svn: 254296
2015-11-30 12:56:18 +00:00
Zlatko Buljan 56f3b0e410 [mips][microMIPS] Implement PRECR.QB.PH, PRECR_SRA[_R].PH.W, PRECRQ.PH.W, PRECRQ.QB.PH, PRECRQU_S.QB.PH and PRECRQ_RS.PH.W instructions
Differential Revision: http://reviews.llvm.org/D14605

llvm-svn: 254291
2015-11-30 08:37:38 +00:00
Craig Topper 27e2912fa8 Revert r254279 "[X86] Use ArrayRef. NFC". It seems to have upset an MSVC build bot.
llvm-svn: 254280
2015-11-30 02:28:19 +00:00
Craig Topper b84f39865f [X86] Use ArrayRef. NFC
llvm-svn: 254279
2015-11-30 02:08:05 +00:00
Craig Topper aad5f11e5f [AVX512] The vpermi2 instructions require an integer vector for the index vector. This is reflected correctly in the intrinsics, but was not refelected in the isel patterns.
For the floating point types, this requires adding a bitcast to the index vector when its passed through to the output.

llvm-svn: 254277
2015-11-30 00:13:24 +00:00
Craig Topper fbde7aa13a [X86] Remove duplicate entries from intrinsics tables and add asserts to verify there are no others.
llvm-svn: 254274
2015-11-29 23:18:32 +00:00
Dan Gohman 9551a44d0c [WebAssembly] Delete an obsolete TODO comment.
llvm-svn: 254272
2015-11-29 23:09:41 +00:00
Dan Gohman 174b2d83ee [WebAssembly] Set several MCInstrDesc flags.
llvm-svn: 254271
2015-11-29 22:59:19 +00:00
Craig Topper ecae476e4c [X86] int_x86_avx2_permps and X86ISD::VPERMV should take an integer vector for its shuffle indices.
llvm-svn: 254269
2015-11-29 22:53:22 +00:00
Dan Gohman 5237b3991d [WebAssembly] Delete unused functions. NFC.
llvm-svn: 254268
2015-11-29 22:48:57 +00:00
Dan Gohman 7a6b9825ce [WebAssembly] Minor clang-format and selected clang-tidy cleanups. NFC.
llvm-svn: 254267
2015-11-29 22:32:02 +00:00
Simon Pilgrim 88aa627c0b [X86][SSE] Added support for lowering to ADDSUBPS/ADDSUBPD with commuted inputs
We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allow us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.

llvm-svn: 254259
2015-11-29 16:41:04 +00:00
Igor Breger e293e83f5d AVX512:Implemented encoding for the vmovq.s instruction.
Differential Revision: http://reviews.llvm.org/D14810

llvm-svn: 254248
2015-11-29 07:41:26 +00:00
Renato Golin 5dbc8a5283 Revert "[ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM."
This reverts commit r254201 and r254202, as it broke test-suite,
self-hosting and sanitizer tests on ARM buildbots.

llvm-svn: 254234
2015-11-28 17:23:46 +00:00
Jonas Paulsson f12b925bb1 [Stack realignment] Handling of aligned allocas.
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.

It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.

SystemZ benefits from this, due to its stack frame layout.

New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.

Review and help from Ulrich Weigand and Hal Finkel.

llvm-svn: 254227
2015-11-28 11:02:32 +00:00
Artyom Skrobov f01a59f9fb Follow-up fix for r254201
llvm-svn: 254202
2015-11-27 16:20:34 +00:00
Artyom Skrobov b955b90509 [ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM.
Summary:
Since this build attribute corresponds to a whole module, and
different functions in a module may differ in the optimizations
enabled for them, this attribute is emitted after all functions,
and only in the case that the optimization goals for all
functions match.

Reviewers: logan, hans

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14934

llvm-svn: 254201
2015-11-27 15:30:51 +00:00
Oliver Stannard b25914e03f [AArch64] Add ARMv8.2-A FP16 scalar instructions
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Most of these instructions are the same as the 32- and 64-bit versions,
but with the type field (bits 23-22) set to 0b11. Previously the top bit
of the size field was always 0, so the instruction classes only provided
a 1-bit size field, which I have widened to 2 bits.

Differential Revision: http://reviews.llvm.org/D15014

llvm-svn: 254198
2015-11-27 13:04:48 +00:00
Craig Topper e38c57a4b8 [X86] Pair a NoVLX with HasAVX512 to match the others and remove a unique predicate check in the isel tables. NFC
llvm-svn: 254191
2015-11-27 05:44:02 +00:00
Craig Topper a47576f297 [X86] Now that X86VPermt2 is used in all the avx512_perm_t_sizes just hardcode it into the patterns instead of passing as an argument. NFC
llvm-svn: 254177
2015-11-26 20:21:29 +00:00
Craig Topper 05858f52fe [X86] Merge X86VPermt2Fp and X86VPermt2Int back together by weakening them just enough. The SDTCisSameSizeAs introduced in r254138 helps here.
llvm-svn: 254176
2015-11-26 20:02:01 +00:00
Craig Topper 0009656335 [X86] Split ISD node for Vfpclass and Vfpclasss so that we can write strong type constraints for each that don't cause ambiguous isel.
llvm-svn: 254172
2015-11-26 19:41:34 +00:00
Craig Topper ff2f14731a [X86] Revert part of r254167 to recover bots.
llvm-svn: 254169
2015-11-26 19:13:05 +00:00
Krzysztof Parzyszek 08ff8883fd [Hexagon] Lowering of V60/HVX vector types
llvm-svn: 254168
2015-11-26 18:38:27 +00:00
Craig Topper 9d1deb4b72 [X86] Strengthen more type constraints to reduce isel table size.
llvm-svn: 254167
2015-11-26 18:31:19 +00:00
Krzysztof Parzyszek 4eb6d4d1f2 [Hexagon] Hexagon V60 HVX intrinsic defintions
Author: Ron Lieberman <ronl@codeaurora.org>
llvm-svn: 254165
2015-11-26 16:54:33 +00:00
Daniel Sanders daa4b6fbd9 [mips][ias] Range check uimm5 operands and fix several bugs this revealed.
Summary:
The bugs were:
* append, prepend, and balign were not tested
* balign takes a uimm2 not a uimm5.
* drotr32 was correctly implemented with a uimm5 but the tests expected
  '52' to be valid.
* li/la were implemented with a uimm5 instead of simm32. simm32 isn't
  completely correct either but I'll fix that when I get to simm32.

A notable omission are some of the shift instructions. Several of these
have been implemented using a single uimm6 instruction (rather than two
uimm5 instructions and a CodeGen-only uimm6 pseudo). These will be updated
in the uimm6 patch.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14712

llvm-svn: 254164
2015-11-26 16:35:41 +00:00
Oliver Stannard 64c167db7a [AArch64] Add ARMv8.2-A new AT instruction variants
ARMv8.2-A adds new variants of the "at" (address translate) system
instruction, which take the PSTATE.PAN bit (added in ARMv8.1-A). These
are a required part of ARMv8.2-A, so no additional subtarget features
are required.

Differential Revision: http://reviews.llvm.org/D15018

llvm-svn: 254159
2015-11-26 15:34:44 +00:00
Martell Malone d12292480a ARM: address WOA unsigned division overflow crash
Building on r253865 the crash is not limited to signed overflows.

Disable custom handling of unsigned 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit unsigned integer overflow.

llvm-svn: 254158
2015-11-26 15:34:03 +00:00