35251 Commits

Author SHA1 Message Date
Michael Kuperstein 73dc85293f [X86] Generate .cfi_adjust_cfa_offset correctly when pushing arguments
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.

Darwin does not support this well, so don't use pushes whenever it
would be required.

Differential Revision: http://reviews.llvm.org/D13767

llvm-svn: 251904
2015-11-03 08:17:25 +00:00
Igor Breger 4ec5abffae AVX512: add encoding tests for vmovq/d instructions.
llvm-svn: 251903
2015-11-03 07:30:17 +00:00
Matthias Braun f538e133cc Fix build problem introduced in r251883
llvm-svn: 251888
2015-11-03 02:19:07 +00:00
Matthias Braun 93563e7032 ScheduleDAGInstrs: Remove IsPostRA flag; NFC
ScheduleDAGInstrs doesn't behave differently before or after register
allocation. It was only used in a method of MachineSchedulerBase which
behaved differently in MachineScheduler/PostMachineScheduler. Change
this to let MachineScheduler/PostMachineScheduler just pass in a
parameter to that function.

The order of the LiveIntervals* and bool RemoveKillFlags parameters has
been switched to make out-of-tree code fail instead of unintentionally
passing a value intended for the IsPostRA flag to the (previously
following and default initialized) RemoveKillFlags.

Differential Revision: http://reviews.llvm.org/D14245

llvm-svn: 251883
2015-11-03 01:53:29 +00:00
Colin LeMahieu 160f73e36f [Hexagon] Fixing mistaken case fallthrough.
llvm-svn: 251867
2015-11-03 00:21:19 +00:00
Matt Arsenault f1aebbf33a AMDGPU: Stop assuming vreg for build_vector
This was causing a variety of test failures when v2i64
is added as a legal type.

SIFixSGPRCopies should correctly handle the case of vector inputs
to a scalar reg_sequence, so this isn't necessary anymore. This
was hiding some deficiencies in how reg_sequence is handled later,
but this shouldn't be a problem anymore since the register class
copy of a reg_sequence is now done before the reg_sequence.

llvm-svn: 251860
2015-11-02 23:30:48 +00:00
Derek Schuff 43e96c4feb [WebAssembly] Make WebAssemblyCodeGen depend on WebAssemblyAsmPrinter
llvm-svn: 251859
2015-11-02 23:23:16 +00:00
Matt Arsenault d48da14269 AMDGPU: Error on graphics shaders with HSA
I've found myself pointlessly debugging problems from running
graphics tests with an HSA triple a few times, so stop this from
happening again.

llvm-svn: 251858
2015-11-02 23:23:02 +00:00
Matt Arsenault 0de924b76d AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE
Make the REG_SEQUENCE be a VGPR, and do the register class
copy first.

llvm-svn: 251855
2015-11-02 23:15:42 +00:00
Bill Schmidt 8ed7cec170 [PPC64LE] Properly initialize instr-info in PPCVSXSwapRemoval pass
Replace some hacky code with the proper way to get at this data.

No functional change.

llvm-svn: 251848
2015-11-02 22:43:57 +00:00
Tim Northover 155103ec18 WatchOS: update default CPU for triple after t2dsp -> dsp rename
llvm-svn: 251814
2015-11-02 18:21:07 +00:00
Nemanja Ivanovic be5f0c04f1 Fix for bootstrap bug introduced in r244921
This revision has introduced an issue that only affects a bootstrapped compiler
when it is printing the ASM. It turns out that the new code path taken due to
legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a
micro optimization to change a load followed by a scalar_to_vector into a
load and splat instruction on PPC.

llvm-svn: 251798
2015-11-02 14:01:11 +00:00
Igor Breger fa798a9dbb AVX512: Implemented encoding and intrinsics for VBROADCASTI32x2 and VBROADCASTF32x2 instructions.
Differential Revision: http://reviews.llvm.org/D14216

llvm-svn: 251781
2015-11-02 07:39:36 +00:00
Craig Topper 45e83b8ba7 [X86] Remove assertions that check for valid scale values on scatter/gather intrinsics. Nothing upstream prevented illegal values from getting here.
llvm-svn: 251780
2015-11-02 07:24:40 +00:00
Craig Topper e69eb78510 [X86] Fold 'if' followed by just an llvm_unreachable into an assert.
llvm-svn: 251778
2015-11-02 07:24:34 +00:00
Craig Topper aebab7c03f [X86] Use isa instead of dyn_cast in a bool context. NFC
llvm-svn: 251777
2015-11-02 07:24:32 +00:00
Craig Topper c70af642a2 [X86] Remove some llvm_unreachables after switches that already have an unreachable in their default case.
llvm-svn: 251776
2015-11-02 07:24:30 +00:00
Craig Topper d6a77ca4bb [X86] Remove a 'break' after an llvm_unreachable.
llvm-svn: 251775
2015-11-02 07:24:27 +00:00
Craig Topper d49a41793c [X86] Use cast instead of dyn_cast and a null check marked unreachable.
llvm-svn: 251774
2015-11-02 07:24:25 +00:00
Craig Topper 95ceb5a60a [X86] Use MVT instead of EVT when the type is known to be simple. NFC
llvm-svn: 251772
2015-11-02 05:24:22 +00:00
NAKAMURA Takumi 50df0c2037 Untabify.
llvm-svn: 251769
2015-11-02 01:38:12 +00:00
Elena Demikhovsky db738d9cc3 AVX-512: Optimized SIMD truncate operations for AVX512F set.
Optimized <8 x i32> to <8 x i16>
<4 x i64> to <4 x i32>
<16 x i16> to <16 x i8>
All these operations now use the AVX512F set (KNL). Before this change they were implemented with the AVX2 set.

Differential Revision: http://reviews.llvm.org/D14108

llvm-svn: 251764
2015-11-01 11:45:47 +00:00
Craig Topper ec2ea4817e [X86] Replace getScalarType with getVectorElementType when the type is already known to be a vector. This should result in slightly less code. NFC
llvm-svn: 251751
2015-10-31 21:44:52 +00:00
Craig Topper 476be8f94a [X86] Convert to MVT instead of calling EVT functions since we already know the type is simple. NFC
llvm-svn: 251745
2015-10-31 18:14:17 +00:00
Craig Topper 0fec4d8ce7 [X86] Call getScalarSizeInBits() instead of getScalarType().getScalarSizeInBits(). NFC
llvm-svn: 251744
2015-10-31 18:14:15 +00:00
Craig Topper 0e7680da9f [X86] Remove two const references to the return value of a constructor and just use normal object creation syntax. NFC
llvm-svn: 251743
2015-10-31 17:28:02 +00:00
Craig Topper 7b1d3a8a6c [X86] Replace EVT with MVT in some more places. NFC
llvm-svn: 251742
2015-10-31 17:27:59 +00:00
Craig Topper 63c2925b87 [X86] Fix indentation of case statements in switch. NFC
llvm-svn: 251741
2015-10-31 17:27:56 +00:00
Craig Topper 5c8a378f48 [X86] Reduce math for index calculation for inserting and extracting subvectors and elements by exploiting the fact that all supported vector types have a power 2 number of elements.
llvm-svn: 251740
2015-10-31 17:27:52 +00:00
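
The simplification in r251740 rests on a standard identity: when the element count is a power of two, the divide/multiply used to find a subvector boundary can be replaced by a mask. The following is a minimal Python sketch of that identity only; the function names are hypothetical and are not taken from the LLVM sources.

```python
def chunk_base_divmul(idx: int, elts_per_chunk: int) -> int:
    """Round idx down to the first element of its chunk via divide and multiply."""
    return (idx // elts_per_chunk) * elts_per_chunk


def chunk_base_mask(idx: int, elts_per_chunk: int) -> int:
    """Same result with a single mask; only valid for power-of-two chunk sizes."""
    assert elts_per_chunk > 0 and elts_per_chunk & (elts_per_chunk - 1) == 0
    return idx & ~(elts_per_chunk - 1)


def index_within_chunk(idx: int, elts_per_chunk: int) -> int:
    """Position of idx inside its chunk; replaces idx % elts_per_chunk."""
    assert elts_per_chunk > 0 and elts_per_chunk & (elts_per_chunk - 1) == 0
    return idx & (elts_per_chunk - 1)
```

For example, with 4 elements per 128-bit chunk, element 6 belongs to the chunk starting at element 4 and sits at position 2 within it.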
JF Bastien 5789a69435 [WebAssembly] Fix import statement
Summary:
Imports should be generated like (param i32 f32...) not (param i32) (param f32) ...

Author: binji
Reviewers: jfb
Subscribers: jfb, dschuff
llvm-svn: 251714
2015-10-30 16:41:21 +00:00
Craig Topper 9377f01f21 [X86] Use is128BitVector/is256BitVector/is512BitVector in place of getSizeInBits == in some places. NFC
llvm-svn: 251687
2015-10-30 04:31:18 +00:00
Craig Topper 62c3ed0ae3 [X86] Minor formatting fixes. NFC.
llvm-svn: 251686
2015-10-30 04:31:14 +00:00
Craig Topper 9ef327c962 [X86] Use MVT instead of EVT in some places. NFC
Prior to this the compiled code probably had extra checks for extended types that won't ever execute.

llvm-svn: 251682
2015-10-30 03:19:12 +00:00
Simon Pilgrim ca56a72af9 [X86][SSE] Shuffle blends with zero
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.

Differential Revision: http://reviews.llvm.org/D14050

llvm-svn: 251659
2015-10-29 22:11:28 +00:00
Jonas Paulsson 45d5c673ec [SystemZ] Make the CCRegs regclass non-allocatable.
This was discovered to be necessary while running memchr-01.ll with
-verify-machineinstrs, because it is not allowed to have a phys reg live
across block boundaries while on SSA form, if the register is
allocatable (except in the entry block and landing pads).

In this test case, stringRRE pseudos are expanded after isel by adding
a loop block which produces a live out CC register. To make the test
pass, it was also necessary to not say that StringRRELoop pseudo uses
R0L, this is only true for the StringRRE opcode.

-verify-machineinstrs added to memchr-01.ll test.

New test case int-cmp-51.ll to test that MachineCSE can eliminate
an identical compare (which it couldn't do before).

Reviewed by Ulrich Weigand

llvm-svn: 251634
2015-10-29 16:13:55 +00:00
Marek Olsak 6f6d318e16 AMDGPU/SI: handle undef for llvm.SI.packf16
llvm-svn: 251632
2015-10-29 15:29:09 +00:00
Marek Olsak 74d084f466 AMDGPU/SI: use S_OR for fneg (fabs f32)
llvm-svn: 251631
2015-10-29 15:29:05 +00:00
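
As an illustration of why a bitwise OR can implement fneg(fabs x) for f32: fabs clears the IEEE-754 sign bit and fneg then sets it, so the combined effect is simply to force the sign bit to 1. The Python sketch below shows that bit-level equivalence only; the helper name is hypothetical, and the constant 0x80000000 is the standard f32 sign-bit mask rather than something quoted from the commit.

```python
import struct

F32_SIGN_BIT = 0x80000000  # sign bit of an IEEE-754 single-precision value


def fneg_fabs_via_or(x: float) -> float:
    """Compute -(|x|) for an f32 value by OR-ing in the sign bit."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits |= F32_SIGN_BIT
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```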
Marek Olsak f924dd6f3c AMDGPU/SI: use S_AND for i1 trunc
llvm-svn: 251630
2015-10-29 15:05:03 +00:00
Zoran Jovanovic 796ed6d937 [mips] wrong opcode for ll/sc instructions on mipsr6 when -integrated-as is used
Summary:
This commit resolves wrong opcodes for ll and sc instructions for r6 architectures, which were generated in method MipsTargetLowering::emitAtomicBinary.

Author: Jelena.Losic

Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D13593

llvm-svn: 251629
2015-10-29 14:40:19 +00:00
Artyom Skrobov 0ff1ce4038 Recognize that ARM1176JZ[F]-S support TrustZone
Summary:
ARMv6KZ cores were set up incorrectly in ARM.td; also, the SMI mnemonic
(the old name for SMC, as defined in ARMv6KZ) wasn't supported.

Reviewers: jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14154

llvm-svn: 251627
2015-10-29 13:56:19 +00:00
Vasileios Kalintiris 2f412684a9 [mips] Check the register class before replacing materializations of zero with $zero in microMIPS.
Summary:
The microMIPS register class GPRMM16 does not contain the $zero register.
However, MipsSEDAGToDAGISel::replaceUsesWithZeroReg() would replace uses
of the $dst register:

  [d]addiu, $dst, $zero, 0

with the $zero register, without checking for membership in the register
class of the target machine operand.

Reviewers: dsanders

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13984

llvm-svn: 251622
2015-10-29 10:17:16 +00:00
JF Bastien 7b452e2c63 [WebAssembly] Update opcode name format for conversions
Summary:
Conversion opcode name format should be f64.convert_u/i64 not f64_convert_u

Author: s3ththompson
Reviewers: jfb
Subscribers: sunfish, jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D14160

llvm-svn: 251613
2015-10-29 04:10:52 +00:00
Benjamin Kramer 4e4ca38bcf Remove CRLF line endings.
llvm-svn: 251594
2015-10-29 02:33:05 +00:00
Hal Finkel 7d0e34eb33 [PowerPC] Recurse through constants when looking for TLS globals
We cannot form ctr-based loops around function calls, including calls to
__tls_get_addr used for PIC TLS variables. References to such TLS variables,
however, might be buried within constant expressions, and so we need to search
the entire constant expression to be sure that no references to such TLS
variables exist.

Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version
of the patch suggested by Eric in the bug report, and a test case I created.

llvm-svn: 251582
2015-10-28 23:43:00 +00:00
Hal Finkel bdd292ae22 [PowerPC] Don't return unsupported register classes for asm constraints
As a follow-up to r251566, do the same for the other optionally-supported
register classes (mostly for vector registers). Don't return an unavailable
register class (which would cause an assert later), but fail cleanly when
provided an unsupported inline asm constraint.

llvm-svn: 251575
2015-10-28 23:03:45 +00:00
Tim Northover f8e47e4868 ARM: add support for WatchOS's compact unwind information.
llvm-svn: 251573
2015-10-28 22:56:36 +00:00
Tim Northover 8b40366b54 ARM: teach backend about WatchOS and TvOS libcalls.
The most substantial changes are again for watchOS: libcalls are hard-float if
needed and sincos has a different calling convention.

llvm-svn: 251571
2015-10-28 22:51:16 +00:00
Tim Northover e0ccdc6de9 ARM: add backend support for the ABI used in WatchOS
At the LLVM level this ABI is essentially a minimal modification of AAPCS to
support 16-byte alignment for vector types and the stack.

llvm-svn: 251570
2015-10-28 22:46:43 +00:00
Tim Northover 2d4d161519 ARM: support .watchos_version_min and .tvos_version_min.
These MachO file directives are used by linkers and other tools to provide
compatibility information, much like the existing .ios_version_min and
.macosx_version_min.

llvm-svn: 251569
2015-10-28 22:36:05 +00:00
Hal Finkel 34d4149452 [PowerPC] Cleanly reject asm crbit constraint with -crbits
When crbits are disabled, cleanly reject the constraint (rather than returning
the register class only to cause an assert later).

llvm-svn: 251566
2015-10-28 22:25:52 +00:00
Cong Hou da4e8aeec6 [X86] A small fix in X86/X86TargetTransformInfo.cpp: check a value type is simple before calling getSimpleVT().
llvm-svn: 251538
2015-10-28 18:15:46 +00:00
Artyom Skrobov b43981076a [ARM] Allow SP in rGPR, starting from ARMv8
Summary:
This patch handles assembly and disassembly, but not codegen, as of yet.

Additionally, it fixes a bug whereby SP and PC as shifted-reg operands
were treated as predictable in ARMv7 Thumb; and it enables the tests
for invalid and unpredictable instructions to run on both ARMv7 and ARMv8.

Reviewers: jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14141

llvm-svn: 251516
2015-10-28 13:58:36 +00:00
Benjamin Kramer 039b10423a Put global classes into the appropriate namespace.
Most of the cases belong in an anonymous namespace. No
functionality change intended.

llvm-svn: 251515
2015-10-28 13:54:36 +00:00
Hrvoje Varga 18148671ee [mips][microMIPS] Implement PAUSE, RDHWR, RDPGPR, SDBBP, SSNOP, SYNC, SYNCI and WAIT instructions
Differential Revision: http://reviews.llvm.org/D12628

llvm-svn: 251510
2015-10-28 11:04:29 +00:00
Craig Topper 93d4a9e117 [X86] Make some for loops over MVTs more explicit (and shorter) by just mentioning all the relevant types in an initializer list. NFC
llvm-svn: 251500
2015-10-28 05:48:32 +00:00
Craig Topper 3a47587c41 Use range-based for loops and use initializer list to remove a small static array. NFC
llvm-svn: 251494
2015-10-28 04:53:27 +00:00
Craig Topper 4b27576001 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.

llvm-svn: 251490
2015-10-28 04:02:12 +00:00
Hal Finkel f4052340a4 [PowerPC] Replace cntlz[.] with cntlzw[.]
cntlz is the old POWER mnemonic. cntlzw is the PowerPC mnemonic.

This change fixes an issue when using -no-integrated-as: the opcode cntlz is
unrecognized by gas.

Alias the POWER mnemonic cntlz[.] to the PowerPC mnemonic cntlzw[.].
This is done because the POWER cntlz mnemonic has been used by LLVM for
a very long time. We need to make sure that assembly programs
that use cntlz[.] do not break with this change.

Change PowerPC tests to reflect the insn change from cntlz to cntlzw.
Add assembly test to verify cntlz[.] is encoded correctly.

Patch by Tom Rix!

llvm-svn: 251489
2015-10-28 03:26:45 +00:00
Jun Bum Lim c9879ecfbc [AArch64]Merge halfword loads into a 32-bit load
This recommits r250719, which caused a failure in SPEC2000.gcc
because of the incorrect insert point for the new wider load.

Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example:
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #0xffff

llvm-svn: 251438
2015-10-27 19:16:03 +00:00
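
The equivalence behind the halfword merge in r251438 can be checked with a small model. This Python sketch assumes little-endian memory and models the registers as plain integers; the function names are hypothetical, and it illustrates only the arithmetic, not the AArch64 load/store optimizer itself.

```python
import struct


def two_halfword_loads(mem: bytes, addr: int) -> tuple:
    """Model of: ldrh w0, [x2] ; ldrh w1, [x2, #2]."""
    w0 = struct.unpack_from("<H", mem, addr)[0]
    w1 = struct.unpack_from("<H", mem, addr + 2)[0]
    return w0, w1


def merged_word_load(mem: bytes, addr: int) -> tuple:
    """Model of one 32-bit load followed by bitfield extraction."""
    word = struct.unpack_from("<I", mem, addr)[0]  # ldr  w0, [x2]
    w1 = (word >> 16) & 0xFFFF                     # ubfx w1, w0, #16, #16
    w0 = word & 0xFFFF                             # and  w0, w0, #0xffff
    return w0, w1
```

Both functions return the same pair of values for any four bytes of memory, which is what makes the single wider load a valid replacement.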
Cong Hou 07eeb8001e Create a new interface addSuccessorWithoutWeight(MBB*) in MBB to add successors when optimization is disabled.
When optimization is disabled, edge weights that are stored in MBB won't be used, so we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But an empty weight list doesn't necessarily mean optimization is disabled (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights.

We should discourage using default weights when adding successors, because it is very easy for users to forget to update the correct edge weights instead of using default ones (one exception is when the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimization is disabled.

In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this, because many uses of addSuccessor() do not check whether optimization is disabled. But it can guarantee that if optimization is enabled, then the weight list always has the same size as the successor list.

Differential revision: http://reviews.llvm.org/D13963

llvm-svn: 251429
2015-10-27 17:59:36 +00:00
Asaf Badouh c7cb880669 [X86][AVX512] add convert float to half
convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem).

Differential Revision: http://reviews.llvm.org/D14113

llvm-svn: 251409
2015-10-27 15:37:17 +00:00
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
Michael Kuperstein e1194bdb4f [X86] Make elfiamcu an OS, not an environment.
GNU tools require elfiamcu to take up the entire OS field, so, e.g.
i?86-*-linux-elfiamcu is not considered a legal triple.
Make us compatible.

Differential Revision: http://reviews.llvm.org/D14081

llvm-svn: 251390
2015-10-27 07:23:59 +00:00
Craig Topper ee0c859788 Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoids mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.

While there, make use of ArrayRef and std::find_if.

llvm-svn: 251382
2015-10-27 04:14:24 +00:00
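
The interface change in r251382 (return the matching entry or a null value instead of an index) can be illustrated with a Python analogue. This is a sketch of the pattern only: `CostEntry`, `cost_table_lookup`, and the sample table are hypothetical and do not mirror the real LLVM cost tables, and `next()` over a generator stands in for `std::find_if`.

```python
from typing import NamedTuple, Optional, Sequence


class CostEntry(NamedTuple):
    opcode: str  # stands in for the ISD opcode
    vt: str      # stands in for the value type
    cost: int


def cost_table_lookup(table: Sequence[CostEntry], opcode: str,
                      vt: str) -> Optional[CostEntry]:
    """Return the first matching entry, or None when nothing matches."""
    return next((e for e in table if e.opcode == opcode and e.vt == vt), None)


# Illustrative data only; the costs are made up.
EXAMPLE_TABLE = [
    CostEntry("ADD", "v4i32", 1),
    CostEntry("MUL", "v4i32", 2),
]
```

A caller can then test and use the result in one step, without naming the table a second time: `entry = cost_table_lookup(EXAMPLE_TABLE, "MUL", "v4i32")` followed by `if entry is not None: return entry.cost`.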
Sanjay Patel 309c4f93e5 [x86] replace integer logic ops with packed SSE FP logic ops
If we have an operand to a bitwise logic op that's already in
an XMM register and the result is going to be sent to an XMM
register, then use an SSE logic op to avoid moves between the
integer and vector register files.

Related commits:
http://reviews.llvm.org/rL248395
http://reviews.llvm.org/rL248399
http://reviews.llvm.org/rL248404
http://reviews.llvm.org/rL248409
http://reviews.llvm.org/rL248415

This should solve PR22428:
https://llvm.org/bugs/show_bug.cgi?id=22428

llvm-svn: 251378
2015-10-27 01:28:07 +00:00
Daniel Sanders 5bf6eab6b8 [mips][ias] Fold needsExpansion() and expandInstruction() together. NFC.
Summary:
Previously we maintained two separate switch statements that had to be kept in
sync. This patch merges them into a single switch.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14012

llvm-svn: 251369
2015-10-26 23:50:00 +00:00
Sanjay Patel e9b500f722 reorganize logic; NFCI (retry r251349)
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

..and I really hope it's NFC this time!

llvm-svn: 251357
2015-10-26 21:54:14 +00:00
Tim Northover 939f089242 ARM: make sure VFP loads and stores are properly aligned.
Both VLDRS and VLDRD fault if the memory is not 4 byte aligned, which wasn't
really being checked before, leading to faults at runtime.

llvm-svn: 251352
2015-10-26 21:32:53 +00:00
Sanjay Patel f29fed423a revert r251349; it included code for a functional change
llvm-svn: 251350
2015-10-26 21:28:02 +00:00
Sanjay Patel fdf75452e4 reorganize logic; NFCI
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

llvm-svn: 251349
2015-10-26 21:24:09 +00:00
Peter Collingbourne 99fac80db2 ARM/ELF: Restore original (pre-r251322) logic for deciding whether to use GOT.
Unbreaks linking with gold, which cannot resolve direct relocations referring
to global symbols.

llvm-svn: 251342
2015-10-26 20:46:44 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
__builtin_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Peter Collingbourne 97aae40880 ARM/ELF: Better codegen for global variable addresses.
In PIC mode we were previously computing global variable addresses (or GOT
entry addresses) by adding the PC, the PC-relative GOT displacement and
the GOT-relative symbol/GOT entry displacement. Because the latter two
displacements are fixed, we ended up performing one more addition than
necessary.

This change causes us to compute addresses using a single PC-relative
displacement, resulting in a shorter code sequence. This reduces code size
by about 4% in a recent build of Chromium for Android.

As a result of this change we no longer need to compute the GOT base address
in the ARM backend, which allows us to remove the Global Base Reg pass and
SDAG lowering for the GOT.

We also now no longer use the GOT when addressing a symbol which is known
to be defined in the same linkage unit. Specifically, the symbol must have
either hidden visibility or a strong definition in the current module in
order to not use the GOT.

This is a change from the previous behaviour where we would use the GOT to
address externally visible symbols defined in the same module. I think the
only cases where this could matter are cases involving symbol interposition,
but we don't really support that well anyway.

Differential Revision: http://reviews.llvm.org/D13650

llvm-svn: 251322
2015-10-26 18:23:16 +00:00
Jonas Paulsson 83553d0cac [SystemZ] LTGFR use regclass should be GR32, not GR64.
Discovered by testing int-cmp-44.ll with -verify-machineinstrs (added to
test run).

llvm-svn: 251299
2015-10-26 15:03:49 +00:00
Jonas Paulsson 7da3820882 [SystemZ] Also clear kill flag for index reg in splitMove().
Discovered by running fp-move-05.ll with -verify-machineinstrs (added
to test case run).

llvm-svn: 251298
2015-10-26 15:03:41 +00:00
Jonas Paulsson 9525b2c0c8 [SystemZ] Don't forget the CC def op on LTEBRCompare pseudos
Discovered by running fp-cmp-02.ll with -verify-machineinstrs (now added
to test run).

llvm-svn: 251297
2015-10-26 15:03:32 +00:00
Jonas Paulsson dab7407258 [SystemZ] Tie operands in SystemZShortenInst if MI becomes 2-address.
Discovered by testing fp-add-02.ll with -verify-machineinstrs.

Test case updated to always run with -verify-machineinstrs.

llvm-svn: 251296
2015-10-26 15:03:07 +00:00
Vasileios Kalintiris 165121f326 [mips] Check for the correct error message in tests for interrupt attributes.
Instead of XFAIL-ing the tests with the wrong usage of the "interrupt"
attribute, we should check that we emit the correct error messages to
the user.

llvm-svn: 251295
2015-10-26 14:24:30 +00:00
Igor Breger e4ddc3f4cd AVX512: Enabled VPBROADCASTB lowering for v64i8 vectors.
Differential Revision: http://reviews.llvm.org/D13896

llvm-svn: 251287
2015-10-26 13:01:02 +00:00
Vasileios Kalintiris 43dff0c033 [mips] Interrupt attribute support for mips32r2+.
Summary:
This patch adds support for using the "interrupt" attribute on Mips
for interrupt handling functions. At this time only mips32r2+ with the
o32 ABI with the static relocation model is supported. Unsupported
configurations will be rejected.

Patch by Simon Dardis (+ clang-format & some trivial changes to follow the
LLVM coding standards by me).

Reviewers: mpf, dsanders

Subscribers: dsanders, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D10768

llvm-svn: 251286
2015-10-26 12:38:43 +00:00
Igor Breger 684af8156c AVX-512: Use correct extract vector length.
Bug https://llvm.org/bugs/show_bug.cgi?id=25318

Differential Revision: http://reviews.llvm.org/D14062

llvm-svn: 251285
2015-10-26 12:26:34 +00:00
James Molloy 72222f5dca [ARM] Handle the inline asm constraint type 'o'
This means "memory with offset" and requires very little plumbing to get working. This fixes PR25317.

llvm-svn: 251280
2015-10-26 10:04:52 +00:00
Benjamin Kramer 8604457f2e Drop code after unreachable. No functionality change.
llvm-svn: 251278
2015-10-26 09:55:45 +00:00
Igor Breger f8e461f920 AVX512: Add AVX-512 not materializable instructions.
Otherwise a value can be reused even though it may have changed, which produces incorrect assembly.

https://llvm.org/bugs/show_bug.cgi?id=25270

Differential Revision: http://reviews.llvm.org/D14057

llvm-svn: 251275
2015-10-26 08:37:12 +00:00
David Majnemer a375b26144 [MC] Don't crash when .word is given bogus values
We didn't validate that the .word directive was given a sane value,
leading to crashes when we attempt to write out the object file.

Instead, perform some validation and issue a diagnostic pointing at the
start of the diagnostic.

llvm-svn: 251270
2015-10-26 02:45:50 +00:00
Benjamin Kramer 8ceb323bb4 Convert assert(false) into llvm_unreachable where it makes sense.
llvm-svn: 251266
2015-10-25 22:28:27 +00:00
Simon Pilgrim ec6db262e0 [X86][SSE4A] Fix for EXTRQI shuffle lowering.
Incorrect range test - found during fuzz testing.

llvm-svn: 251245
2015-10-25 17:40:54 +00:00
Elena Demikhovsky 092858588a Scalarizer for masked.gather and masked.scatter intrinsics.
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added the isLegalMaskedGather() and isLegalMaskedScatter() APIs.

Differential Revision: http://reviews.llvm.org/D13722

llvm-svn: 251237
2015-10-25 15:37:55 +00:00
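
The scalarization described in r251237 can be pictured as a per-lane loop in which each enabled lane performs one scalar load and each disabled lane keeps its pass-through value. The Python sketch below models only those semantics; the function name and the dictionary used as "memory" are assumptions for illustration, and the real pass emits IR basic blocks rather than running a loop.

```python
def scalarized_masked_gather(memory: dict, addresses: list,
                             mask: list, passthru: list) -> list:
    """Gather memory[addresses[i]] for lanes where mask[i] is true."""
    result = list(passthru)
    for lane, (addr, enabled) in enumerate(zip(addresses, mask)):
        # With a non-constant mask, each lane becomes a conditional block
        # that is entered only when the mask bit is set.
        if enabled:
            result[lane] = memory[addr]
    return result
```

Note that a disabled lane never touches memory, so its address does not need to be valid.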
Michael Kuperstein eaa16005af [X86] Use correct calling convention for MCU psABI libcalls
When using the MCU psABI, compiler-generated library calls should pass
some parameters in-register. However, since inreg marking for x86 is currently
done by the front end, it will not be applied to backend-generated calls.

This is a workaround for PR3997, which describes a similar issue for -mregparm.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251223
2015-10-25 08:14:05 +00:00
Michael Kuperstein fe897623f3 [X86] Add support for elfiamcu triple
This adds support for the i?86-*-elfiamcu triple, which indicates the IAMCU psABI is used.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251222
2015-10-25 08:07:37 +00:00
Craig Topper eda02a905e Remove two unnecessary conversions from MVT to EVT. NFC
llvm-svn: 251219
2015-10-25 03:15:29 +00:00
Craig Topper 7bf52c9d26 Use MVT::SimpleValueType instead of MVT in template parameter. NFC
llvm-svn: 251217
2015-10-25 00:27:14 +00:00
Simon Pilgrim 53c2bff5fe [X86][SSE] Use lowerVectorShuffleWithUNPCK instead of custom matches.
Most 128-bit and 256-bit shuffles were manually matching UNPCK patterns - use lowerVectorShuffleWithUNPCK to be more thorough.

llvm-svn: 251211
2015-10-24 22:45:04 +00:00
Simon Pilgrim fdfed5143c [X86][SSE] lowerVectorShuffleWithUNPCK - use equivalent shuffle mask test.
Use isShuffleEquivalent to match UNPCK shuffles - better support for build vector inputs.

llvm-svn: 251207
2015-10-24 20:48:08 +00:00
Craig Topper 272d6a57bb Call the version of ConvertCostTableLookup that takes a statically sized array rather than pointer and size. NFC
llvm-svn: 251196
2015-10-24 18:40:22 +00:00
Hans Wennborg 34d40434a7 X86ISelLowering: Support tail calls to/from callee pop functions
This enables tail calls with thiscall, stdcall, vectorcall and
fastcall functions.

Differential Revision: http://reviews.llvm.org/D13999

llvm-svn: 251190
2015-10-24 16:47:10 +00:00
Simon Pilgrim e379fe0ddb Fix unused variable warning. NFC.
llvm-svn: 251189
2015-10-24 13:41:45 +00:00
Simon Pilgrim d5ef318b5b [X86][XOP] Add support for lowering vector rotations
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.

Differential Revision: http://reviews.llvm.org/D13851

llvm-svn: 251188
2015-10-24 13:17:26 +00:00
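
The rotation pattern that a combiner has to recognise is the usual shift/or idiom, applied per lane with the same (splat) amount. Here is a minimal Python sketch of that idiom, assuming 32-bit lanes; the function names are hypothetical and the code models the arithmetic only, not the DAGCombiner or the XOP instructions.

```python
def rotl(x: int, amt: int, bits: int = 32) -> int:
    """Rotate left, written as the (x << amt) | (x >> (bits - amt)) idiom."""
    mask = (1 << bits) - 1
    x &= mask
    amt %= bits
    return ((x << amt) | (x >> (bits - amt))) & mask


def rotl_splat(lanes: list, amt: int, bits: int = 32) -> list:
    """Rotate every lane by the same amount, i.e. a splat rotation vector."""
    return [rotl(lane, amt, bits) for lane in lanes]
```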
Matt Arsenault 2ea0a23f18 AMDGPU: Print modifiers when dumping AMDGPUOperand
llvm-svn: 251160
2015-10-24 00:12:56 +00:00
Reid Kleckner f02e33ce42 [X86] Clean up the tail call eligibility logic
Summary:
The logic here isn't straightforward because of our support for
TargetOptions::GuaranteedTailCallOpt.

Also fix a bug where we were allowing tail calls to cdecl functions from
fastcall and vectorcall functions. We were special casing thiscall and
stdcall callers rather than checking for any convention that requires
clearing stack arguments before returning.

Reviewers: hans

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14024

llvm-svn: 251137
2015-10-23 19:35:38 +00:00
Matt Arsenault 382557ec72 AMDGPU: Fix parsing of 32-bit literals with sign bit set
llvm-svn: 251132
2015-10-23 18:07:58 +00:00
Artyom Skrobov 5a6e39454e [ARM] Renaming +t2dsp feature into +dsp, as discussed on llvm-dev
llvm-svn: 251125
2015-10-23 17:19:19 +00:00
Oleg Ranevskyy 6389dd9fa2 [ARM CodeGen] @llvm.debugtrap call may be removed when restoring callee saved registers
Summary:
When ARMFrameLowering::emitPopInst generates a "pop" instruction to restore the callee saved registers, it checks if the LR register is among them. If so, the function may decide to remove the basic block's terminator and replace it with a "pop" to the PC register instead of LR.

This leads to a problem when the block's terminator is preceded by a "llvm.debugtrap" call. The MI iterator points to the trap in such a case, which is also a terminator. If the function decides to restore LR to PC, it erroneously removes the trap.

Reviewers: asl, rengolin

Subscribers: aemerson, jfb, rengolin, dschuff, llvm-commits

Differential Revision: http://reviews.llvm.org/D13672

llvm-svn: 251123
2015-10-23 17:17:59 +00:00
Oleg Ranevskyy 5f78c5c293 Test commit: fix typo in comment.
llvm-svn: 251122
2015-10-23 17:10:44 +00:00
Joseph Tremoulet 3d0fbf1d74 [CodeGen] Mark setjmp/catchret MBBs address-taken
Summary:
This ensures that BranchFolding (and similar) won't remove these blocks.

Also allow AsmPrinter::EmitBasicBlockStart to process MBBs which are
address-taken but do not have BBs that are address-taken, since otherwise
its call to getAddrLabelSymbolTableToEmit would fail an assertion on such
blocks.  I audited the other callers of getAddrLabelSymbolTableToEmit
(and getAddrLabelSymbol); they all have BBs known to be address-taken
except for the call through getAddrLabelSymbol from
WinException::create32bitRef; that call is actually now unreachable, so
I've removed it and updated the signature of create32bitRef.

This fixes PR25168.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13774

llvm-svn: 251113
2015-10-23 15:06:05 +00:00
James Molloy 5b18b4ce96 Revert "[AArch64]Merge halfword loads into a 32-bit load"
This reverts commit r250719. This introduced a codegen fault in SPEC2000.gcc, when compiled for Cortex-A53.

llvm-svn: 251108
2015-10-23 10:41:38 +00:00
Zlatko Buljan 2cf61020b8 [mips][microMIPS] Implement SHLL.PH, SHLL_S.PH, SHLL.QB, SHLLV.PH, SHLLV_S.PH, SHLLV.QB, SHLLV_S.W, SHLL_S.W, SHRA.QB and SHRA_R.QB instructions
Differential Revision: http://reviews.llvm.org/D13929

llvm-svn: 251098
2015-10-23 06:39:29 +00:00
Matthias Braun d276de6db1 AArch64: Disable the latency heuristic
It turned out not to improve any of our benchmarks but occasionally led
to increased register pressure and spilling.

Only enabling this for the Cyclone CPU, as the results on the Cortex CPUs
are mixed.

Differential Revision: http://reviews.llvm.org/D13708

llvm-svn: 251038
2015-10-22 18:07:38 +00:00
Eric Christopher 227d71bba6 Remove the last traces of X86CompilationCallback as it is completely
unused.

llvm-svn: 251035
2015-10-22 17:55:35 +00:00
Craig Topper 8fe40e0ed5 Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. This removes the need to pass a hardcoded size in many places. NFC
llvm-svn: 251032
2015-10-22 17:05:00 +00:00
Bill Schmidt de1dc9c98f [PPC] Fix PR24686 by failing assembly for an invalid relocation
PR24686 identifies a problem where a relocation expression is invalid
when not all of the symbols in the expression can be locally
resolved.  This causes the compiler to request a PC-relative half16ds
relocation, which is nonsensical for PowerPC.  This patch recognizes
this situation and ensures we fail the assembly cleanly.

Test case provided by Anton Blanchard.

llvm-svn: 251027
2015-10-22 15:53:44 +00:00
Asaf Badouh 7c52245660 [X86][AVX512] extend vcvtph2ps to support xmm/ymm and sae versions
Differential Revision: http://reviews.llvm.org/D13945

llvm-svn: 251018
2015-10-22 14:01:16 +00:00
Elena Demikhovsky 5c97dfdc9c AVX-512: Fixed a bug in select_cc for i1 type
Fixed failure:
LLVM ERROR: Cannot select: t33: i1 = select_cc t25, Constant:i32<0>, t45, t42, seteq:ch

added a test

Differential Revision: http://reviews.llvm.org/D13943

llvm-svn: 250996
2015-10-22 07:10:29 +00:00
Elena Demikhovsky 7ad0d563a5 Partially reverted changes from r250686
Clang runtime failure was reported.
   Assertion failed: (isExtended() && "Type is not extended!"), function getTypeForEVT
I'll need to add proper handling for PointerType in masked load/store intrinsics.

llvm-svn: 250995
2015-10-22 06:20:29 +00:00
JF Bastien f2364bf129 WebAssembly: fix more syntax
br_if shouldn't start with a dot.
div and rem went from prefix u/s to suffix.

llvm-svn: 250972
2015-10-22 02:32:50 +00:00
Pete Cooper b70b956c80 Add missing load/store flags to thumb2 instructions.
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs.  Running 'make check' with the verifier
triggered the same error on the test here so I've updated the test
to run the verifier on one of its runs instead of adding a new one.

While looking at this code, there was a stale comment that these
instructions were only used for disassembly.  This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.

This reapplies r242300 which was reverted in r242428 due to bot failures.

Ultimately those failures were spurious and completely unrelated to this commit.  I reverted this
at the time because it was thought to be at fault.

llvm-svn: 250969
2015-10-22 01:48:57 +00:00
Matt Arsenault 391be09ef3 AMDGPU: Fix adding redundant m0 uses
BuildMI already adds these since they are defined correctly now.

llvm-svn: 250961
2015-10-21 22:37:51 +00:00
Matt Arsenault e8c0891e42 AMDGPU: Fix verifier error in SIFoldOperands
There may be other use operands that also need their kill flags cleared.

This happens in a few tests when SIFoldOperands is moved after
PeepholeOptimizer.

PeepholeOptimizer rewrites cases that look like:
%vreg0 = ...
%vreg1 = COPY %vreg0
use %vreg1<kill>
%vreg2 = COPY %vreg0
use %vreg2<kill>

to use the earlier source to
%vreg0 = ...
use %vreg0
use %vreg0

Currently SIFoldOperands sees the copied registers, so there is
only one use. So far I haven't managed to come up with a test
that currently has multiple uses of a foldable VGPR -> VGPR copy.

llvm-svn: 250960
2015-10-21 22:37:50 +00:00
Matt Arsenault b6fd98c7d9 AMDGPU: Split DiagnosticInfoUnsupported into its own file
llvm-svn: 250959
2015-10-21 22:37:46 +00:00
Matt Arsenault 6005fcbe12 AMDGPU: Simplify VOP3 operand legalization.
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.

We should not be selecting instructions with invalid operands in the
first place. Most of the time, for registers, copies are inserted
to the correct operand register class.

For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).

The only possibly tricky case to maybe worry about is when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand.  However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.

llvm-svn: 250951
2015-10-21 21:51:02 +00:00
Matt Arsenault e223cebd10 AMDGPU: Fix not checking implicit operands in verifyInstruction
When verifying constant bus restrictions, this wasn't catching
uses in implicit operands.

llvm-svn: 250948
2015-10-21 21:15:01 +00:00
Joerg Sonnenberger 7212809abc Drop assert that a call with struct return goes to a function with sret
attribute. Clang incorrectly misses it on __muldc3 and friends and the
type system doesn't include it properly either.

llvm-svn: 250938
2015-10-21 20:05:01 +00:00
Sanjay Patel efab8b0d08 [x86] move recursive add match for LEA to helper function; NFCI
llvm-svn: 250926
2015-10-21 18:56:06 +00:00
Craig Topper 896c267544 [X86] Add AMD mwaitx, monitorx, and clzero instructions to the assembly parser and disassembler.
llvm-svn: 250911
2015-10-21 17:26:45 +00:00
Daniel Sanders d6cf3e05ef [mips][mips16] Re-work the inline assembly stubs to work with IAS. NFC.
Summary:
Previously, we were inserting an InlineAsm statement for each line of the
inline assembly. This works for GAS but it triggers prologue/epilogue
emission when IAS is in use. This caused:
    .set noreorder
    .cpload $25
to be emitted as:
    .set push
    .set reorder
    .set noreorder
    .set pop
    .set push
    .set reorder
    .cpload $25
    .set pop
which led to assembler errors and caused the test to fail.

The whitespace-after-comma changes included in this patch are necessary to
match the output when IAS is in use.

Reviewers: vkalintiris

Subscribers: rkotler, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13653

llvm-svn: 250895
2015-10-21 12:44:14 +00:00
Daniel Sanders 0f596814e9 [mips][msa] Remove copy_u.d and move copy_u.w to MSA64.
Summary:
The forwards compatibility strategy employed by MIPS is to consider registers
to be infinitely sign-extended. Then on ISAs with a wider register, the results
of existing instructions are sign-extended to register width and zero-extended
counterparts are added. copy_u.w on MSA32 and copy_u.w on MSA64 violate this
strategy and we have therefore corrected the MSA specs to fix this.

We still keep track of sign/zero-extension during legalization but we now
match copy_s.[wd] where required.

No change required to clang since __builtin_msa_copy_u_[wd] will map to
copy_s.[wd] where appropriate for the target.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13472

llvm-svn: 250887
2015-10-21 09:58:54 +00:00
Mehdi Amini 4215236621 Do not use `dyn_cast<X>` after `isa<X>` (NFC)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 250883
2015-10-21 06:11:01 +00:00
JF Bastien 1a59c6b2c9 WebAssembly: support imports
C/C++ code can declare an extern function, which will show up as an import in WebAssembly's output. It's expected that the linker will resolve these, and mark unresolved imports as call_import (I have a patch which does this in wasmate).

llvm-svn: 250875
2015-10-21 02:23:09 +00:00
Krzysztof Parzyszek ced9941cd4 [Hexagon] Bit-based instruction simplification
Analyze bit patterns of operands and values of instructions to perform
various simplifications, dead/redundant code elimination, etc.

llvm-svn: 250868
2015-10-20 22:57:13 +00:00
Krzysztof Parzyszek 26b2c9080f [Hexagon] Fix isNVStorable flag in .td files
An upper half and a double word cannot be used as value sources in a
new-value store.

llvm-svn: 250867
2015-10-20 22:40:57 +00:00
Krzysztof Parzyszek 79512b88b0 [Hexagon] Capture aggregate variables by reference, not value
llvm-svn: 250851
2015-10-20 19:33:46 +00:00
Krzysztof Parzyszek e4cff4058c [Hexagon] Do not fall-through if there is no CFG edge
llvm-svn: 250850
2015-10-20 19:30:21 +00:00
Krzysztof Parzyszek bfe8e92fd1 [Hexagon] Use symbolic name for subregister instead of hardcoded number
llvm-svn: 250849
2015-10-20 19:26:36 +00:00
Krzysztof Parzyszek 0257905f27 [Hexagon] Change Based->Base in getBasedWithImmOffset
llvm-svn: 250848
2015-10-20 19:21:05 +00:00
Krzysztof Parzyszek 05da79d5ac [Hexagon] Remove the remnants of isConstExtProfitable
llvm-svn: 250845
2015-10-20 19:04:53 +00:00
Jonas Paulsson 4b29f6f7f7 [SystemZ] Use LivePhysRegs helper class in SystemZShortenInst.cpp.
Don't use home brewed liveness tracking code for phys regs, since
this class does the job.

Reviewed by Ulrich Weigand.

llvm-svn: 250829
2015-10-20 15:05:58 +00:00
Artyom Skrobov 7fd67e25aa Adding support for TargetLoweringBase::LibCall
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:

- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
  computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall

This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.

The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.

The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.
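
The choice described above can be sketched roughly as follows. This is a simplified model and not the actual legalizer code: the `Action` enum, the function names, and the string results are hypothetical, and the sketch assumes the new LibCall action only short-circuits when no remainder is needed.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical stand-in for the legalize actions named in the message."""
    LEGAL = "Legal"
    CUSTOM = "Custom"
    EXPAND = "Expand"
    LIBCALL = "LibCall"  # the action this commit adds


def old_lower_div(divrem_action, has_divrem_libcall, remainder_used_elsewhere):
    """The four-step choice listed above for a UDIV/SDIV marked Expand."""
    if divrem_action in (Action.LEGAL, Action.CUSTOM):
        return "DIVREM"
    if has_divrem_libcall and remainder_used_elsewhere:
        return "DIVREM libcall"
    return "DIV libcall"


def new_lower_div(div_action, divrem_action, has_divrem_libcall,
                  remainder_used_elsewhere):
    """Assumed model: a backend that marks DIV as LibCall gets the plain
    DIV libcall whenever the remainder is not needed."""
    if div_action is Action.LIBCALL and not remainder_used_elsewhere:
        return "DIV libcall"
    return old_lower_div(divrem_action, has_divrem_libcall,
                         remainder_used_elsewhere)


# A target with a custom DIVREM lowering and an unused remainder:
print(old_lower_div(Action.CUSTOM, True, False))
print(new_lower_div(Action.LIBCALL, Action.CUSTOM, True, False))
```

Under these assumptions the old path always picks DIVREM for such a target, while the new path lets the backend ask for the lighter DIV libcall directly.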

Reviewers: t.p.northover, rengolin

Subscribers: t.p.northover, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13862

llvm-svn: 250826
2015-10-20 13:14:52 +00:00
Igor Breger 21296d230a AVX512: Implemented encoding and intrinsics for VPBROADCASTB/W/D/Q instructions.
Differential Revision: http://reviews.llvm.org/D13884

llvm-svn: 250819
2015-10-20 11:56:42 +00:00
Matt Arsenault 3add6439d0 AMDGPU: Add MachineInstr overloads for instruction format tests
llvm-svn: 250797
2015-10-20 04:35:43 +00:00
Matt Arsenault 8f18917a90 AMDGPU: Stop reserving v[254:255]
This wasn't doing anything useful. They weren't explicitly used
anywhere, and the RegScavenger ignores reserved registers.

This for some reason caused a random scheduling change in the test.
Getting the check lines to pass is too frustrating, and there's probably
not too much value in checking the vector case's operands N times.

llvm-svn: 250794
2015-10-20 03:59:58 +00:00
JF Bastien c8f89e86d5 WebAssembly: fix call/return syntax.
They are now typeless, unlike other operations.

llvm-svn: 250793
2015-10-20 01:26:54 +00:00
Duncan P. N. Exon Smith c4829deae8 MSP430: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250792
2015-10-20 01:18:39 +00:00
Duncan P. N. Exon Smith a2c90e4743 SystemZ: Remove implicit ilist iterator conversion, NFC
llvm-svn: 250790
2015-10-20 01:12:46 +00:00
Duncan P. N. Exon Smith 0ce253d3a9 XCore: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250788
2015-10-20 01:07:42 +00:00
Duncan P. N. Exon Smith ac65b4c422 PowerPC: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250787
2015-10-20 01:07:37 +00:00
Duncan P. N. Exon Smith c3f7988472 Sparc: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250781
2015-10-20 00:59:43 +00:00
Duncan P. N. Exon Smith 61149b86c3 NVPTX: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250779
2015-10-20 00:54:09 +00:00
Duncan P. N. Exon Smith a72c6e25ec Hexagon: Remove implicit ilist iterator conversions, NFC
There are two things out of the ordinary in this commit.  First, I made
a loop obviously "infinite" in HexagonInstrInfo.cpp.  After checking if
an instruction was at the beginning of a basic block (in which case,
`break`), the loop decremented and checked the iterator for `nullptr` as
the loop condition.  This has never been possible (the prev pointers have
always been circular, so even with the weird ilist/iplist
implementation, this hasn't been possible), so I removed the condition.

Second, in HexagonAsmPrinter.cpp there was another case of comparing a
`MachineBasicBlock::instr_iterator` against `MachineBasicBlock::end()`
(which returns `MachineBasicBlock::iterator`).  While not incorrect,
it's fragile.  I switched this to `::instr_end()`.

All that said, no functionality change intended here.

llvm-svn: 250778
2015-10-20 00:46:39 +00:00
JF Bastien 3b0177c542 WebAssembly: fix syntax for br_if.
llvm-svn: 250777
2015-10-20 00:37:42 +00:00
Duncan P. N. Exon Smith 7869148c47 Mips: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250769
2015-10-20 00:15:20 +00:00
Duncan P. N. Exon Smith 19d951874a CppBackend: Remove implicit ilist iterator conversions, NFC
Mostly just converted to range-based for loops.  May have converted a
couple of extra loops as a drive-by (not sure).

llvm-svn: 250766
2015-10-20 00:06:41 +00:00
Duncan P. N. Exon Smith d95fa4ccf1 BPF: Remove implicit ilist iterator conversion, NFC
llvm-svn: 250765
2015-10-20 00:02:50 +00:00
Duncan P. N. Exon Smith 9f9559e807 ARM: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250759
2015-10-19 23:25:57 +00:00
Duncan P. N. Exon Smith d77de6495e X86: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250741
2015-10-19 21:48:29 +00:00
Krzysztof Parzyszek 055c5fd74e [Hexagon] Remove unnecessary argument sign extends
llvm-svn: 250724
2015-10-19 19:10:48 +00:00
Benjamin Kramer 755e502952 Add missing override noticed by Clang's -Winconsistent-missing-override.
llvm-svn: 250720
2015-10-19 18:41:23 +00:00
Jun Bum Lim d3548303ec [AArch64]Merge halfword loads into a 32-bit load
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #0xffff
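
The equivalence of the two sequences above can be checked with a small model. This is only an illustrative sketch, assuming a little-endian target; the function names and the byte buffer are hypothetical and nothing here is taken from the patch itself.

```python
import struct


def two_halfword_loads(mem: bytes, addr: int):
    """Model `ldrh w0, [x2]` followed by `ldrh w1, [x2, #2]`."""
    w0 = struct.unpack_from("<H", mem, addr)[0]
    w1 = struct.unpack_from("<H", mem, addr + 2)[0]
    return w0, w1


def merged_word_load(mem: bytes, addr: int):
    """Model `ldr w0, [x2]`, then `ubfx` and `and` to split the word."""
    word = struct.unpack_from("<I", mem, addr)[0]
    w1 = (word >> 16) & 0xFFFF  # ubfx w1, w0, #16, #16
    w0 = word & 0xFFFF          # and w0, w0, #0xffff
    return w0, w1


memory = bytes([0x34, 0x12, 0xCD, 0xAB])
assert two_halfword_loads(memory, 0) == merged_word_load(memory, 0)
```

The model only covers the values produced; it says nothing about alignment, volatility, or the scheduling effects that matter for the real transformation.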

llvm-svn: 250719
2015-10-19 18:34:53 +00:00
Krzysztof Parzyszek 23920ec95d [Hexagon] Fix debug information for local objects
- Isolate the check for the existence of a stack frame into hasFP.
- Implement getFrameIndexReference for DWARF address computation.
- Use getFrameIndexReference for offset computation in eliminateFrameIndex.
- Preserve debug information for dynamically allocated stack objects.
- Prefer FP to access local objects at -O0.
- Add experimental code to skip allocframe when not strictly necessary
  (disabled by default).

llvm-svn: 250718
2015-10-19 18:30:27 +00:00
Krzysztof Parzyszek db8677067c [Hexagon] Delay emission of CFI instructions
Emit the CFI instructions after all code transformations have been done.
This will avoid any interference between CFI instructions and packetization.

llvm-svn: 250714
2015-10-19 17:46:01 +00:00
Benjamin Kramer 335332329b Remove CRLF newlines. NFC.
llvm-svn: 250698
2015-10-19 13:05:25 +00:00
Asiri Rathnayake 1040a53be3 Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions
The mapping of these two intrinsics in ARMInstrInfo.td had a small
omission which led to their operands not being validated/transformed
before being lowered into usat and ssat instructions. This can cause
incorrect instructions to be emitted.

I've also added tests for the remaining two saturating arithmetic
intrinsics @llvm.arm.qadd and @llvm.arm.qsub as they are missing
codegen tests.

llvm-svn: 250697
2015-10-19 11:44:24 +00:00
Elena Demikhovsky 20662e39f1 Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.

Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.

Differential Revision: http://reviews.llvm.org/D13850

llvm-svn: 250686
2015-10-19 07:43:38 +00:00
Zlatko Buljan 5292083584 [mips][microMIPS] Implement ADDQ.PH, ADDQ_S.W, ADDQH.PH, ADDQH.W, ADDSC, ADDU.PH, ADDU_S.QB, ADDWC and ADDUH.QB instructions
Differential Revision: http://reviews.llvm.org/D13130

llvm-svn: 250685
2015-10-19 07:16:26 +00:00
Zlatko Buljan d0a7d6e4ee [mips][microMIPS] Implement ABSQ.QB, ABSQ_S.PH, ABSQ_S.W, ABSQ_S.QB, INSV, MADD, MADDU, MSUB, MSUBU, MULT and MULTU instructions
Differential Revision: http://reviews.llvm.org/D13721

llvm-svn: 250683
2015-10-19 06:34:44 +00:00
Asaf Badouh 696e8e0bb7 [X86][AVX512DQ] add scalar fpclass
Differential Revision: http://reviews.llvm.org/D13769

llvm-svn: 250650
2015-10-18 11:04:38 +00:00
Igor Breger cbb9550537 AVX512: Lowering i8/i16 vector CTLZ using the dword LZCNT vector instruction
Differential Revision: http://reviews.llvm.org/D13632

llvm-svn: 250649
2015-10-18 09:56:39 +00:00
Craig Topper 92cfdd70f8 [Sparc] Use MCPhysReg instead of unsigned to size static arrays of registers. Should reduce the table size.
llvm-svn: 250644
2015-10-18 05:29:05 +00:00
Craig Topper 2626094fa1 Make a bunch of static arrays const.
llvm-svn: 250642
2015-10-18 05:15:34 +00:00
Craig Topper ec15ea12e7 Use std::find instead of manual loop.
llvm-svn: 250624
2015-10-17 21:32:28 +00:00
Craig Topper a833451173 Use std::is_sorted to replace a custom version. Also replace a comparison predicate struct with a lambda.
llvm-svn: 250623
2015-10-17 21:32:26 +00:00
Simon Pilgrim 86c5e85e84 [X86][XOP] Add VPROT instruction opcodes
Added X86ISD opcodes for VPROT vector rotate by variable and by immediate.

llvm-svn: 250620
2015-10-17 19:04:24 +00:00
Craig Topper a2d0635098 Remove unnecessary 'const' pointed out by David Blaikie.
llvm-svn: 250619
2015-10-17 18:22:46 +00:00
Craig Topper 9ff9bf4959 Replace a custom table sort check with std::is_sorted. Change a function to take ArrayRef instead of pointer and length. NFC
llvm-svn: 250615
2015-10-17 16:37:13 +00:00
Craig Topper c177d9edb3 Use std::begin/end and std::is_sorted to simplify some code. NFC
llvm-svn: 250614
2015-10-17 16:37:11 +00:00
Simon Pilgrim a18ae9bd70 [CostModel] Fixed AVX integer shift costs
Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts.

llvm-svn: 250611
2015-10-17 13:23:38 +00:00
Simon Pilgrim 5b65f28fe7 [X86][FastISel] Teach how to select SSE4A nontemporal stores.
Add FastISel support for SSE4A scalar float / double non-temporal stores

Follow up to D13698

Differential Revision: http://reviews.llvm.org/D13773

llvm-svn: 250610
2015-10-17 13:04:42 +00:00
Colin LeMahieu 7c9587136d [Hexagon] Adding skeleton of HVX extension instructions.
llvm-svn: 250600
2015-10-17 01:33:04 +00:00
JF Bastien 3428ed4f53 WebAssembly: don't omit dead vregs from locals
Summary:
This is a temporary hack until we get around to remapping the vreg
numbers to local numbers. Dead vregs cause bad numbering and make
consumers sad.

We could also just look at debug info and use named locals instead, but
vregs have to work properly anyways so there!

Reviewers: binji, sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D13839

llvm-svn: 250594
2015-10-17 00:25:38 +00:00
JF Bastien 4f43e80ece WebAssembly: fix the syntax for comparisons
Summary: It has also slightly changed.

Reviewers: binji

Subscribers: jfb, dschuff, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D13837

llvm-svn: 250591
2015-10-17 00:12:29 +00:00
Reid Kleckner 28e490342b [WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.

The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.

llvm-svn: 250583
2015-10-16 23:43:27 +00:00
Sanjay Patel bbd524496c [x86] promote 'add nsw' to a wider type to allow more combines
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)

The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, 
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: 
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm 
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.

A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.
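
Why the nsw flag matters for sext(add nsw(x, C)) can be illustrated with a small arithmetic model. This is a hedged sketch with hypothetical helper names, modelling 32-bit and 64-bit integers with Python ints; it is not code from the patch.

```python
def sext32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def narrow_then_extend(x: int, c: int) -> int:
    """sext(add(x, C)): add in 32 bits with wrap-around, then sign-extend."""
    return sext32(x + c)


def extend_then_add(x: int, c: int) -> int:
    """add(sext(x), sext(C)): sign-extend first, then add in the wide type."""
    return sext32(x) + sext32(c)


# When the 32-bit add cannot overflow (what nsw promises), both forms agree.
assert narrow_then_extend(5, 2) == extend_then_add(5, 2)

# Without that guarantee they differ, so the promotion would be wrong.
int32_max = (1 << 31) - 1
assert narrow_then_extend(int32_max, 1) != extend_then_add(int32_max, 1)
```

Once the add is done in the wide type, the constant can be folded into the addressing-mode displacement, which is what the shorter sequence above relies on.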

Differential Revision: http://reviews.llvm.org/D13757

llvm-svn: 250560
2015-10-16 22:14:12 +00:00
Andrew Kaylor 09b39acc03 Fix assertion failure with fp128 to unsigned i64 conversion
Patch by Mitch Bodart

Differential Revision: http://reviews.llvm.org/D13780

llvm-svn: 250550
2015-10-16 20:39:20 +00:00
Krzysztof Parzyszek a7c5f0409c [Hexagon] Split double registers
llvm-svn: 250549
2015-10-16 20:38:54 +00:00
Krzysztof Parzyszek aec39c68ae [Hexagon] Delete lib/Target/Hexagon/HexagonRemoveSZExtArgs.cpp
llvm-svn: 250543
2015-10-16 19:51:53 +00:00
Krzysztof Parzyszek 5b7dd0cdf9 [Hexagon] Merge adjacent stores
llvm-svn: 250542
2015-10-16 19:43:56 +00:00
JF Bastien 6126d2b883 WebAssembly: fix load/store syntax
Summary: The syntax has changed a bit recently.

Reviewers: binji

Subscribers: llvm-commits, jfb, sunfish, dschuff

Differential Revision: http://reviews.llvm.org/D13821

llvm-svn: 250535
2015-10-16 18:24:42 +00:00
JF Bastien 53bd975033 WebAssembly: relooper analysis pass
Summary: Make the relooper an analysis pass, to convert CFG to AST.

Reviewers: sunfish

Subscribers: jfb, dschuff

Differential Revision: http://reviews.llvm.org/D12744

llvm-svn: 250524
2015-10-16 16:35:49 +00:00
Charlie Turner 434d4599d4 [AArch64] Implement vector splitting on UADDV.
Summary: Fixes PR25056.

Reviewers: mcrosier, junbuml, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13466

llvm-svn: 250520
2015-10-16 15:38:25 +00:00
Hrvoje Varga 3c88fbd367 [mips][microMIPS] Implement LB, LBE, LBU and LBUE instructions
Differential Revision: http://reviews.llvm.org/D11633

llvm-svn: 250511
2015-10-16 12:24:58 +00:00
Craig Topper 09b6598572 [X86] Add fxsr feature flag for fxsave/fxrestore instructions.
llvm-svn: 250497
2015-10-16 06:03:09 +00:00
JF Bastien 1d20a5e9e8 WebAssembly: update syntax
Summary:
Follow the same syntax as for the spec repo. Both have evolved slightly
independently and need to converge again.

This, along with wasmate changes, allows me to do the following:

  echo "int add(int a, int b) { return a + b; }" > add.c
  ./out/bin/clang -O2 -S --target=wasm32-unknown-unknown add.c -o add.wack
  ./experimental/prototype-wasmate/wasmate.py add.wack > add.wast
  ./sexpr-wasm-prototype/out/sexpr-wasm add.wast -o add.wasm
  ./sexpr-wasm-prototype/third_party/v8-native-prototype/v8/v8/out/Release/d8 -e "print(WASM.instantiateModule(readbuffer('add.wasm'), {print:print}).add(42, 1337));"

As you'd expect, the d8 shell prints out the right value.

Reviewers: sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D13712

llvm-svn: 250480
2015-10-16 00:53:49 +00:00
Evgeniy Stepanov 9addbc9fc1 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov 142947e9f0 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
__builtin_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
JF Bastien 2cdd5e4710 x86: preserve flags when folding atomic operations
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.

This patch adds the missing EFLAGS definition.

llvm-svn: 250438
2015-10-15 18:24:52 +00:00
JF Bastien 5b327712b0 x86 FP atomic codegen: don't drop globals, stack
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.

Don't let it forget about the displacement.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25171

A very similar bug when doing floating-point atomics to the stack is
also fixed by this patch.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25144

Reviewers: pete

Subscribers: llvm-commits, majnemer, rsmith

Differential Revision: http://reviews.llvm.org/D13749

llvm-svn: 250429
2015-10-15 16:46:29 +00:00
Daniel Sanders 6394ee598e [mips][ias] Implement ulh macro.
Summary:
This macro is needed to prevent test/CodeGen/Mips/2008-08-01-AsmInline.ll from
failing after the integrated assembler is enabled by default.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13654

llvm-svn: 250414
2015-10-15 14:52:58 +00:00
Benjamin Kramer c5275bdec1 [NVPTX] Remove dead code.
I left helpers that look useful for debugging alone. NFC.

llvm-svn: 250410
2015-10-15 14:45:41 +00:00
Daniel Sanders 8008de5551 [mips][mips16] MIPS16 is not a CPU/Architecture but is an ASE.
Summary:
The -mcpu=mips16 option caused the Integrated Assembler to crash because
it couldn't figure out the architecture revision number to write to the
.MIPS.abiflags section. This CPU definition has been removed because, like
microMIPS, MIPS16 is an ASE to a base architecture.

Reviewers: vkalintiris

Subscribers: rkotler, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13656

llvm-svn: 250407
2015-10-15 14:34:23 +00:00
Benjamin Kramer 5dfcda73d5 [X86] Rip out orphaned method declarations and other dead code. NFC.
llvm-svn: 250406
2015-10-15 14:09:59 +00:00
Igor Breger d7bae451de AVX512: Implemented DAG lowering for shuff64x2/shufi64x2 instructions (shuffle packed values at 128-bit granularity)
Differential Revision: http://reviews.llvm.org/D13648

llvm-svn: 250400
2015-10-15 13:29:07 +00:00
Igor Breger b4bb190eed AVX512: Implemented encoding and intrinsics for vpternlogd/q.
Differential Revision: http://reviews.llvm.org/D13768

llvm-svn: 250396
2015-10-15 12:33:24 +00:00
Elena Demikhovsky ecff21b297 AVX-512: Fixed a bug in shuffle lowering 32-bit mode
AVX-512 bit shuffle fails in 32-bit mode since we create a vector of 64-bit constants.
I split the 8x64-bit constant vector into 16x32 in 32-bit mode.
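
The split can be pictured as follows. This is a minimal sketch with a hypothetical helper name, assuming the little-endian element order used on x86 (low half first); it is not the lowering code from the patch.

```python
def split_i64_constants(values):
    """Turn N 64-bit constants into 2*N 32-bit constants, low half first."""
    halves = []
    for value in values:
        value &= 0xFFFFFFFFFFFFFFFF
        halves.append(value & 0xFFFFFFFF)  # low 32 bits
        halves.append(value >> 32)         # high 32 bits
    return halves


mask_8x64 = [0, 1, 2, 3, 4, 5, 6, 7]
mask_16x32 = split_i64_constants(mask_8x64)
assert len(mask_16x32) == 16
```

Both vectors describe the same 512 bits, so the shuffle control is unchanged while every element now fits in a type that is legal in 32-bit mode.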

Differential Revision: http://reviews.llvm.org/D13644

llvm-svn: 250390
2015-10-15 11:35:33 +00:00
Artyom Skrobov 63471330d2 Don't pretend AMDGPU backend knows how to custom-lower UDIVREM for vector types; it can't
Reviewers: arsenm, jvesely, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13734

llvm-svn: 250384
2015-10-15 09:18:47 +00:00
Zlatko Buljan 54b1eb4c73 [mips][microMIPS] Implement DPA.W.PH, DPAQ_S.W.PH, DPAQ_SA.L.W, DPAQX_S.W.PH, DPAQX_SA.W.PH, DPAU.H.QBL, DPAU.H.QBR and DPAX.W.PH instructions
Differential Revision: http://reviews.llvm.org/D13376

llvm-svn: 250382
2015-10-15 08:59:45 +00:00
Hrvoje Varga 3a3c4b8a39 [mips][microMIPS] Implement BREAK16, LI16, MOVE16, SDBBP16, SUBU16 and XOR16 instructions
Differential Revision: http://reviews.llvm.org/D11292#inline-103143

llvm-svn: 250381
2015-10-15 08:39:07 +00:00
Hrvoje Varga 3ef4dd7bc8 [mips][microMIPS] Implement LLE and SCE instructions
Differential Revision: http://reviews.llvm.org/D11630

llvm-svn: 250379
2015-10-15 08:11:50 +00:00
Hrvoje Varga a766eff5a0 [mips][microMIPS] Implement LWLE, LWRE, SWLE and SWRE instructions
Differential Revision: http://reviews.llvm.org/D11631

llvm-svn: 250377
2015-10-15 07:23:06 +00:00
Hrvoje Varga 8c9526400e Test commit.
llvm-svn: 250367
2015-10-15 05:20:51 +00:00
Craig Topper fd2cc7cd8a Add XSAVE/XSAVEOPT to KNL processor.
llvm-svn: 250362
2015-10-15 03:56:54 +00:00
Quentin Colombet 5084e44d71 [ARM] Make sure we do not dereference the end iterator when accessing debug
information.
Although the problem was always here, it would only be exposed when
shrink-wrapping is enabled.

rdar://problem/23110493

llvm-svn: 250352
2015-10-15 00:41:26 +00:00
Bill Schmidt 048cc97fb1 [PowerPC] Fix invalid lxvdsx optimization (PR25157)
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction.  That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not.  This corrects that problem.

Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.

llvm-svn: 250324
2015-10-14 20:45:00 +00:00
Andrea Di Biagio c47edbef4c [x86][FastISel] Teach how to select nontemporal stores.
This patch teaches x86 fast-isel how to select nontemporal stores.

On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.

Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.

With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.

Differential Revision: http://reviews.llvm.org/D13698

llvm-svn: 250285
2015-10-14 10:03:13 +00:00
Craig Topper 0ee356951a [X86] Add XSAVE feature flags to their various processors.
llvm-svn: 250268
2015-10-14 05:37:38 +00:00
Dan Gohman ac93f649fa [WebAssembly] Remove a TODO comment which is no longer needed. NFC.
llvm-svn: 250233
2015-10-13 22:06:40 +00:00
Duncan P. N. Exon Smith a73371a9b7 AMDGPU: Remove implicit ilist iterator conversions, NFC
One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new
one.  Previously, bundle iterators and single-instruction iterators
could be compared to each other (comparing on underlying pointers).
I changed a comparison from using `MBB->end()` to using
`MBB->instr_end()`, since both end iterators should point at the same
place anyway.

I don't think the implicit conversion between the two iterator types is
a good idea since it's fairly easy to accidentally compare to the wrong
thing (they aren't always end iterators).  Otherwise I would have just
added the conversion.

Even with that, there should be no functionality change here.

llvm-svn: 250218
2015-10-13 20:07:10 +00:00
Duncan P. N. Exon Smith d3b9df02b3 AArch64: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250216
2015-10-13 20:02:15 +00:00
Akira Hatanaka 5a4e4f8d8a [AArch64] Check the size of the vector before accessing its elements.
This fixes an assert in AArch64AsmParser::MatchAndEmitInstruction.

rdar://problem/23081753

llvm-svn: 250207
2015-10-13 18:55:34 +00:00
Sanjay Patel 85030aa1bd function names should start with a lower case letter; NFC
llvm-svn: 250174
2015-10-13 16:23:00 +00:00
Sanjay Patel b5723d0dbd don't repeat function/class/variable names in comments; NFC
llvm-svn: 250162
2015-10-13 15:12:27 +00:00
Christof Douma f0765c4f5b Test commit
llvm-svn: 250154
2015-10-13 09:38:21 +00:00
Michael Kuperstein af22dafc8b Fix line-ending issue. NFC.
llvm-svn: 250151
2015-10-13 06:22:30 +00:00
Craig Topper 24b56a62bb [X86] Mark the AAD and AAM aliases as not valid in 64-bit mode.
llvm-svn: 250148
2015-10-13 05:12:07 +00:00
Craig Topper 4f76372afc [X86] Change all the i8imm operands in XOP instructions to u8imm so the parser will check the size.
llvm-svn: 250147
2015-10-13 05:06:25 +00:00
JF Bastien 986ed68eed x86: preserve flags when folding atomic operations
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.

This patch adds the missing EFLAGS definition.

Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.

Reviewers: rsmith, rtrieu

Subscribers: llvm-commits, reames, morisset

Differential Revision: http://reviews.llvm.org/D13680

llvm-svn: 250135
2015-10-13 00:28:47 +00:00
Matt Arsenault f0d9e47da2 AMDGPU: Refactor isVGPRToSGPRCopy
It should now correctly handle physical registers and make
it easier to identify the other direction.

llvm-svn: 250132
2015-10-13 00:07:54 +00:00
Matt Arsenault 61dc235f20 DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.

llvm-svn: 250129
2015-10-12 23:59:50 +00:00
Reid Kleckner 4a5f35c0ae Make Win64 localescape offsets FP relative instead of SP relative
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.

llvm-svn: 250088
2015-10-12 19:43:34 +00:00
Andrea Di Biagio b0fe4eb199 [x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.

This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.

In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.

Example (-mattr=+avx):

  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer

The initial selection dag for the code above is:

v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.

The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch

  where t32 and t25 are now values of type v8i32.

Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25

However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.

This patch fixes the problem by adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.

This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.

Differential Revision: http://reviews.llvm.org/D13660

llvm-svn: 250085
2015-10-12 19:22:30 +00:00
Sanjay Patel 0dc91b3143 combine predicates; NFCI
llvm-svn: 250075
2015-10-12 18:15:08 +00:00
Matt Arsenault 8c0ef8b36d AMDGPU: Register some more passes so -print-before works
llvm-svn: 250071
2015-10-12 17:43:59 +00:00
Sanjay Patel b814ef1ad6 fix typos; NFC
llvm-svn: 250059
2015-10-12 16:09:59 +00:00
Zoran Jovanovic 2e386d3d07 [mips][micromips] Initial support for micromips DSP instructions and addu.qb implementation
Differential Revision: http://reviews.llvm.org/D12798

llvm-svn: 250058
2015-10-12 16:07:25 +00:00
Vasileios Kalintiris 2a95f82859 [mips][FastISel] Clang-format switch statement. NFC.
llvm-svn: 250053
2015-10-12 15:39:41 +00:00
Sanjay Patel 53d1d8b731 fix capitalization; NFC
llvm-svn: 250049
2015-10-12 15:24:01 +00:00
Daniel Sanders b1ef88c172 [mips][ias] Implement macro expansion when bcc has an immediate where a register belongs.
Summary: Fixes PR24915.

Reviewers: vkalintiris

Subscribers: emaste, seanbruno, llvm-commits

Differential Revision: http://reviews.llvm.org/D13533

llvm-svn: 250042
2015-10-12 14:24:05 +00:00
Daniel Sanders 2a5ce1ace0 [mips] Clean up most macro expansions to use the emit*() functions.
Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13591

llvm-svn: 250040
2015-10-12 14:09:12 +00:00
Daniel Sanders 2fb8564d99 [mips] Handle undef when extracting subregs from FP64 registers.
Summary:
This removes unnecessary instructions when extracting from an undefined register
and also fixes a crash for O32 when passing undef to a double argument
held in integer registers.

Reviewers: vkalintiris

Subscribers: llvm-commits, zoran.jovanovic, petarj

Differential Revision: http://reviews.llvm.org/D13467

llvm-svn: 250039
2015-10-12 13:55:44 +00:00
James Molloy fa4e994a7a [ARM] Mark Swift MISched model as incomplete
The Swift Machine Scheduler Model is incomplete. There are instructions
missing which can trigger the "incomplete machine model" abort. This was
observed when a downstream SchedMachineModel was added to the ARM
target.

Patch by Christof Douma!

llvm-svn: 250033
2015-10-12 12:49:59 +00:00
Amjad Aboud 1db6d7af46 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13012

llvm-svn: 250029
2015-10-12 11:47:46 +00:00
Andrea Di Biagio a0922ed8fe [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set.
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.

We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.

With this patch we now check if the only element of a combined shuffle mask is
SM_SentinelZero. In that case, we propagate a zero vector.

Differential Revision: http://reviews.llvm.org/D13364

llvm-svn: 250027
2015-10-12 11:25:41 +00:00
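The mask widening described in r250027 above can be sketched in a few lines. This is a simplified stand-alone model, not the code in X86ISelLowering.cpp: the names `kSentinelZero`, `widenMask`, `Fold` and `classifyCombinedMask` are hypothetical, undef indices are ignored, and only the "pair of adjacent indices" and "pair of zero sentinels" cases from the commit message are handled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the SM_SentinelZero marker ("this lane is zero").
constexpr int kSentinelZero = -2;

// Try to halve a shuffle mask by merging each pair of indices.
// A pair merges if both are zero sentinels, or if it names two adjacent
// elements starting at an even index. Returns false if any pair fails.
bool widenMask(const std::vector<int> &Mask, std::vector<int> &Widened) {
  Widened.clear();
  if (Mask.size() % 2 != 0)
    return false;
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I], Hi = Mask[I + 1];
    if (Lo == kSentinelZero && Hi == kSentinelZero)
      Widened.push_back(kSentinelZero);
    else if (Lo >= 0 && Lo % 2 == 0 && Hi == Lo + 1)
      Widened.push_back(Lo / 2);
    else
      return false;
  }
  return true;
}

enum class Fold { Identity, ZeroVector, None };

// Widen as far as possible, then classify the single remaining index.
// The point of the fix: a one-element mask is NOT automatically an identity.
Fold classifyCombinedMask(std::vector<int> Mask) {
  while (Mask.size() > 1) {
    std::vector<int> Next;
    if (!widenMask(Mask, Next))
      return Fold::None;
    Mask = Next;
  }
  if (Mask.size() != 1)
    return Fold::None;
  if (Mask[0] == kSentinelZero)
    return Fold::ZeroVector;
  return Mask[0] == 0 ? Fold::Identity : Fold::None;
}
```

In this model a 16-entry mask of zero sentinels collapses in four halving steps to a single zero sentinel and is classified as `Fold::ZeroVector`, while the mask 0..15 collapses to index 0 and is classified as `Fold::Identity`.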
Zlatko Buljan d76b666a06 Test commit
llvm-svn: 250026
2015-10-12 11:19:40 +00:00
Craig Topper 8d2e6bc25b [X86] Use u8imm for the immediate type for all shift and rotate instructions. This way the assembler will perform range checking. Believe this matches gas behavior.
llvm-svn: 250016
2015-10-12 06:23:10 +00:00
Craig Topper d6b661dbf0 [X86] Add support to assembler and MCInst lowering to use the other vmovq %xmmX, %xmmX encoding if it would be a shorter VEX encoding.
llvm-svn: 250014
2015-10-12 04:57:59 +00:00
Craig Topper 635e05df0a [X86] Cleanup formatting a bit. NFC
llvm-svn: 250013
2015-10-12 04:27:17 +00:00
Craig Topper 5be914eda1 [X86] Change the immediate for IN/OUT instructions to u8imm so the assembly parser will check the size.
llvm-svn: 250012
2015-10-12 04:17:55 +00:00
Craig Topper 95fffba227 [X86] Add some instruction aliases to get the assembly parser table to favor arithmetic instructions with 8-bit immediates over the forms that implicitly use the ax/eax/rax.
This allows us to remove the explicit code for working around the existing priority

llvm-svn: 250011
2015-10-12 03:39:57 +00:00
Craig Topper fcc34bdee0 [X86] Fix CMP and TEST with al/ax/eax/rax to not mark EFLAGS as a use or al/ax/eax/rax as a def. Probably doesn't have a functional effect since these aren't used in isel.
llvm-svn: 249994
2015-10-11 19:54:02 +00:00
Craig Topper 87990ee4ec [X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.

llvm-svn: 249989
2015-10-11 18:27:24 +00:00
Craig Topper a71630729d [X86] Simplify immediate range checking code.
llvm-svn: 249979
2015-10-11 16:38:14 +00:00
Simon Pilgrim 52d47e5704 [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Craig Topper 55b1f29203 Change isUIntN/isIntN calls with constant N to use the template version. NFC
llvm-svn: 249952
2015-10-10 20:17:07 +00:00
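For readers unfamiliar with the distinction r249952 relies on, the difference between a runtime-width range check and a compile-time-width one can be sketched as follows. The helpers `fitsUnsigned` and `fitsUnsignedN` are hypothetical stand-ins written for this illustration; they are not the LLVM implementations of `isUInt`/`isUIntN`.

```cpp
#include <cstdint>

// Compile-time width: N is a template parameter, so an invalid width is
// rejected at compile time and the bound can be constant-folded.
template <unsigned N> constexpr bool fitsUnsigned(uint64_t X) {
  static_assert(N > 0 && N < 64, "width must be in 1..63");
  return X < (UINT64_C(1) << N);
}

// Runtime width: the same check when N is only known at run time.
// Widths of 64 or more accept every 64-bit value.
inline bool fitsUnsignedN(unsigned N, uint64_t X) {
  return N >= 64 || X < (UINT64_C(1) << N);
}
```

When the width is a literal, as in an 8-bit immediate check, `fitsUnsigned<8>(V)` and `fitsUnsignedN(8, V)` agree for every `V`; the template form simply moves the width into the type system.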
Jonas Paulsson 63a2b6862e [SystemZ] Fixes in the backend I/R.
expandPostRAPseudo():
STX -> 2 * STD: The first STD should not have the kill flag set for the address.

SystemZElimCompare:
BRC -> BRCT conversion: Don't forget to remove the CC<use,kill> operand.

Needed to make SystemZ/asm-17.ll pass with -verify-machineinstrs, which
now runs with this flag.

Reviewed by Ulrich Weigand.

llvm-svn: 249945
2015-10-10 07:14:24 +00:00
Craig Topper 7143d8001a Use range-based for loops. NFC.
llvm-svn: 249941
2015-10-10 05:25:06 +00:00
Craig Topper 7d5b23101c Use emplace_back instead of a constructor call and push_back. NFC
llvm-svn: 249940
2015-10-10 05:25:02 +00:00
David Majnemer bfa5b98201 [WinEH] Remove more dead code
wineh-parent is dead, so is ValueOrMBB.

llvm-svn: 249920
2015-10-10 00:04:29 +00:00
Reid Kleckner 14e773500e [WinEH] Delete the old landingpad implementation of Windows EH
The new implementation works at least as well as the old implementation
did.

Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.

llvm-svn: 249918
2015-10-09 23:34:53 +00:00
David Majnemer 35d27b21a1 [WinEH] Insert the catchpad return before CSR restoration
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.

llvm-svn: 249910
2015-10-09 22:18:45 +00:00
James Y Knight 692e037499 Fix assert when emitting llvm.pow.f86.
This occurred due to introducing the invalid i64 type after type
legalization had already finished, in an attempt to workaround bitcast
f64 -> v2i32 not doing constant folding.

The *right* thing is to actually fix bitcast, but that has other
complications. So, for now, just get rid of the broken workaround, and
check in a test-case showing that it doesn't crash, with TODOs for
emitting proper code.

llvm-svn: 249908
2015-10-09 21:36:19 +00:00
James Y Knight 5b8217bc05 Fix assert in X86 backend.
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.

However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.

Fix, and add trivial test.

llvm-svn: 249891
2015-10-09 20:10:14 +00:00
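The ordering bug fixed in r249891 above is a general pattern: operands were read before the opcode was verified. A minimal sketch of the corrected ordering, using a toy node type rather than SelectionDAG's `SDNode` (the `Node`, `Opcode` and `matchExtractThroughBitcast` names are hypothetical), might look like this:

```cpp
#include <vector>

enum class Opcode { Bitcast, ExtractVectorElt, Constant };

// Toy DAG node: an opcode plus a list of operand nodes.
struct Node {
  Opcode Op;
  std::vector<const Node *> Operands;
};

// If N is bitcast(extract_vector_elt(Vec, Idx)), return Vec; otherwise
// return nullptr.
const Node *matchExtractThroughBitcast(const Node &N) {
  if (N.Op != Opcode::Bitcast || N.Operands.size() != 1)
    return nullptr;
  const Node *Inner = N.Operands[0];
  // Check the opcode first; only then is it safe to index the operands.
  // A node with a different opcode may have zero operands.
  if (Inner->Op != Opcode::ExtractVectorElt || Inner->Operands.size() != 2)
    return nullptr;
  return Inner->Operands[0];
}
```

With this ordering, a bitcast of a zero-operand node is rejected cleanly instead of indexing past the end of its operand list.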
Dan Gohman ee1588ce96 [WebAssembly] Rename floating-point operators to match their spec names.
llvm-svn: 249859
2015-10-09 17:50:00 +00:00
Duncan P. N. Exon Smith 769e1a972d AArch64: Make getNextNode() cleanup in r249764 more clear
After r249764, if you didn't see the full context, it looked like
`std::next(I)` would get the same result as
`++MachineBasicBlock::iterator(I)`.  However, `I` is a `MachineInstr*`
(not a `MachineBasicBlock::iterator`).

Use the `getIterator()` helper I added later (r249782) to make this code
more clear.

llvm-svn: 249852
2015-10-09 16:54:54 +00:00
Jun Bum Lim 0aace13d18 Improve ISel across lane float min/max reduction
In vectorized float min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change will combine:

  svn0 = vector_shuffle t0, undef<2,3,u,u>
  fmin = fminnum t0,svn0
  svn1 = vector_shuffle fmin, undef<1,u,u,u>
  cc = setcc fmin, svn1, ole
  n0 = extract_vector_elt cc, #0
  n1 = extract_vector_elt fmin, #0
  n2 = extract_vector_elt fmin, #1
  result = select n0, n1,n2
into:
  result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

llvm-svn: 249834
2015-10-09 14:11:25 +00:00
Jonas Paulsson ee3685fd45 [SystemZ] Remove unused code in SystemZElimCompare.cpp
The Reference IndirectDef and IndirectUse members were unused and therefore
removed.

llvm-svn: 249824
2015-10-09 11:27:44 +00:00
Nemanja Ivanovic d389657399 Vector element extraction without stack operations on Power 8
This patch corresponds to review:
http://reviews.llvm.org/D12032

This patch builds onto the patch that provided scalar to vector conversions
without stack operations (D11471).
Included in this patch:

    - Vector element extraction for all vector types with constant element number
    - Vector element extraction for v16i8 and v8i16 with variable element number
    - Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
      unnecessarily moving things around between registers

Not included in this patch (will be in upcoming patch):

    - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
      variable element number
    - Vector element insertion for variable/constant element number

Testing is provided for all extractions. The extractions that are not
implemented yet are just placeholders.

llvm-svn: 249822
2015-10-09 11:12:18 +00:00
Jonas Paulsson 5b3bab40b2 [SystemZ] Remove superfluous braces in SystemZShortenInst.cpp
llvm-svn: 249812
2015-10-09 07:19:20 +00:00
Jonas Paulsson 18d877f79b [SystemZ] Minor bugfixes.
LLCH, LLHH and CLIH had the wrong register classes for the def-operand.
Tie operands if changing opcode to an instruction with tied ops.
Comment typo fix.

These fixes were needed in order to make regression test case
SystemZ/asm-18.ll pass with -verify-machineinstrs (not used by
default).

Reviewed by Ulrich Weigand.

llvm-svn: 249811
2015-10-09 07:19:16 +00:00
Jonas Paulsson 0a9049ba82 [SystemZ] Bugfix in SystemZAsmParser.cpp.
Let parseRegister() allow RegFP Group if expecting RegV Group, since the
%f register prefix yields the FP group even while used with vector instructions.

Reviewed by Ulrich Weigand.

llvm-svn: 249810
2015-10-09 07:19:12 +00:00
Saleem Abdulrasool 1825fac3c9 ARM: tweak WoA frame lowering
Accept r11 when targeting Windows on ARM rather than just low registers.
Because we are in a thumb-2 only mode, this may be slightly more expensive in
code size, but results in better code for the environment since it spills the
frame register, which is generally desired for fast stack walking as per the
ABI.

llvm-svn: 249804
2015-10-09 03:19:03 +00:00
Reid Kleckner ae44e871cd Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner b510401785 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00
Duncan P. N. Exon Smith d389165c14 AArch64: Stop using MachineInstr::getNextNode()
Stop using `getNextNode()` to get an insertion point (at least, in this
one place).  Instead, use iterator logic directly.

The `getNextNode()` interface isn't actually supposed to work for
creating iterators; it's supposed to return `nullptr` (not a real
iterator) if this is the last node.  It's currently broken and will
"happen" to work, but if we ever fix the function, we'll get some
strange failures in places like this.

llvm-svn: 249764
2015-10-08 22:43:26 +00:00
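The distinction drawn in r249764 and r249758 above, a "next node" pointer versus an insertion iterator, can be illustrated by analogy with `std::list`. This is only an analogy under the assumption that the intended contract is "null at the end"; `nextNode` and `insertAfter` are hypothetical helpers and do not model LLVM's ilist.

```cpp
#include <iterator>
#include <list>

// Pointer-style "next": returns nullptr when I is the last element, so the
// result cannot serve as an insertion point at the end of the list.
const int *nextNode(const std::list<int> &L,
                    std::list<int>::const_iterator I) {
  auto Next = std::next(I);
  return Next == L.end() ? nullptr : &*Next;
}

// Iterator-style: std::next(I) is a valid insertion point even when I is
// the last element, because it is then the end iterator.
void insertAfter(std::list<int> &L, std::list<int>::iterator I, int V) {
  L.insert(std::next(I), V);
}
```

For the last element of a list, `nextNode` yields a null pointer while `insertAfter` still appends correctly, which is the reason the commits prefer iterator logic for insertion points.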
Duncan P. N. Exon Smith a3da44882f PowerPC: Don't use getNextNode() for insertion point
Stop using `getNextNode()` to create an insertion point for machine
instructions (at least, in this one place).  Instead, use an iterator.
As a drive-by, clean up dump statements to use iterator logic.

The `getNextNode()` interface isn't actually supposed to work for
insertion points; it's supposed to return `nullptr` if this is the last
node.  It's currently broken and will "happen" to work, but if we ever
fix the function, we'll get some strange failures.

llvm-svn: 249758
2015-10-08 22:20:37 +00:00
Evgeniy Stepanov 5fe279e727 Add Triple::isAndroid().
This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().

llvm-svn: 249750
2015-10-08 21:21:24 +00:00
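The refactoring in r249750 above replaces open-coded environment comparisons with a named predicate. A toy sketch of the shape of that change follows; `ToyTriple` and its `Environment` enum are hypothetical and far smaller than the real `llvm::Triple`.

```cpp
// Toy model of a target triple that only tracks the environment component.
enum class Environment { Unknown, GNU, Android, MSVC };

class ToyTriple {
  Environment Env;

public:
  explicit ToyTriple(Environment E) : Env(E) {}

  Environment getEnvironment() const { return Env; }

  // Named predicate: callers write T.isAndroid() instead of repeating
  // T.getEnvironment() == Environment::Android at every use site.
  bool isAndroid() const { return Env == Environment::Android; }
};
```

The predicate and the open-coded comparison are equivalent by construction; the benefit is that the definition of "is Android" lives in one place.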
Eric Christopher 11e5983658 Move the MMX subtarget feature out of the SSE set of features and into
its own variable.

This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.

Rationale:

// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
  return _mm_addsub_pd(A, B);
}

// mmx
void shift(__m64 a, __m64 b, int c) {
  _mm_slli_pi16(a, c);
  _mm_slli_pi32(a, c);
  _mm_slli_si64(a, c);
  _mm_srli_pi16(a, c);
  _mm_srli_pi32(a, c);
  _mm_srli_si64(a, c);
  _mm_srai_pi16(a, c);
  _mm_srai_pi32(a, c);
}

clang -msse3 -mno-mmx file.c -c

For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.

This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.

Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.

This is paired with a patch to clang to take advantage of this behavior.

llvm-svn: 249731
2015-10-08 20:10:06 +00:00
Alexei Starovoitov 87f83e6926 [bpf] Do not expand UNDEF SDNode during insn selection lowering
  o Before this patch, BPF backend will expand UNDEF node
    to i64 constant 0.
  o For second pass of dag combiner, legalizer will run through
    each to-be-processed dag node.
  o If any new SDNode is generated and has an undef operand,
    dag combiner will put undef node, newly-generated constant-0 node,
    and any node which uses these nodes in the working list.
  o During this process, it is possible undef operand is
    generated again, and this will form an infinite loop
    for dag combiner pass2.
  o This patch allows UNDEF to be a legal type.

Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
llvm-svn: 249718
2015-10-08 18:52:40 +00:00
Reid Kleckner b2244cb8f0 [WinEH] Relax assertion in the presence of stack realignment
The code is correct as is, but we should test it.

llvm-svn: 249715
2015-10-08 18:41:52 +00:00
Ulrich Weigand f4d14f781f [SystemZ] Fix another assertion failure in tryBuildVectorShuffle
This fixes yet another scenario where tryBuildVectorShuffle would
attempt to create a BUILD_VECTOR node with an invalid combination
of types.  This can happen if the incoming BUILD_VECTOR has elements
of a type different from the vector element type, which is allowed
in certain cases as long as they are all the same type.

When one of these elements is used in the residual vector, and
UNDEF elements are added to fill up the residual vector, those
UNDEFs then have to use the type of the original element, not
the vector element type, or else the resulting BUILD_VECTOR
will have an invalid type combination.

llvm-svn: 249706
2015-10-08 17:46:59 +00:00
Frederic Riss 263b772bda [X86] Disable X86CallFrameOptimization on Darwin in presence of EH
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.

(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)

llvm-svn: 249694
2015-10-08 15:45:08 +00:00
Igor Breger defab3c1ef AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.
These instructions don't have intrinsics.
Added tests for lowering and encoding.

Differential Revision: http://reviews.llvm.org/D12317

llvm-svn: 249688
2015-10-08 12:55:01 +00:00
Michael Kuperstein 04e79329d0 [X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask()
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.

Differential Revision: http://reviews.llvm.org/D13505

llvm-svn: 249669
2015-10-08 08:13:02 +00:00
Jonas Paulsson 5d3fbd3733 [SystemZ] SystemZElimCompare pass improved.
Compare elimination extended to recognize load-and-test instructions used
for comparison and eliminate them the same way as with compare instructions.

Test case fp-cmp-05.ll updated to expect optimized results now also for z13.

The order of the instruction shortening and compare elimination passes has been
changed so that opcodes do not have to be handled in both passes.

Reviewed by Ulrich Weigand.

llvm-svn: 249666
2015-10-08 07:40:23 +00:00
Jonas Paulsson 29d9d8d955 [SystemZ] Bugfix: check CC reg liveness in SystemZShortenInst.
The following instruction shortening transformations would introduce a
definition of the CC reg, so therefore liveness of CC reg must be checked:

WFADB -> ADBR
WFSDB -> SDBR

Also add the CC reg implicit def operand to the MI in case of change of opcode.

Reviewed by Ulrich Weigand.

llvm-svn: 249665
2015-10-08 07:40:19 +00:00
Jonas Paulsson 7c5ce10a07 [SystemZ] Use load-and-test for fp compare with 0 if vector support is present.
Since the LTxBRCompare instructions can't be used with vector registers, a
normal load-and-test instruction (with a modelled def operand) is used instead.

Reviewed by Ulrich Weigand.

llvm-svn: 249664
2015-10-08 07:40:16 +00:00
Jonas Paulsson 2c96dd64fc [SystemZ] More minor fixing in SystemZElimCompare.cpp
Don't use subreg indices since they are not used after regalloc.

Reviewed by Ulrich Weigand.

llvm-svn: 249663
2015-10-08 07:40:11 +00:00
Jonas Paulsson 9e1f3bd1bd [SystemZ] Minor fixes in SystemZElimCompare.cpp
Reviewed by Ulrich Weigand.

llvm-svn: 249662
2015-10-08 07:39:55 +00:00
Justin Bogner 468c998031 CodeGen: print and verify after TargetPassConfig::insertPass by default
In r224059, we started verifying after addPass, but missed doing so on
insertPass. There isn't a good reason for the discrepancy, and
skipping the verifier in these cases causes bugs.

This also exposes a verifier error that was introduced in r249087, but
the verifier doesn't run until after the register coalescer, when the
issue happens to have been resolved. I've skipped the verifier after
SIFixSGPRLiveRangesID to avoid the failures for now and will follow up
with Matt for a proper fix.

llvm-svn: 249643
2015-10-08 00:36:22 +00:00
Reid Kleckner 97797419e6 [WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocas
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.

llvm-svn: 249637
2015-10-07 23:55:01 +00:00
Rafael Espindola 284093033f git-clang-format r249548.
Sorry for missing this the first time.

llvm-svn: 249610
2015-10-07 20:32:24 +00:00
Vasileios Kalintiris b876b58d38 [mips][FastISel] Factor out common code from switch statement. NFC
llvm-svn: 249603
2015-10-07 20:06:30 +00:00
Vasileios Kalintiris 6ae1b35cda [mips][FastISel] Use ternary operator to select opcode. NFC
llvm-svn: 249594
2015-10-07 19:43:31 +00:00
Vasileios Kalintiris daad571ba4 [mips][FastISel] Simple refactoring of MipsFastISel::emitLogicalOP(). NFC.
llvm-svn: 249580
2015-10-07 18:14:24 +00:00
Chad Rosier 7c6ac2b8f9 [AArch64] Fold a floating-point divide by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249579
2015-10-07 17:51:37 +00:00
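The arithmetic behind the fold in r249579 above can be checked with a small sketch. It assumes the fold targets a fixed-point form of the integer-to-floating-point conversion, where dividing by 2^n is the same as treating the integer as having n fractional bits. The helper names and the `MaxBits` limit are hypothetical and are not taken from the patch.

```cpp
#include <cmath>

// Returns n if C == 2^n with 1 <= n <= MaxBits, otherwise -1.
int log2IfPowerOfTwo(double C, int MaxBits) {
  if (!(C > 0.0) || std::isinf(C))
    return -1; // rejects zero, negatives, NaN and infinity
  int Exp = 0;
  // frexp returns a mantissa in [0.5, 1); exactly 0.5 means C == 2^(Exp-1).
  double Mant = std::frexp(C, &Exp);
  if (Mant != 0.5)
    return -1;
  int N = Exp - 1;
  return (N >= 1 && N <= MaxBits) ? N : -1;
}

// Unfolded form: convert the integer, then divide by the constant.
double convertThenDivide(int X, double C) {
  return static_cast<double>(X) / C;
}

// Folded form: a single conversion that scales by 2^-N.
double fixedPointConvert(int X, int N) {
  return std::ldexp(static_cast<double>(X), -N);
}
```

For a divisor of 16.0 the helper returns 4, and `convertThenDivide(5, 16.0)` equals `fixedPointConvert(5, 4)`, since scaling by a power of two is exact in binary floating point as long as the result stays in range.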
Matt Arsenault fc0ad42516 AMDGPU: Fix missing implicit m0 uses on movrel instructions
llvm-svn: 249577
2015-10-07 17:46:32 +00:00
Chad Rosier fa30c9b436 [AArch64] Fold a floating-point multiply by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249576
2015-10-07 17:39:18 +00:00
Chad Rosier 169865ffda [ARM] Promote helper function to SelectionDAG.
I'll be using the function in a similar combine for AArch64.  The helper was
also improved to handle undef values.

Part of http://reviews.llvm.org/D13442

llvm-svn: 249572
2015-10-07 17:28:58 +00:00
Kevin B. Smith 9c7408807f Test commit access. Fixed comment to have correct input parameter name and
period termination.

llvm-svn: 249571
2015-10-07 17:24:25 +00:00
Oliver Stannard d3d114ba54 [ARM] Use correct half-precision functions in EABI mode
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.

llvm-svn: 249565
2015-10-07 16:58:49 +00:00
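The selection rule in r249565 above can be sketched as a small lookup. The `Env` enum and `halfConversionNames` function are hypothetical; the routine names follow the `__aeabi` and `__gnu` prefixes the commit message describes, and the sketch covers only the half/single-precision pair.

```cpp
#include <string>

// Environment component of an ARM target triple (only the four cases the
// commit message names).
enum class Env { EABI, EABIHF, GNUEABI, GNUEABIHF };

struct HalfConvNames {
  std::string HalfToFloat;
  std::string FloatToHalf;
};

// Pick the runtime routine names for half <-> single conversions.
HalfConvNames halfConversionNames(Env E) {
  bool IsAEABI = (E == Env::EABI || E == Env::EABIHF);
  if (IsAEABI)
    return {"__aeabi_h2f", "__aeabi_f2h"}; // names defined by the ARM RTABI
  return {"__gnu_h2f_ieee", "__gnu_f2h_ieee"}; // names provided by libgcc
}
```

An `eabi` or `eabihf` triple therefore gets the `__aeabi` routines, and a `gnueabi` or `gnueabihf` triple gets the `__gnu` ones.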
Chad Rosier 17436bf64e [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes.
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 249560
2015-10-07 16:15:40 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
Rafael Espindola 30d77777e7 Use non virtual destructors for sections.
llvm-svn: 249548
2015-10-07 13:46:06 +00:00
Chad Rosier db71abf2d4 [ARM] Push more complex check down to reduce compile time. NFC.
llvm-svn: 249547
2015-10-07 13:40:44 +00:00
Rafael Espindola 665b0d3a4e Don't repeat names in comments and don't indent in namespaces. NFC.
llvm-svn: 249546
2015-10-07 13:38:49 +00:00
Scott Egerton 9004cc7942 Revert: r249536 - Testing commit access with a trivial whitespace change.
llvm-svn: 249537
2015-10-07 10:57:06 +00:00
Scott Egerton be6b54b691 Testing commit access with a trivial whitespace change.
llvm-svn: 249536
2015-10-07 10:49:49 +00:00
Michael Kuperstein 259f1508f0 [X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before calls
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.

This should fix the exception handling issues in PR24792.

Differential Revision: http://reviews.llvm.org/D13132

llvm-svn: 249522
2015-10-07 07:01:31 +00:00
Igor Breger 1a6fd1cc0f AVX512: Change encoding of vpshuflw and vpshufhw instructions. Implement WIG as W0 and not W1, like all other instructions have been implemented.
Add encoding tests.

Differential Revision: http://reviews.llvm.org/D13471

llvm-svn: 249521
2015-10-07 06:31:18 +00:00
Matt Arsenault 10e6a61892 AMDGPU: Add comment for VOP2b operand class
Because of the constant bus requirement, it is never legal to
use a literal constant for these instructions despite the encoding
allowing it. This was already doing the right thing, but note why.

llvm-svn: 249500
2015-10-07 01:36:00 +00:00
Matt Arsenault 187276fa94 AMDGPU: Properly register passes
llvm-svn: 249495
2015-10-07 00:42:53 +00:00
Matt Arsenault 284192730a AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.

Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.

When the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will normally
take care of later, which will allow removing the weird check
of build_vector users. Without this, once that check is removed,
v_movrels_b32 would still be emitted even though all of the values
were only stored in SGPRs.

llvm-svn: 249494
2015-10-07 00:42:51 +00:00
Matt Arsenault 922b7bf808 AMDGPU: Remove inferRegClassFromUses / inferRegClassFromDefs
I'm not sure why this would be necessary, and no tests fail with
them removed. Looking at the uses is suspect as well because
the use reg classes will likely change when the users are moved
as a result of moving this instruction.

llvm-svn: 249493
2015-10-07 00:42:31 +00:00
Hans Wennborg 083ca9bb32 Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482
2015-10-06 23:24:35 +00:00
Tom Stellard 0fbf899c0f AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments()
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13367

llvm-svn: 249468
2015-10-06 21:16:34 +00:00
Chad Rosier dca46b426f [ARM] Minor refactoring. NFC.
llvm-svn: 249465
2015-10-06 20:58:42 +00:00
Chad Rosier aed910b7d7 [ARM] Minor refactoring. NFC.
llvm-svn: 249464
2015-10-06 20:51:26 +00:00
Chad Rosier 9df4aff86d [ARM] Minor refactoring. NFC.
llvm-svn: 249463
2015-10-06 20:45:45 +00:00
Joseph Tremoulet 2afea5438f [WinEH] Recognize CoreCLR personality function
Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455
2015-10-06 20:28:16 +00:00
Chad Rosier a087fd21da [ARM] Minor refactoring to improve readability. NFC.
llvm-svn: 249454
2015-10-06 20:23:42 +00:00
Krzysztof Parzyszek 8d2b2cfa29 [Hexagon] Remove ZeroOrMore from option flags
llvm-svn: 249438
2015-10-06 18:29:36 +00:00
Tom Stellard 88e0b25181 AMDGPU/SI: Add 64-bit versions of v_nop and v_clrexcp
Summary:
The assembly printing of these is still missing the encoding size
suffix, but this will be fixed in a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13436

llvm-svn: 249424
2015-10-06 15:57:53 +00:00
Krzysztof Parzyszek fb33824efd [Hexagon] Add an early if-conversion pass
llvm-svn: 249423
2015-10-06 15:49:14 +00:00
Daniel Sanders 1b3341724c [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet

Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
  0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
    0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...

There was a problem with selecting the sqrt instruction in the LLVM backend.

To fix the issue, changes are made to the TableGen definition for the sqrt instruction in MipsInstrFPU.td, and a new test file, sqrt.ll, is added to the LLVM regression tests.

Patch by Zlatko Buljan

Reviewers: zoran.jovanovic, hvarga, dsanders

Subscribers: llvm-commits, petarj

Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249416
2015-10-06 15:17:25 +00:00
Daniel Sanders add9057fa7 Revert r249123 - [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
The author was not credited and most of the commit message is missing. Will re-commit with this fixed.

llvm-svn: 249415
2015-10-06 15:13:16 +00:00
Alexei Starovoitov 4e01a38da0 [bpf] Avoid extra pointer arithmetic for stack access
For a program like the one below
struct key_t {
  int pid;
  char name[16];
};
extern void test1(char *);
int test() {
  struct key_t key = {};
  test1(key.name);
  return 0;
}
For key.name, the llc/bpf may generate the below code:
  R1 = R10  // R10 is the frame pointer
  R1 += -24 // framepointer adjustment
  R1 |= 4   // R1 is then used as the first parameter of test1
The OR operation is not recognized by the in-kernel verifier.

This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
  R1 = R10
  R1 += -20
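
The two sequences above compute the same address only because the frame slot is suitably aligned. A small arithmetic sketch, assuming an 8-byte-aligned frame pointer (the concrete values and function names are made up for illustration):

```python
# Offsets taken from the listing above; the frame pointer values are hypothetical.
def name_address_with_or(frame_pointer: int) -> int:
    # R1 = R10; R1 += -24; R1 |= 4
    return (frame_pointer - 24) | 4


def name_address_folded(frame_pointer: int) -> int:
    # R1 = R10; R1 += -20
    return frame_pointer - 20


aligned_fp = 0x1000  # multiple of 8, so bit 2 of (fp - 24) is clear
assert name_address_with_or(aligned_fp) == name_address_folded(aligned_fp)

misaligned_fp = 0x1004  # bit 2 of (fp - 24) is already set, so OR adds nothing
assert name_address_with_or(misaligned_fp) != name_address_folded(misaligned_fp)
```

With an aligned frame pointer the OR behaves like adding 4, so folding everything into a single frame-pointer offset gives the same address without the OR.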

Patch by Yonghong Song <yhs@plumgrid.com>

llvm-svn: 249371
2015-10-06 04:00:53 +00:00
Craig Topper 79dd1bf094 [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selection's ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.

llvm-svn: 249370
2015-10-06 02:50:24 +00:00
Craig Topper d69d495333 [X86] Remove unnecessary AddComplexity directive. The instruction is already wrapped in the equivalent earlier. NFC
llvm-svn: 249369
2015-10-06 02:50:21 +00:00
Dan Gohman e51c058ecc [WebAssembly] Switch to a more traditional assembly syntax
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.

llvm-svn: 249364
2015-10-06 00:27:55 +00:00
David Majnemer e4f9b09b51 [WinEH] Update CATCHRET's operand to match its successor
The CATCHRET operand did not match the MachineFunction's CFG.  This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.

Let's make sure this doesn't happen again by strengthening the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.

llvm-svn: 249344
2015-10-05 20:09:16 +00:00
Tom Stellard d585cd85a3 AMDGPU/SI: Add a helper for creating aliases for the _e32 instructions
Summary:
We are currently only using these aliases for VOPC instructions,
but this helper will make it easier to use them everywhere.

These aliases allow for the automatic matching of instructions
with forced 32-bit encoding.  Eventually, we should be able to remove
the custom C++ logic we have for this in the assembler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13396

llvm-svn: 249330
2015-10-05 17:57:39 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.
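
The post-RA expansion step described above can be sketched roughly as follows. This is not the code from the patch: the function name, the register numbering and the textual output are all hypothetical, and only the "sort whatever the allocator picked" idea comes from the message.

```python
# Hypothetical model of expanding one MEMCPY pseudo after register allocation.
def expand_memcpy(src_base: int, dst_base: int, scratch_regs: list) -> list:
    # The allocator may have picked the scratch registers in any order;
    # sort them so the LDM/STM register lists are ascending.
    regs = sorted(scratch_regs)
    reg_list = ", ".join(f"r{n}" for n in regs)
    return [
        f"ldm r{src_base}!, {{{reg_list}}}",
        f"stm r{dst_base}!, {{{reg_list}}}",
    ]


for line in expand_memcpy(src_base=1, dst_base=0, scratch_regs=[5, 3, 4]):
    print(line)
```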

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Zoran Jovanovic 5a8dffc618 [mips][microMIPS] Implement JALRC16, JRCADDIUSP and JRC16 instructions
Differential Revision: http://reviews.llvm.org/D11219

llvm-svn: 249317
2015-10-05 14:00:09 +00:00
Alexandros Lamprineas 1bab191f25 [MC layer][AArch64] llvm-mc accepts 4-bit immediate values for
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.

Differential Revision: http://reviews.llvm.org/D13011

llvm-svn: 249313
2015-10-05 13:42:31 +00:00
Daniel Sanders d5a89418c5 [mips] Changed the way symbols are handled in dla and la instructions to allow simple expressions.
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few test cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.

Patch by Scott Egerton.

Reviewers: vkalintiris, dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12760

llvm-svn: 249311
2015-10-05 13:19:29 +00:00
Rafael Espindola e3a20f57d9 Fix pr24486.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.

With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.

In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.

This patch is a joint work by Maxim Ostapenko and myself.

llvm-svn: 249303
2015-10-05 12:07:05 +00:00
Joerg Sonnenberger 726e624c0c [SPARCv9] Add support for the rdpr/wrpr instructions.
llvm-svn: 249262
2015-10-04 09:11:22 +00:00
Igor Breger 78741a1b1e AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12690

llvm-svn: 249261
2015-10-04 07:20:41 +00:00
Jeroen Ketema 321fc30afc Fix typo in README
llvm-svn: 249253
2015-10-04 00:46:16 +00:00
Simon Pilgrim bc707d04a4 [X86] Lower SEXTLOAD using SIGN_EXTEND_VECTOR_INREG. NCI.
The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication.

llvm-svn: 249243
2015-10-03 18:55:43 +00:00
Dan Gohman dc51b96b7f [WebAssembly] Implement the remaining conversion operations.
This is a temporary assembly syntax that will likely evolve along with
broader upcoming syntax changes.

llvm-svn: 249225
2015-10-03 02:10:28 +00:00
Tom Stellard dc9088a10e AMDGPU/SI: Remove unused tablegen multiclass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13395

llvm-svn: 249221
2015-10-03 00:29:50 +00:00
Dan Gohman 6a050f30de [WebAssembly] Rename setlocal to set_local to match the spec.
llvm-svn: 249218
2015-10-03 00:01:53 +00:00
Dan Gohman e3e4a5ff52 [WebAssembly] Fix CFG stackification of nested loops.
llvm-svn: 249187
2015-10-02 21:11:36 +00:00
Dan Gohman 9cc692b06e [WebAssembly] Support calls marked as "tail", fastcc, and coldcc.
llvm-svn: 249184
2015-10-02 20:54:23 +00:00
Dan Gohman baba8c648b [WebAssembly] Add a resize_memory intrinsic.
llvm-svn: 249178
2015-10-02 20:10:26 +00:00
Dan Gohman 72f1692a2c [WebAssembly] Add a memory_size intrinsic.
llvm-svn: 249171
2015-10-02 19:21:15 +00:00
Matt Arsenault d092a068ba AMDGPU/SI: Add verifier check for exec reads
Make sure we aren't accidentally not setting
these in the instruction definitions.

llvm-svn: 249170
2015-10-02 18:58:37 +00:00
Roman Divacky 4b5507a037 Actually switch the arch when we see .arch. PR21695
llvm-svn: 249165
2015-10-02 18:25:25 +00:00
Tim Northover 8d67b8e053 ARM: diagnose invalid local fixups on Thumb1
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.

llvm-svn: 249164
2015-10-02 18:07:18 +00:00
Tim Northover 956b008db6 ARM: correctly align constant pool value on Thumb1 targets.
Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).

llvm-svn: 249163
2015-10-02 18:07:13 +00:00
Chad Rosier 1f385618c0 [ARM] Typo. NFC.
llvm-svn: 249153
2015-10-02 16:42:59 +00:00
Andrea Di Biagio 77f62652c1 Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i32> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i32> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

llvm-svn: 249147
2015-10-02 16:08:05 +00:00
Andrea Di Biagio 45874e67a1 Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
r249121 caused a Clang test failure (avx2-builtins.c).
Revert r249121 while I keep investigating the reason why that test failed.

llvm-svn: 249124
2015-10-02 13:06:19 +00:00
Zoran Jovanovic 9ffdfa5986 [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249123
2015-10-02 13:06:02 +00:00
Andrea Di Biagio cb33456122 [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i32> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i32> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

llvm-svn: 249121
2015-10-02 12:45:37 +00:00
Matt Arsenault b733f00510 AMDGPU: Fix unused variable warning in release build
llvm-svn: 249091
2015-10-01 22:40:35 +00:00
Matt Arsenault b87fc22915 AMDGPU: Move SIFixSGPRLiveRanges to be a regalloc pass
Replace LiveInterval usage with LiveVariables. LiveIntervals
computes far more information than is needed for this pass
which just needs to find if an SGPR is live out of the
defining block.

LiveIntervals are not usually available that early, requiring
computing them twice which is very expensive. The extra run of
LiveIntervals/LiveVariables/SlotIndexes was costing in total
about 5% of compile time.

Continuing to use LiveIntervals is problematic. It seems
there is an option (early-live-intervals) to run the analysis
about where it should go to avoid recomputing LiveVariables,
but it seems to be completely broken with subreg liveness
enabled. There are also problems from trying to recompute
LiveIntervals since this seems to undo LiveVariables
and clear kill flags, causing TwoAddressInstructions
to make bad decisions.

Insert the pass right after live variables and preserve it.
The tricky case to worry about might be phis since
LiveVariables doesn't count a register as live out if
in the successor block it is only used in a phi,
but I don't think this is a concern right now
because SIFixSGPRCopies replaces SGPR phis.

llvm-svn: 249087
2015-10-01 22:10:03 +00:00
Joerg Sonnenberger c8d50d6347 Fix relocation used for GOT references in non-PIC mode. Fix relocations
for "set" pseudo op in PIC mode.

Differential Revision: http://reviews.llvm.org/D13173

llvm-svn: 249086
2015-10-01 22:08:20 +00:00
Matt Arsenault d2c7589f93 AMDGPU: Merge if and switch
llvm-svn: 249082
2015-10-01 21:51:59 +00:00
Matt Arsenault db7f0ef367 AMDGPU: Remove dead code
There's no point in checking VReg_1 because all uses
of it should already have been removed by SILowerI1Copies.

llvm-svn: 249081
2015-10-01 21:51:57 +00:00
Matt Arsenault d1d499aa56 AMDGPU: Make SIInsertWaits about a factor of 4 faster
This was the slowest target custom pass and was spending 80%
of the time in getMinimalPhysRegClass which was called
for every register operand.

Try to use the statically known register class when possible from
the instruction's MCOperandInfo. There are a few pseudo instructions
which are not well behaved with unknown register classes which still
require the expensive physical register class search.

There are a few other possibilities for making this even faster,
such as not inspecting implicit operands. For now those are checked
because it is technically possible to have a scalar load into
exec or vcc which can be implicitly used.

llvm-svn: 249079
2015-10-01 21:43:15 +00:00
Reid Kleckner fc64fae6e3 [WinEH] Emit __C_specific_handler tables for the new IR
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

llvm-svn: 249078
2015-10-01 21:38:24 +00:00
Tom Stellard e9f8b24985 AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass
Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function.  This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13356

llvm-svn: 249073
2015-10-01 21:16:05 +00:00
David Majnemer f828a0ccc7 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
Chad Rosier f11d040f01 [AArch64] Deprecate a command-line option used for testing.
Support for pairing unscaled loads and stores has been enabled since the
original ARM64 port.  This feature is no longer experimental, AFAICT.

llvm-svn: 249049
2015-10-01 18:17:12 +00:00
Jonas Paulsson 12629324a4 [SystemZ] Add some generic (floating point support) load instructions.
Add generic instructions for load complement, load negative and load positive
for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so
give the scheduler more freedom. The SystemZElimCompare pass will convert them
to the CC-setting variants when it can.

Regression tests updated to expect the new opcodes in places where the old ones
were used. New test case SystemZ/fp-cmp-05.ll checks that
SystemZElimCompare.cpp can handle the new opcodes.

README.txt updated (bullet removed).

Note that fp128 is not yet handled, because it is relatively rare, and is a
bit trickier, because of the fact that l.dfr would operate on the sign bit of
one of the subregisters of a fp128, but we would not want to copy the other
sub-reg in case src and dst regs are not the same.

Reviewed by Ulrich Weigand.

llvm-svn: 249046
2015-10-01 18:12:28 +00:00
Tom Stellard e0e582c9aa AMDGPU: Add MEM_RAT STORE_TYPED.
v2: Add test (Matt).
    Fix capitalization of isEOP (Matt).
    Move pattern to class parameter (Matt).
    Make the instruction available to Cayman (Matt).
    Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.

Patch by: Zoltan Gilian

llvm-svn: 249042
2015-10-01 17:51:34 +00:00
Tom Stellard c0f0fba2c4 AMDGPU: Factor out EOP query.
v2: Fix brace placement and capitalization (Matt).

Patch by: Zoltan Gilian

llvm-svn: 249041
2015-10-01 17:51:29 +00:00
NAKAMURA Takumi 1ed20db720 Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Ulrich Weigand cf1670a095 [SystemZ] Add assembly instructions for obtaining clock values as well as CPU features
Provide assembler support for STCK, STCKF, STCKE, and STFLE.

Author: joncmu
Differential Revision: http://reviews.llvm.org/D13299

llvm-svn: 249015
2015-10-01 14:43:48 +00:00
Chad Rosier b7c5b91068 [AArch64] Hoist commonly failing check. NFC.
llvm-svn: 249011
2015-10-01 13:43:05 +00:00
Chad Rosier 0b15e7c618 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 249008
2015-10-01 13:33:31 +00:00
Chad Rosier 7a83d770ae [AArch64] Update comment to reflect reality.
llvm-svn: 249007
2015-10-01 13:09:44 +00:00
Zoran Jovanovic 2960f3a346 [mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions
Differential Revision: http://reviews.llvm.org/D10337

llvm-svn: 249004
2015-10-01 12:49:27 +00:00
Scott Douglass 290183d734 [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer
Differential Revision: http://reviews.llvm.org/D13240

llvm-svn: 249002
2015-10-01 11:56:19 +00:00
Tom Stellard 1f0e7bbc5b AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init order
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12451

llvm-svn: 248978
2015-10-01 02:02:46 +00:00
Ahmed Bougacha 23a0d1a1d6 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.
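
The rounding argument above can be checked with a small float32 simulation. This is a sketch, not the generated code: the helper names are made up, float32 rounding is emulated by round-tripping through struct, and the addps constant is assumed to be exactly -(2**39 + 2**23), which matches the -5.497642e+11 shown in the listing.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float32."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


MAGIC = -(2.0**39 + 2.0**23)  # assumed exact value of the addps constant


def uitofp_in_order(x: int) -> float:
    lo = bits_to_f32((x & 0xFFFF) | 0x4B000000)  # 2**23 + low 16 bits
    hi = bits_to_f32((x >> 16) | 0x53000000)     # 2**39 + high 16 bits * 2**16
    return f32(f32(hi + MAGIC) + lo)             # (hi + MAGIC) + lo


def uitofp_reassociated(x: int) -> float:
    lo = bits_to_f32((x & 0xFFFF) | 0x4B000000)
    hi = bits_to_f32((x >> 16) | 0x53000000)
    return f32(hi + f32(MAGIC + lo))             # hi + (MAGIC + lo)


print(uitofp_in_order(12345), uitofp_reassociated(12345))
```

In the reassociated order, MAGIC + lo sits just below 2**39 in magnitude, where neighbouring float32 values are 2**15 apart, so any low half under 2**14 is rounded away and the input 12345 comes out as 0.0 instead of 12345.0.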

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel a114a10bbe [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
David Blaikie 757908e545 Fix -Wsign-compare warning
llvm-svn: 248942
2015-09-30 20:37:48 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Hal Finkel 4c45775880 [PowerPC] Disable shrink wrapping
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.

llvm-svn: 248924
2015-09-30 17:29:03 +00:00
Artyom Skrobov 72ca6b8f3f [ARM] Support for ARMv6-Z / ARMv6-ZK missing
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.

In particular, there were no tests for TrustZone being supported in these
architectures.

The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.

Differential Revision: http://reviews.llvm.org/D13236

llvm-svn: 248921
2015-09-30 17:25:52 +00:00
Chad Rosier 4f04e2ec87 [AArch64] Use helper function to improve readability. NFC.
llvm-svn: 248914
2015-09-30 16:50:41 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.
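
As a rough per-lane model of the sign-controlled logical shift described above (a sketch only: the function name is hypothetical, and the handling of shift amounts at or beyond the lane width is an assumption of this model, not something stated in the commit):

```python
# Hypothetical scalar model of one lane of an XOP-style logical shift:
# a positive amount shifts left, a negative amount shifts right.
def xop_logical_shift_lane(value: int, amount: int, bits: int = 32) -> int:
    mask = (1 << bits) - 1
    if amount >= 0:
        return (value << amount) & mask
    return (value & mask) >> -amount


print(hex(xop_logical_shift_lane(0x1, 4)))          # left shift by 4
print(hex(xop_logical_shift_lane(0x80000000, -31))) # right shift by 31
```

Because one node covers both directions, a right shift is expressed by negating the shift amount rather than by a separate opcode, which is why the existing left/right shift nodes are not a natural fit.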

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Marek Olsak d1a69a2839 AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
2015-09-29 23:37:32 +00:00
Reid Kleckner a13dfd539b [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer 91b0ab9172 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko cce239c45d HHVM calling conventions.
The HHVM calling convention, hhvmcc, is used by the HHVM JIT for
functions in the translation cache. We currently support using the LLVM
back end to generate code for X86-64 and may support other architectures
in the future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter the translation cache via an hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with the exception of the first argument, which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Nemanja Ivanovic 2c84b29464 Addition of interfaces to the BE to conform to Table A-2 of ELF V2 ABI V1.1
This patch corresponds to review:
http://reviews.llvm.org/D13191

Back end portion of the fifth round of additions to altivec.h.

llvm-svn: 248809
2015-09-29 17:41:53 +00:00
Chad Rosier 32d4d37e61 [AArch64] Scale offsets by the size of the memory operation. NFC.
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored.  This change gets
us one step closer to forming LDPSW instructions.  This change also enables
pre- and post-indexing for halfword and byte loads and stores.
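
A small sketch of the scaling rule, assuming a hypothetical helper that turns a byte offset into a scaled immediate. For a sign-extending word load the memory operation is 4 bytes while the destination register is 8 bytes, so dividing by the register size would give the wrong immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a byte offset into a scaled immediate.
// memSizeBytes is the size of the memory access, not of the register.
int64_t scaledImmediate(int64_t byteOffset, int64_t memSizeBytes) {
    assert(memSizeBytes > 0 && "memory access size must be positive");
    assert(byteOffset % memSizeBytes == 0 && "offset must be a multiple of the access size");
    return byteOffset / memSizeBytes;
}
```

For example, a byte offset of 8 on a 4-byte access scales to 2, whereas scaling by an 8-byte register size would yield 1.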

llvm-svn: 248804
2015-09-29 16:07:32 +00:00
Chad Rosier a4d3217e81 [AArch64] Remove some redundant cases. NFC.
llvm-svn: 248800
2015-09-29 14:57:10 +00:00
Jeroen Ketema 740f9d79ca Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple of the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)
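
The rounding can be sketched as follows. The helper names are hypothetical and this is not the code from the patch; it only shows how a 20-byte adjustment with a maximal alignment of 16 becomes 32.

```cpp
#include <cassert>
#include <cstdint>

// Round size up to the next multiple of align (align must be a power of two).
// Hypothetical helper for illustration only.
constexpr uint64_t roundUpToAlignment(uint64_t size, uint64_t align) {
    return (size + align - 1) & ~(align - 1);
}

// Stack adjustment for outgoing arguments: the raw byte count rounded up to
// the largest alignment seen while laying the arguments out.
uint64_t stackAdjustment(uint64_t bytesNeeded, uint64_t maxAlign) {
    assert(maxAlign != 0 && (maxAlign & (maxAlign - 1)) == 0 &&
           "alignment must be a power of two");
    return roundUpToAlignment(bytesNeeded, maxAlign);
}
```

With the example above, 16 bytes of vector data plus a 4-byte integer need 20 bytes, and stackAdjustment(20, 16) gives the 32 used in the fixed output.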

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
NAKAMURA Takumi 0c12a3949e [CMake] X86AsmParser: Prune redundant LINK_LIBS.
It is described in LLVMBuild.txt.

llvm-svn: 248771
2015-09-29 01:25:01 +00:00
Sanjay Patel 3a14f1a338 add a FIXME for a CPU model check that should have an attribute instead
llvm-svn: 248746
2015-09-28 22:00:24 +00:00
Matt Arsenault ba6aae785a AMDGPU: Factor switch into separate function
llvm-svn: 248742
2015-09-28 20:54:57 +00:00
Matt Arsenault 73aa8f687a AMDGPU: Fix splitting x16 SMRD loads
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

llvm-svn: 248741
2015-09-28 20:54:52 +00:00
Matt Arsenault e5d042cd56 AMDGPU: Fix moving SMRD loads with literal offsets on CI
llvm-svn: 248740
2015-09-28 20:54:46 +00:00
Matt Arsenault dd49c5fc1b AMDGPU: Fix splitting SMRD with large offset
The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting in an instruction missing an operand
which would assert later.

Test will be included in a following commit
which fixes a related issue.

llvm-svn: 248739
2015-09-28 20:54:42 +00:00
Andrew Kaylor 16c4da03d5 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

llvm-svn: 248735
2015-09-28 20:33:22 +00:00
Daniel Sanders 7727e1098c [mips][p5600] Added P5600 processor and initial scheduler.
Summary:
The P5600 is an out-of-order, superscalar implementation of the MIPS32R5
architecture.

The scheduler has a few missing details (see the 'Tricky Instructions'
section) and some quirks of the P5600 are deliberately omitted due to
implementation difficulty and low chance of significant benefit (e.g. the
predicate on P5600WriteEitherALU). However, testing on SingleSource is
showing significant performance benefits on some apps (seven in the 10-30%
range) and only one significant regression (12%) when
-pre-RA-sched=linearize is given. Without -pre-RA-sched=linearize the
results are more variable. Some do even better (up to 55% improvement) but
increased numbers of copies are slowing others down (up to 12%).

Overall, the scheduler as it currently stands is a 2.4% win with
-pre-RA-sched=linearize and a 2.7% win without -pre-RA-sched=linearize.
I'm sure we can improve on this further.

For completeness, the FPGA this was tested on shows some failures with and
without the P5600 scheduler. These appear to be scheduling related since
the two test runs have fairly different sets of failing tests even after
accounting for other factors (e.g. spurious connection failures) however
it's not P5600 specific since we also get some for the generic scheduler.

Reviewers: vkalintiris

Subscribers: mpf, llvm-commits, atrick, vkalintiris

Differential Revision: http://reviews.llvm.org/D12193

llvm-svn: 248725
2015-09-28 18:24:08 +00:00
Dan Gohman 05a17aa82a [WebAssembly] Support for direct call and call_indirect.
llvm-svn: 248716
2015-09-28 16:22:39 +00:00
Zoran Jovanovic cdb64566cc [mips] Handling of immediates bigger than 16 bits
Differential Revision: http://reviews.llvm.org/D10539

llvm-svn: 248706
2015-09-28 11:11:34 +00:00
Artyom Skrobov ad8a0638f7 [ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall()
supportsTailCall() has two callers. Both of them double-check isThumb1Only(),
and refuse to proceed with tail-calling in that case.
Therefore, it makes sense to move this check to
ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized;
and to eliminate the extra checks at the call sites.

Following a review comment, added an "assert(supportsTailCall())"
in IsEligibleForTailCall.

NFC.

llvm-svn: 248703
2015-09-28 09:44:11 +00:00
Craig Topper 862d5d8322 Remove 'const' from some ArrayRefs. ArrayRefs are already immutable. NFC
llvm-svn: 248693
2015-09-28 00:15:34 +00:00
Yaron Keren e5a9dc2f5b Silence clang warning: variable ‘Status’ set but not used.
llvm-svn: 248691
2015-09-27 21:31:33 +00:00
Matt Arsenault 1d36b717a5 AMDGPU: Remove hasPostISelHook from most instructions
Since this is only needed for VOP3 and a few other special
case instructions, stop setting it on everything.

llvm-svn: 248657
2015-09-26 05:06:48 +00:00
Matt Arsenault f32481372c AMDGPU: Switch over reg class size instead of checking all super classes
This gets isSGPRClass out of my profile of SIFixSGPRCopies.

llvm-svn: 248656
2015-09-26 04:59:04 +00:00
Matt Arsenault 6e28010215 AMDGPU: Don't handle invalid reg classes in helper functions
No tests hit these and it would be better to have checks like
this explicit where they are used.

llvm-svn: 248655
2015-09-26 04:53:30 +00:00
Saleem Abdulrasool 9174623b2d AMDGPU: address -Winconsistent-missing-override
Add missing override.  NFC.

llvm-svn: 248652
2015-09-26 04:34:52 +00:00
Matt Arsenault 8e1ddf84fe AMDGPU: Set CopyCost of register classes
These require multiple mov instructions to copy,
but the default value is that 1 instruction is needed.
I'm not sure if this actually changes anything.

llvm-svn: 248651
2015-09-26 04:09:34 +00:00
Matt Arsenault e98a074c42 AMDGPU: VOP3b definition cleanups
llvm-svn: 248647
2015-09-26 02:25:48 +00:00
Matt Arsenault 86095b8dec AMDGPU: Fix sched model for VOP2b instructions
Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.

llvm-svn: 248646
2015-09-26 02:25:45 +00:00
Dan Gohman d0bf981296 [WebAssembly] Rename several functions and types according to the new spec.
llvm-svn: 248644
2015-09-26 01:09:44 +00:00
Ahmed Bougacha e81610fabb [ARM] Don't generate clrex for pre-v7 targets.
Since r248294, we emit clrex, but it doesn't exist on v6.

llvm-svn: 248640
2015-09-26 00:14:02 +00:00
Matt Arsenault e229c0c45e AMDGPU: Construct new buffer instruction when moving SMRD
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.

llvm-svn: 248627
2015-09-25 22:21:19 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Tom Stellard e135ffd554 AMDGPU/SI: Use .hsatext section instead of .text for HSA
Reviewers: arsenm, grosbach, rafael

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12424

llvm-svn: 248619
2015-09-25 21:41:28 +00:00
Matthias Braun c2d4befb54 MachineBasicBlock: Factor out common code into isReturnBlock()
llvm-svn: 248617
2015-09-25 21:25:19 +00:00
Matt Arsenault f743b838cb AMDGPU: Make getNamedOperandIdx declaration readonly
This matches how it is defined in the generated implementation.

llvm-svn: 248598
2015-09-25 18:09:15 +00:00
Chad Rosier 1bbd7fb38e [AArch64] Add support for generating pre- and post-index load/store pairs.
llvm-svn: 248593
2015-09-25 17:48:17 +00:00
Matt Arsenault 0a10900070 AMDGPU: Disable some passes that are not meaningful
Don't run passes related to stack maps, garbage collection,
exceptions since these aren't useful for GPUs.

There might be a few more to turn off that I'm less sure about
(e.g. ShrinkWrapping) or I'm not sure how to disable
(SafeStack and StackProtector)

llvm-svn: 248591
2015-09-25 17:41:20 +00:00
Matt Arsenault 4bf43d4e68 AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG
This fixes a select error when the i64 source was also
bitcasted to v2i32 in the original source.

Instead of awkwardly trying to select the modified source value and
the store, replace before isel begins.

Uses a worklist to avoid possible problems from mutating the DAG,
although it seems to work OK without it.

llvm-svn: 248589
2015-09-25 17:27:08 +00:00
Matt Arsenault 0cb8517dc6 AMDGPU: Fix recomputing dominator tree unnecessarily
SIFixSGPRCopies does not modify the CFG, but this was
being recomputed before running SIFoldOperands.

llvm-svn: 248587
2015-09-25 17:21:28 +00:00
Matt Arsenault 2d6fdb8495 AMDGPU: Re-justify workaround and fix worked around problem
When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
is actually emitting such a REG_SEQUENCE now.

If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.

llvm-svn: 248585
2015-09-25 17:08:42 +00:00
Matt Arsenault 3ad55ec946 AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources
This avoids needing to re-legalize the new REG_SEQUENCE.

llvm-svn: 248584
2015-09-25 17:08:40 +00:00
Matt Arsenault 6525aa3529 AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos
This was only set on the final _si/_vi version, but not
on the pseudos most of codegen sees.

No test since these instructions aren't used yet.

llvm-svn: 248583
2015-09-25 16:58:27 +00:00
Matt Arsenault 5f70436c49 AMDGPU: Improve accuracy of instruction rates for VOPC
These were all using the default 32-bit VALU write class,
but the i64/f64 compares are half rate.

I'm not sure this is really correct, because they are still using
the write to VALU write class, even though they really write
to the SALU.

llvm-svn: 248582
2015-09-25 16:58:25 +00:00
Saleem Abdulrasool 8e99f50768 ARM: make -Asserts,-Werror=unused-variable build happy
The value was only used in an assertion.  Sink the variable usage into the
assertion.

llvm-svn: 248562
2015-09-25 05:41:02 +00:00
Saleem Abdulrasool fe83b50289 ARM: address WoA division limitation
We now emit the compiler generated divide by zero check that was needed for the
MSVC routines.  We construct a pseudo-instruction for the DBZ check as the
operation requires splitting up the BB.  For the 64-bit operations, we need to
custom expand the node as we need to insert the DBZ check and then emit the
libcall to the appropriate name.  Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization rather than sink this there to avoid the duplication.  The division
library calls now match MSVC semantically.

llvm-svn: 248561
2015-09-25 05:15:46 +00:00
Matt Arsenault 8aa9973696 AMDGPU: Remove unused includes
llvm-svn: 248553
2015-09-25 00:28:43 +00:00
Chad Rosier b02f5a5a1f [AArch64] Improve the readability of the ld/st optimization pass. NFC.
In this context, MI is an add/sub instruction, not a load/store.

llvm-svn: 248540
2015-09-24 21:27:49 +00:00
Simon Pilgrim 68d0050c6a [X86][SSE2] Fix zero/any extension shuffles that don't start from the first element
Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H

llvm-svn: 248536
2015-09-24 21:02:17 +00:00
Matt Arsenault e66621b306 AMDGPU: Add s_dcache_* instructions
llvm-svn: 248533
2015-09-24 19:52:27 +00:00
Matt Arsenault d6adfb401c AMDGPU: Add cache invalidation instructions.
These are necessary for implementing mem_fence for
OpenCL 2.0.

The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.

llvm-svn: 248532
2015-09-24 19:52:21 +00:00
Chad Rosier 7cd472b719 [AArch64] The paired post-increment store instruction has an output register.
The pre- and post-increment version update the base register, but the post-
version was defined incorrectly.  There is no test case as we don't currently
generate these instructions, but I plan on changing that in the near future.

llvm-svn: 248528
2015-09-24 19:21:42 +00:00
Artyom Skrobov cf296444ab [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.

This patch changes the handling of +t2dsp to be in line with other
architecture extensions.

Following a revert of r248152 and new review comments, this patch also includes
renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its
possible external usage.

Differential Revision: http://reviews.llvm.org/D12937

llvm-svn: 248519
2015-09-24 17:31:16 +00:00
Daniel Sanders 090f6e41c4 [mips] Use PredicateControl for the MSA ASE instructions. NFC.
Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13092

llvm-svn: 248486
2015-09-24 12:10:23 +00:00
Matt Arsenault 68d938649e Introduce target hook for optimizing register copies
Allow a target to do something other than search for copies
that will avoid cross register bank copies.

Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.

I'm not entirely satisfied with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid kinds
of subregister copies on some targets.

I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.

The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.

llvm-svn: 248478
2015-09-24 08:36:14 +00:00
Matt Arsenault e068f9a263 AMDGPU: Return after instruction is processed.
llvm-svn: 248476
2015-09-24 07:51:28 +00:00
Matt Arsenault 708586faa2 AMDGPU: Remove another unnecessary check from commuteInstruction
llvm-svn: 248475
2015-09-24 07:51:25 +00:00
Matt Arsenault fa242960fc AMDGPU: Add readonly to InstrMapping functions
llvm-svn: 248474
2015-09-24 07:51:23 +00:00
Matt Arsenault cab64f1c75 AMDGPU: Fix printing trailing whitespace for mubuf atomics
llvm-svn: 248472
2015-09-24 07:51:17 +00:00
Matt Arsenault c8e2ce4046 AMDGPU: Reduce number of copies emitted
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.

This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.

llvm-svn: 248467
2015-09-24 07:16:37 +00:00
Tim Northover beb5bccf88 ARM: fix folding stack adjustment (again again again...)
This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.

To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.

llvm-svn: 248437
2015-09-23 22:21:09 +00:00
Sanjay Patel 1a6534661b [x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
xorl %eax, %ecx
movd %ecx, %xmm0

into this:

xorps %xmm1, %xmm0
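
For context, a source-level pattern of the kind that can lead to the integer sequence above is a bitwise xor applied to the bit patterns of two floats. The function below is a hypothetical example, not taken from the patch or its tests.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Xor the raw bit patterns of two floats and reinterpret the result as a float.
// memcpy is used for the bit casts to stay within defined behaviour.
float xorFloatBits(float a, float b) {
    uint32_t ia, ib;
    std::memcpy(&ia, &a, sizeof ia);
    std::memcpy(&ib, &b, sizeof ib);
    uint32_t r = ia ^ ib;
    float out;
    std::memcpy(&out, &r, sizeof out);
    return out;
}
```

Xoring with the bit pattern of -0.0f flips only the sign bit, so xorFloatBits(1.0f, -0.0f) yields -1.0f.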

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248415
2015-09-23 18:33:42 +00:00
Sanjay Patel aba37553c4 [x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
orl %eax, %ecx
movd %ecx, %xmm0

into this:

orps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248409
2015-09-23 18:19:07 +00:00
Evgeniy Stepanov a2002b08f7 Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

This is a re-commit of a change in r248357 that was reverted in
r248358.

llvm-svn: 248405
2015-09-23 18:07:56 +00:00
Sanjay Patel b14ecd34f7 move call to convertIntLogicToFPLogic up; NFCI
The BEXTR comments didn't make sense before, we may want to extend the
FP logic transform to work on vectors, and this way is more beautiful.

llvm-svn: 248404
2015-09-23 18:03:37 +00:00
Sanjay Patel ade3abd2d9 [x86] move code for converting int logic to FP logic to a helper function; NFCI
This is a follow-on to:
http://reviews.llvm.org/rL248395

so we can add the call to the or/xor combines too.

llvm-svn: 248399
2015-09-23 17:39:41 +00:00
Sanjay Patel df2495f331 [x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars
Turn this:
   movd %xmm0, %eax
   movd %xmm1, %ecx
   andl %eax, %ecx
   movd %ecx, %xmm0

into this:
   andps %xmm1, %xmm0


This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

Differential Revision: http://reviews.llvm.org/D13065

llvm-svn: 248395
2015-09-23 17:00:06 +00:00
Dan Gohman 979840d31f [WebAssembly] Fix hasAddr64 being used before being initialized.
This reverts r248388 and fixes the underlying bug: hasAddr64 was initialized
in runOnMachineFunction, but runOnMachineFunction isn't ever called in
CodeGen/WebAssembly/global.ll since that testcase has no functions. The fix
here is to use AsmPrinter's getPointerSize() as needed to determine the
pointer size instead.

llvm-svn: 248394
2015-09-23 16:59:10 +00:00
Alexander Kornienko a3eaa204e6 Fix CodeGen/WebAssembly/global.ll test under ASAN.
llvm-svn: 248388
2015-09-23 15:41:25 +00:00
Chad Rosier 2dfd35499e [AArch64] Refactor pre- and post-index merge functions into a single function. NFC.
llvm-svn: 248377
2015-09-23 13:51:44 +00:00
Oliver Stannard f2ed5c68d2 [ARM] Add option to force fast-isel
The ARM backend has some logic that only allows the fast-isel to be enabled for
subtargets where it is known to be stable. This adds a backend option to
override this and force the fast-isel to be used for any target, to allow it to
be tested.

This is an ARM-specific option, because no other backend disables the fast-isel
on a per-subtarget basis.

llvm-svn: 248369
2015-09-23 09:19:54 +00:00
Simon Pilgrim 9cb018b6b6 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patch removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Sanjoy Das 2aacc0ecca [SCEV] Introduce ScalarEvolution::getOne and getZero.
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.

I've refactored the call sites I could find easily to use getZero /
getOne.

Reviewers: hfinkel, majnemer, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12947

llvm-svn: 248362
2015-09-23 01:59:04 +00:00
Evgeniy Stepanov 8d0e3011d8 Revert "Android support for SafeStack."
test/Transforms/SafeStack/abi.ll breaks when target is not supported;
needs refactoring.

llvm-svn: 248358
2015-09-23 01:23:22 +00:00
Evgeniy Stepanov ce2e16f00c Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

llvm-svn: 248357
2015-09-23 01:03:51 +00:00
Ahmed Bougacha 81616a72ea [ARM] Emit clrex in the expanded cmpxchg fail block.
ARM counterpart to r248291:

In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248294
2015-09-22 17:22:58 +00:00
Ahmed Bougacha 07a844d758 [AArch64] Emit clrex in the expanded cmpxchg fail block.
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248291
2015-09-22 17:21:44 +00:00
Daniel Sanders 86cce70010 [mips][sched] Split IIBranch into specific instruction classes.
Summary:
Almost no functional change since the InstrItinData's have been duplicated.
The one functional change is to remove IIBranch from the MSA branches. The
classes will be assigned to the MSA instructions as part of implementing
the P5600 scheduler.

II_IndirectBranchPseudo and II_ReturnPseudo can probably be removed. I've
preserved the itinerary information for the corresponding pseudo
instructions to avoid making a functional change to these pseudos in
this patch.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12189

llvm-svn: 248273
2015-09-22 13:36:28 +00:00
Daniel Sanders 1af1d275bc [mips][sched] Temporarily rename IIAlu to IIM16Alu. NFC.
Summary:
The only instructions left in IIAlu are MIPS16 specific. We're not
implementing a MIPS16 scheduler at this time so rename the class to make it
obvious that they are MIPS16 instructions.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12188

llvm-svn: 248267
2015-09-22 12:36:28 +00:00
Stephen Canon 8216d88511 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that convention. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior.  This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
NAKAMURA Takumi 10c80e7996 Prune trailing whitespaces.
llvm-svn: 248265
2015-09-22 11:19:03 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi 84965031a7 Reformat comment lines.
llvm-svn: 248262
2015-09-22 11:14:12 +00:00
NAKAMURA Takumi 70ad98aca4 Reformat.
llvm-svn: 248261
2015-09-22 11:13:55 +00:00
NAKAMURA Takumi 59a16a76be ARMInstrInfo.cpp: Reformat.
llvm-svn: 248260
2015-09-22 11:10:17 +00:00
NAKAMURA Takumi bf9cc7f30b Fix utf8 chars.
llvm-svn: 248259
2015-09-22 11:10:08 +00:00
Daniel Sanders f173dda0e2 [mips][ias] Implement .cpreturn directive.
Summary:
Based on a patch by David Chisnall. I've modified the original patch as follows:
* Moved the expansion to the TargetStreamers so that the directive isn't
  expanded when emitting assembly.
* Fixed an operand order bug.
* Changed the move instructions from DADDu to OR to match recent changes to GAS.

Reviewers: vkalintiris

Subscribers: llvm-commits, emaste, seanbruno, theraven

Differential Revision: http://reviews.llvm.org/D13017

llvm-svn: 248258
2015-09-22 10:50:09 +00:00
Daniel Sanders 254f387723 [mips][sched] Added class for WSBH
Summary:
No functional change since no InstrItinData is provided.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12190

llvm-svn: 248257
2015-09-22 10:01:13 +00:00
Simon Pilgrim 1cad0cd3ce [X86][SSE] Match zero/any extension shuffles that don't start from the first element
This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane.

The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well.

Differential Revision: http://reviews.llvm.org/D12561

llvm-svn: 248250
2015-09-22 08:16:08 +00:00
Matt Arsenault f11e7489e1 AMDGPU: Remove unnecessary check
If the instruction doesn't have enough operands, it
either shouldn't be marked as isCommutable or is malformed.

llvm-svn: 248242
2015-09-22 04:17:45 +00:00
Jeroen Ketema 41681a5329 [ARM] Do not scale vext with a factor
The vext pseudo-instruction takes the number of elements that need to be
extracted, not the number of bytes. Hence, use the number of elements
directly instead of scaling them with a factor.
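
To make the element-versus-byte distinction concrete, here is a scalar model of a vext-style extract over two 4 x 32-bit vectors. The function name and the fixed vector shape are assumptions for illustration; the operand is an element index, so it is not multiplied by the 4-byte element size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Take four consecutive elements from the concatenation of lo and hi,
// starting at element index startElement (counted in elements, not bytes).
Vec4 vextModel(const Vec4& lo, const Vec4& hi, std::size_t startElement) {
    assert(startElement <= 4 && "start index must stay within the concatenation");
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::size_t idx = startElement + i;
        out[i] = idx < 4 ? lo[idx] : hi[idx - 4];
    }
    return out;
}
```

With lo = {0, 1, 2, 3} and hi = {4, 5, 6, 7}, a start index of 1 selects {1, 2, 3, 4}; scaling that index by the element size would instead select {4, 5, 6, 7}.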

Reviewers: Silviu Baranga, James Molloy
(not reflected in the differential revision)

Differential Revision: http://reviews.llvm.org/D12974

llvm-svn: 248208
2015-09-21 20:28:04 +00:00
Ulrich Weigand 126caeb043 [SystemZ] Fix expansion of ISD::FPOW and ISD::FSINCOS
The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there
is no legal instruction for those on SystemZ.  This could cause
LLVM internal errors.  Fixed by setting the operation action to
Expand for those opcodes.

Also added test cases for all other LLVM IR intrinsics that should
generate a library call.  (Those already work correctly since the
default operation action is fine.)

llvm-svn: 248180
2015-09-21 17:35:45 +00:00
James Molloy e46da3849a Revert "[ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def"
This was committed without the code review (http://reviews.llvm.org/D12937) being approved.

This reverts commit r248152.

llvm-svn: 248174
2015-09-21 16:35:08 +00:00
Matt Arsenault 85441dd724 AMDGPU: Move copy handling under switch like other instructions
llvm-svn: 248172
2015-09-21 16:27:22 +00:00
Chad Rosier 03a47305ec [Machine Combiner] Refactor machine reassociation code to be target-independent.
No functional change intended.
Patch by Haicheng Wu <haicheng@codeaurora.org>!

http://reviews.llvm.org/D12887
PR24522

llvm-svn: 248164
2015-09-21 15:09:11 +00:00
Artyom Skrobov 79b0adaae4 [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.

This patch changes the handling of +t2dsp to be in line with other
architecture extensions.

Following review comments, also updating the description of FeatureDSPThumb2
in ARM.td.

Differential Revision: http://reviews.llvm.org/D12937

llvm-svn: 248152
2015-09-21 12:43:10 +00:00
Asaf Badouh eaf2da14bf [X86][AVX512] add masked version for RSQRT14 & RCP14 Scalar FP
Differential Revision: http://reviews.llvm.org/D12524

llvm-svn: 248147
2015-09-21 10:23:53 +00:00
Daniel Sanders 5d7962880d [mips] Allow constant expressions in second argument of .cpsetup.
Summary:
Also tightened up the test and made a trivial fix to prevent double-newline
after emitting .cpsetup directives.

Reviewers: vkalintiris

Subscribers: seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D12956

llvm-svn: 248143
2015-09-21 09:26:55 +00:00
Craig Topper 0013be16ff Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type extra times. NFC
llvm-svn: 248140
2015-09-21 05:32:41 +00:00
Craig Topper 4e9b03d6f9 Don't pass StringRefs around by const reference. Pass by value instead per coding standards. NFC
llvm-svn: 248136
2015-09-21 00:18:00 +00:00
Craig Topper 3c76c523e1 Cleanup places that passed SMLoc by const reference to pass it by value instead. NFC
llvm-svn: 248135
2015-09-20 23:35:59 +00:00
Igor Breger b7e1f9d680 AVX512: Implemented encoding and intrinsics for vcmpss/sd.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12593

llvm-svn: 248121
2015-09-20 15:15:10 +00:00
Asaf Badouh 2744d21fb8 [X86][AVX512] extend support in Scalar conversion
add scalar FP to Int conversion with truncation intrinsics
add scalar conversion FP32 from/to FP64 intrinsics
add rounding mode and SAE mode encoding for these intrinsics

Differential Revision: http://reviews.llvm.org/D12665

llvm-svn: 248117
2015-09-20 14:31:19 +00:00
Igor Breger 4c4cd789c9 AVX512: vsqrtss/sd encoding and intrinsics implementation.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12102

llvm-svn: 248116
2015-09-20 09:13:41 +00:00
Asaf Badouh 572bbceecc [X86][AVX512DQ] Add fpclass instruction
Differential Revision: http://reviews.llvm.org/D12931

llvm-svn: 248115
2015-09-20 08:46:07 +00:00
Michael Kuperstein 58e86bc893 [X86] Fix sitofp and uitofp instruction matching failures with long double and avx512
The operation action for i32 and i64 cannot be set to legal, as long double 
needs custom lowering.

Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D12372

llvm-svn: 248114
2015-09-20 08:12:17 +00:00
Igor Breger 1d55f20bee AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4
Added tests for intrinsics.

Differential Revision: http://reviews.llvm.org/D12525

llvm-svn: 248113
2015-09-20 07:18:53 +00:00
Igor Breger 0ede3cbb5c AVX512: Implement instructions encoding, lowering and intrinsics
vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4
Added tests for encoding, lowering and intrinsics.

Differential Revision: http://reviews.llvm.org/D11893

llvm-svn: 248111
2015-09-20 06:52:42 +00:00
Saleem Abdulrasool 4966f58ac2 ARM: cleanup formatting
clang-format a line which was poorly formatted.  NFC.

llvm-svn: 248110
2015-09-20 03:19:09 +00:00
Simon Pilgrim d0448ee59f [X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF
Now that we have fast vector CTPOP implementations, we can use them to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1)).

Additionally, for AVX512CD, which provides lzcnt instructions, we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x)).
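
The two identities above can be checked on scalars. What follows is a minimal sketch for 32-bit values, not the vectorized lowering from this patch; the helper names (popcount32, ctlz32, cttz_via_ctpop, cttz_undef_via_ctlz) are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Portable population count: number of set bits in x.
static unsigned popcount32(uint32_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1) // clear the lowest set bit each iteration
    ++n;
  return n;
}

// Portable count of leading zeros; returns 32 when x == 0.
static unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// cttz(x) = ctpop((x & -x) - 1).
// (x & -x) isolates the lowest set bit; subtracting 1 turns it into a mask
// of all the bits below it, so the popcount is the trailing-zero count.
// For x == 0 the mask is all ones, giving the full width (32).
static unsigned cttz_via_ctpop(uint32_t x) {
  uint32_t lowest = x & (0u - x); // unsigned negation, well defined
  return popcount32(lowest - 1u);
}

// cttz_undef(x) = (width - 1) - ctlz(x & -x).
// Only meaningful for x != 0; the result for x == 0 is unspecified,
// matching the "zero undef" variant.
static unsigned cttz_undef_via_ctlz(uint32_t x) {
  uint32_t lowest = x & (0u - x);
  return 31u - ctlz32(lowest);
}
```

The ctpop form handles a zero input without a special case, while the ctlz form relies on the input being nonzero, which is why it is only suggested for the zero-undef variant.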

Differential Revision: http://reviews.llvm.org/D12663

llvm-svn: 248091
2015-09-19 13:22:57 +00:00
Matt Arsenault 1fafdc82d6 AMDGPU: Remove dead code
getCFGStructurizerRegClass is not used for SI, so
move it into R600 specific stuff.

llvm-svn: 248087
2015-09-19 06:41:10 +00:00
Bob Wilson 8823b84fae NFC: Fix indentation and add braces to clarify nesting of else-statement.
llvm-svn: 248086
2015-09-19 06:20:59 +00:00
Eric Christopher a835956bda Limit the range of processors supported by ARM fast isel to v6 or later as that's all that is tested right now.

Fixes PR24858.

llvm-svn: 248027
2015-09-18 20:08:18 +00:00