Commit Graph

35772 Commits

Author SHA1 Message Date
Craig Topper 15576e1c8f Use make_range to reduce mentions of iterator type. NFC
llvm-svn: 254872
2015-12-06 05:08:07 +00:00
Dan Gohman d85c3b1fbc [WebAssembly] Don't perform the returned-argument optimization on constants.
llvm-svn: 254866
2015-12-05 22:12:39 +00:00
Dan Gohman 3bb55e98e2 [WebAssembly] Replace the fake JUMP_TABLE instruction with a def : Pat. NFC.
llvm-svn: 254864
2015-12-05 20:46:53 +00:00
Dan Gohman e2a7a8278f [WebAssembly] Implement direct calls to external symbols.
llvm-svn: 254863
2015-12-05 20:41:36 +00:00
Dan Gohman 284384b640 [WebAssembly] Support inline asm constraints of type i16 and similar.
llvm-svn: 254861
2015-12-05 20:03:44 +00:00
Dan Gohman 905bef5cf9 [WebAssembly] Update a stale comment. NFC.
llvm-svn: 254859
2015-12-05 19:43:19 +00:00
JF Bastien f05f6fd1e2 WebAssembly: improve readme, add placeholder for tests.
llvm-svn: 254857
2015-12-05 19:36:33 +00:00
Dan Gohman 7615e46919 [WebAssembly] Move useAA() out of line to make it more convenient to experiment with.
llvm-svn: 254856
2015-12-05 19:27:18 +00:00
Dan Gohman b0921ca9e1 [WebAssembly] Call TargetPassConfig base class functions in overriding functions.
llvm-svn: 254855
2015-12-05 19:24:17 +00:00
Dan Gohman ebb23545de [WebAssembly] Expand frem as a floating point library function.
llvm-svn: 254854
2015-12-05 19:15:57 +00:00
Craig Topper 5c32279bee [Hexagon] Don't call getNumImplicitDefs and then iterate over the count. getNumImplicitDefs contains a loop so its better to just loop over the null terminated implicit def list. NFC
llvm-svn: 254852
2015-12-05 17:34:07 +00:00
Simon Pilgrim 4ba5969224 [X86][ADX] Added memory folding patterns and stack folding tests
llvm-svn: 254844
2015-12-05 07:27:50 +00:00
Craig Topper e5e035a3a8 Replace uint16_t with the MCPhysReg typedef in many places. A lot of physical register arrays already use this typedef.
llvm-svn: 254843
2015-12-05 07:13:35 +00:00
Simon Pilgrim 5a64d98303 [X86][FMA4] Explicitly set the domain of FMA4 float/double scalar instructions
Both were defaulting to the float domain - now matches the packed instructions.

llvm-svn: 254841
2015-12-05 07:07:42 +00:00
Dan Gohman f0b165a7f8 [WebAssembly] Implement ReverseBranchCondition, and re-enable MachineBlockPlacement
This patch introduces a codegen-only instruction currently named br_unless,
which makes it convenient to implement ReverseBranchCondition and re-enable
the MachineBlockPlacement pass. Then in a late pass, it lowers br_unless
back into br_if.

Differential Revision: http://reviews.llvm.org/D14995

llvm-svn: 254826
2015-12-05 03:03:35 +00:00
Dan Gohman 4da4abd87f [WebAssembly] Fix scheduling dependencies in register-stackified code
Add physical register defs to instructions used from stackified
instructions to prevent them from being scheduled into the middle of
a stack sequence. This is a conservative measure which may be loosened
in the future.

Differential Revision: http://reviews.llvm.org/D15252

llvm-svn: 254811
2015-12-05 00:51:40 +00:00
Derek Schuff 9d77952332 [WebAssembly] Support constant offsets on loads and stores
This is just prototype for load/store for i32 types. I'll add them to
the rest of the types if we like this direction.

Differential Revision: http://reviews.llvm.org/D15197

llvm-svn: 254807
2015-12-05 00:26:39 +00:00
Philip Reames 7c6692de16 [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)
When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access.  Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing.  Note that the actual implementation was always bailing if the load or store wasn't simple.  

Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared.  In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above.  These are normal loads and what most of the optimizer works with.

llvm-svn: 254805
2015-12-05 00:18:33 +00:00
Hans Wennborg fbf2822e6d Add FeatureLAHFSAHF to amdfam10 as well.
llvm-svn: 254801
2015-12-04 23:32:19 +00:00
Dan Gohman 35bfb24c28 [WebAssembly] Initial varargs support.
Full varargs support will depend on prologue/epilogue support, but this patch
gets us started with most of the basic infrastructure.

Differential Revision: http://reviews.llvm.org/D15231

llvm-svn: 254799
2015-12-04 23:22:35 +00:00
Hans Wennborg 5000ce8a63 X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.

(GCC puts these instructions behind -msahf on 64-bit for the same reason.)

This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.

Differential Revision: http://reviews.llvm.org/D15240

llvm-svn: 254793
2015-12-04 23:00:33 +00:00
Chad Rosier f3491496dc [AArch64] Expand vector SDIVREM/UDIVREM operations.
http://reviews.llvm.org/D15214
Patch by Ana Pazos <apazos@codeaurora.org>!

llvm-svn: 254773
2015-12-04 21:38:44 +00:00
Dan Gohman 1ce2b1afd6 [WebAssembly] Add several more calling conventions to the supported list.
llvm-svn: 254741
2015-12-04 18:27:03 +00:00
Sanjay Patel 1640c54593 fix formatting; NFC
llvm-svn: 254739
2015-12-04 17:51:55 +00:00
Manman Ren 19c7bbe3b7 [CXX TLS calling convention] Add CXX TLS calling convention.
This commit adds a new target-independent calling convention for C++ TLS
access functions. It aims to minimize overhead in the caller by perserving as
many registers as possible.

The target-specific implementation for X86-64 is defined as following:
  Arguments are passed as for the default C calling convention
  The same applies for the return value(s)
  The callee preserves all GPRs - except RAX and RDI

The access function makes C-style TLS function calls in the entry and exit
block, C-style TLS functions save a lot more registers than normal calls.
The added calling convention ties into the existing implementation of the
C-style TLS functions, so we can't simply use existing calling conventions
such as preserve_mostcc.

rdar://9001553

llvm-svn: 254737
2015-12-04 17:40:13 +00:00
Dan Gohman 541841e365 [WebAssembly] Give names to the callseq begin and end instructions.
llvm-svn: 254730
2015-12-04 17:19:44 +00:00
Dan Gohman a3f5ce5f1b [WebAssembly] clang-format CallingConvSupported. NFC.
llvm-svn: 254729
2015-12-04 17:18:32 +00:00
Dan Gohman 85dbdda1ed [WebAssembly] Factor out the list of supported calling conventions.
llvm-svn: 254728
2015-12-04 17:16:07 +00:00
Dan Gohman 2d822e73fa [WebAssembly] Check for more unsupported ABI flags.
llvm-svn: 254727
2015-12-04 17:12:52 +00:00
Dan Gohman cb7940f9f5 [WebAssembly] Use SelectionDAG::getUNDEF. NFC.
llvm-svn: 254726
2015-12-04 17:09:42 +00:00
Krzysztof Parzyszek f1b3e5e52e [Hexagon] Simplify LowerCONCAT_VECTORS, handle different types better
llvm-svn: 254724
2015-12-04 16:18:15 +00:00
Colin LeMahieu 4c606e66a7 [Hexagon] Using multiply instead of shift on signed number which can be UB
llvm-svn: 254719
2015-12-04 15:48:45 +00:00
Jonas Paulsson 7fa69cd5dd [SystemZ] Bugfix: Don't add CC twice to new three-address instruction.
Since BuildMI() automatically adds the implicit operands for a new instruction,
adding the old instructions CC operand resulted in that there were two CC imp-def
operands, where only one was marked as dead. This caused buildSchedGraph() to
miss dependencies on the CC reg.

Review by Ulrich Weigand

llvm-svn: 254714
2015-12-04 12:48:51 +00:00
Alexey Bataev 7cf324772f LEA code size optimization pass (Part 1): Remove redundant address recalculations, by Andrey Turetsky
Add new x86 pass which replaces address calculations in load or store instructions with def register of existing LEA (must be in the same basic block), if the LEA calculates address that differs only by a displacement. Works only with -Os or -Oz.
Differential Revision: http://reviews.llvm.org/D13294

llvm-svn: 254712
2015-12-04 10:53:15 +00:00
Quentin Colombet 901f036353 [ARM] When a bitcast is about to be turned into a VMOVDRR, try to combine it
with its source instead of forcing the values on GPRs.

This improves the lowering of vector code when such bitcasts happen in the
middle of vector computations.

rdar://problem/23691584 

llvm-svn: 254684
2015-12-04 01:53:14 +00:00
JF Bastien 580b6572b5 X86InstrInfo::copyPhysReg: workaround reg liveness
Summary:
computeRegisterLiveness and analyzePhysReg are currently getting
confused about liveness in some cases, breaking copyPhysReg's
calculation of whether AX is dead in some cases. Work around this issue
temporarily by assuming that AX is always live.

See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7
And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201.

This workaround makes the code correct but slightly inefficient, but it
seems to confuse the machine instr verifier which now things EAX was
undefined in some cases where it's being conservatively saved /
restored.

Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15198

llvm-svn: 254680
2015-12-04 01:18:17 +00:00
Dan Gohman 391a98afd5 [WebAssembly] Fix dominance check for PHIs in the StoreResult pass
When a block has no terminator instructions, getFirstTerminator() returns
end(), which can't be used in dominance checks. Check dominance for phi
operands separately.

Also, remove some bits from WebAssemblyRegStackify.cpp that were causing
trouble on the same testcase; they were left behind from an earlier
experiment.

Differential Revision: http://reviews.llvm.org/D15210

llvm-svn: 254662
2015-12-03 23:07:03 +00:00
Chih-Hung Hsieh ed7d81e5d4 [X86] Part 1 to fix x86-64 fp128 calling convention.
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.

* Relax type legalization checks to accept new f128 type configuration,
  whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
  TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
  which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
  to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
  to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
  to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
  to guarantee that returned common sub class will contain the specified
  simple value type.
  This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.

Differential Revision: http://reviews.llvm.org/D15134

llvm-svn: 254653
2015-12-03 22:02:40 +00:00
Colin LeMahieu 15ca65c253 [Hexagon] Adding shuffling resources for HVX instructions and tests for instruction encodings.
llvm-svn: 254652
2015-12-03 21:44:28 +00:00
Reid Kleckner 93fc520339 [X86] Put no-op ADJCALLSTACK markers around all dynamic lowerings
Summary:
These ADJCALLSTACK markers don't generate code, but they keep dynamic
alloca code that calls chkstk out of the prologue.

This slightly pessimizes inalloca calls by preventing some register copy
coalescing, but I can live with that.

Reviewers: qcolombet

Subscribers: hans, llvm-commits

Differential Revision: http://reviews.llvm.org/D15200

llvm-svn: 254645
2015-12-03 20:46:59 +00:00
Krzysztof Parzyszek 7709aa0e07 [Hexagon] Remove variable unused in NDEBUG build
llvm-svn: 254623
2015-12-03 17:53:34 +00:00
Matthias Braun 0d4505c067 AArch64FastISel: Use cbz/cbnz to branch on i1
In the case of a conditional branch without a preceding cmp we used to emit
a "and; cmp; b.eq/b.ne" sequence, use tbz/tbnz instead.

Differential Revision: http://reviews.llvm.org/D15122

llvm-svn: 254621
2015-12-03 17:19:58 +00:00
Krzysztof Parzyszek c168c0165c [Hexagon] Implement CONCAT_VECTORS for HVX using V6_vcombine
llvm-svn: 254617
2015-12-03 16:47:20 +00:00
Colin LeMahieu 7c572b2125 [Hexagon] NFC Using canonicalizePacket to compound/duplex/pad packets rather than doing it separately. This also ensures the integrated assembler path matches the assembly parser path.
llvm-svn: 254616
2015-12-03 16:37:21 +00:00
Krzysztof Parzyszek 25ddd2c9e8 [Hexagon] Fix instruction descriptor flags for memory access size
llvm-svn: 254613
2015-12-03 15:41:33 +00:00
Marina Yatsina 4b1aea0802 [X86] MS inline asm: produce error when encountering "<type> ptr <reg name>"
Currently "<type> ptr <reg name>" treated as <reg name> in MS inline asm, ignoring the "<type> ptr" completely and possibly ignoring the intention of the user.
Fixed llvm to produce an error when encountering "<type> ptr <reg name>" operands.

For example: andpd xmm1,xmmword ptr xmm1 --> andpd xmm1, xmm1 
though andpd has 2 possible matching formats - andpd xmm, xmm/m128

Patch by: ziv.izhar@intel.com
Differential Revision: http://reviews.llvm.org/D14607

llvm-svn: 254607
2015-12-03 12:17:03 +00:00
Marina Yatsina 90d9ffa7d6 [X86] Add support for fcomip, fucomip for Intel syntax
According to x86 spec, fcomip and fucomip should be supported for Intel syntax.

Differential Revision: http://reviews.llvm.org/D15104

llvm-svn: 254595
2015-12-03 08:55:33 +00:00
Tom Stellard 9760f03757 AMDGPU/SI: Emit constant arrays in the .hsrodata_readonly_agent section
Summary: This is done only when targeting HSA.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13807

llvm-svn: 254587
2015-12-03 03:34:32 +00:00
Joerg Sonnenberger 48eb197434 Add a TODO item that the nop handling before FP conditional branches is
not enough for SPARCv7.

llvm-svn: 254580
2015-12-03 02:35:24 +00:00
Derek Schuff 5268aaf7b6 [WebAssembly] Add a test for wasm-store-results pass
Differential Revision: http://reviews.llvm.org/D15167

llvm-svn: 254570
2015-12-03 00:50:30 +00:00
Dan Gohman ac132e9305 [WebAssembly] Assert that byval and nest are not used for return types.
llvm-svn: 254567
2015-12-02 23:40:03 +00:00
Krzysztof Parzyszek 8d8b229de9 [Hexagon] Improve lowering of instructions to the MC layer
- Add extenders when necessary.
- Handle some basic relocations.

This should fix the failure in tools/clang/test/CodeGenCXX/crash.cpp

llvm-svn: 254564
2015-12-02 23:08:29 +00:00
David Majnemer 70497c696a Move EH-specific helper functions to a more appropriate place
No functionality change is intended.

llvm-svn: 254562
2015-12-02 23:06:39 +00:00
Alexey Samsonov 44ff204fad Fixup for r254547: use format_hex() to simplify code.
llvm-svn: 254560
2015-12-02 22:59:22 +00:00
Alexey Samsonov 39b7d65d82 [PowerPC] Remove wild call to RegScavenger::initRegState().
This call should in fact be made by RegScavenger::enterBasicBlock()
called below. The first call does nothing except for triggering UB,
indicated by UBSan (passing nullptr to memset()).

llvm-svn: 254548
2015-12-02 21:25:28 +00:00
Alexey Samsonov bcfabaa05b [Hexagon] Remove std::hex in favor of format().
std::hex is not used anywhere in LLVM code base except for this place,
and it has a known undefined behavior (at least in libstdc++ 4.9.3):
https://llvm.org/bugs/show_bug.cgi?id=18156, which fires in UBSan
bootstrap of LLVM.

llvm-svn: 254547
2015-12-02 21:13:43 +00:00
Tom Stellard 00f2f91af4 AMDGPU/SI: Correctly emit agent global segment variables when targeting HSA
Differential Revision: http://reviews.llvm.org/D14508

llvm-svn: 254540
2015-12-02 19:47:57 +00:00
Krzysztof Parzyszek de25ecfa62 [Hexagon] Remove TFRI_V4 instruction, use existing A2_tfrsi instead
llvm-svn: 254539
2015-12-02 19:44:35 +00:00
Kyle Butt 015f4fc854 Test Commit: iteratee
Remove whitespace from blank lines. NFC

llvm-svn: 254531
2015-12-02 18:53:33 +00:00
Tom Stellard e928533dae AMDGPU: Fix msan test failure
llvm-svn: 254527
2015-12-02 18:35:23 +00:00
Tim Northover f520eff782 AArch64: use ldxp/stxp pair to implement 128-bit atomic loads.
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.

llvm-svn: 254524
2015-12-02 18:12:57 +00:00
Dan Gohman 53d1399792 [WebAssembly] Fix comments to say "LIFO" instead of "FIFO" when describing a stack.
llvm-svn: 254523
2015-12-02 18:08:49 +00:00
Tom Stellard e3b5aeaf83 AMDGPU/SI: Don't emit group segment global variables
Summary: Only global or readonly segment variables should appear in object files.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15111

llvm-svn: 254519
2015-12-02 17:00:42 +00:00
Michael Zuckerman 15152a5c41 By intel spec
|9B DD /7| FSTSW m2byte| Valid Valid Store FPU status word at m2byteafter checking for pending unmasked floating-point exceptions.|
|9B DF E0| FSTSW AX| Valid Valid Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.|
|DD /7 |FNSTSW *m2byte| Valid Valid Store FPU status word at m2bytewithout checking for pending unmasked floating-point exceptions.|
|DF E0 |FNSTSW *AX| Valid Valid Store FPU status word in AX register without checking for pending unmasked floating-point exceptions|

m2byte is word register, and therefor instruction operand need to be change from f32mem to i16mem.

Differential Revision: http://reviews.llvm.org/D14953

llvm-svn: 254512
2015-12-02 14:34:34 +00:00
Christof Douma 8b5dc2c94e [AArch64]: Add support for Cortex-A35
Adds support for the new Cortex-A35 ARMv8-A core.

llvm-svn: 254503
2015-12-02 11:53:44 +00:00
Nemanja Ivanovic 74e31bc929 Patch to fix a crash in the PowerPC back end due to ISD::ROTL and ISD::ROTR
not being expanded. Test case included.

llvm-svn: 254501
2015-12-02 10:36:24 +00:00
Hrvoje Varga 672b0f5582 [mips][microMIPS] Implement PREPEND, RADDU.W.QB, RDDSP, REPL.PH, REPL.QB, REPLV.PH, REPLV.QB and MTHLIP instructions
Differential Revision: http://reviews.llvm.org/D14527

llvm-svn: 254496
2015-12-02 09:31:24 +00:00
Simon Pilgrim 3fc3454a0c [X86][FMA] Optimize FNEG(FMUL) Patterns
On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using a FNMSUB (-(X*Y)-0)

Fix for PR24366

Differential Revision: http://reviews.llvm.org/D14909

llvm-svn: 254495
2015-12-02 09:07:55 +00:00
Elena Demikhovsky a1a40cce9f AVX-512: Updated cost of FP/SINT/UINT conversion operations
I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode.
Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets).

Differential Revision: http://reviews.llvm.org/D15074

llvm-svn: 254494
2015-12-02 08:59:47 +00:00
Asaf Badouh 2489f350c0 [X86][AVX512] add comi with Sae
add builtin_ia32_vcomisd and builtin_ia32_vcomisd

Differential Revision: http://reviews.llvm.org/D14331

llvm-svn: 254493
2015-12-02 08:17:51 +00:00
Craig Topper f419a1f69a [X86] Change getZeroVector to take an MVT instead of EVT. One minor change needed to only try to perform 256-it shuffle combines on legal vector types.
llvm-svn: 254490
2015-12-02 06:39:19 +00:00
Craig Topper 6164297f46 [X86] Fix weird identation. NFC
llvm-svn: 254487
2015-12-02 05:24:38 +00:00
Quentin Colombet bbdebefff6 [X86] Fix a think-o when checking if the eflags needs to be preserved.
llvm-svn: 254480
2015-12-02 02:07:00 +00:00
Quentin Colombet f1e91c8bf1 [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.
This is a superset of the fix done in r254448.

This fixes PR25607.

llvm-svn: 254478
2015-12-02 01:22:54 +00:00
Tim Northover f3be9d5c0b AArch64: fix 128-bit shifts
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).

The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.

rdar://22491037

llvm-svn: 254475
2015-12-02 00:33:54 +00:00
Matt Arsenault 592d068198 AMDGPU: Error on addrspacecasts that aren't actually implemented
llvm-svn: 254469
2015-12-01 23:04:05 +00:00
Matt Arsenault f9bfeafd00 AMDGPU: Implement isNoopAddrSpaceCast
llvm-svn: 254468
2015-12-01 23:04:00 +00:00
Matthias Braun b258d794dd ARM: Change ArchCheck field to uint64_t
The values in this field are compared against getAvailableFeatures()
which returns an uint64_t. This was causing problems in an internal
branch.

llvm-svn: 254462
2015-12-01 21:48:52 +00:00
Matt Arsenault 3b15967008 AMDGPU: Disallow flat_scr in SI assembler
llvm-svn: 254459
2015-12-01 20:31:08 +00:00
Matt Arsenault 856d1928a8 AMDGPU: Optimize VOP2 operand legalization
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.

This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.

llvm-svn: 254452
2015-12-01 19:57:17 +00:00
Quentin Colombet 9cb01aa30a [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.
This fixes PR25629.

llvm-svn: 254448
2015-12-01 19:49:31 +00:00
Artyom Skrobov 5d1f2524a0 Fix Thumb1 epilogue generation
Summary:
This had been broken for a very long time, but nobody noticed until
D14357 enabled shrink-wrapping by default.

Reviewers: jroelofs, qcolombet

Subscribers: tyomitch, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14986

llvm-svn: 254444
2015-12-01 19:25:11 +00:00
Weiming Zhao 56ab51870c [AArch64] Fix a corner case in BitFeild select
Summary:
When not useful bits, BitWidth becomes 0 and APInt will not be happy.

See https://llvm.org/bugs/show_bug.cgi?id=25571

We can just mark the operand as IMPLICIT_DEF is none bits of it is used.

Reviewers: t.p.northover, jmolloy

Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14803

llvm-svn: 254440
2015-12-01 19:17:49 +00:00
Matt Arsenault e830f5427b AMDGPU: Report extractelement as free in cost model
The cost for scalarized operations is computed as N * (scalar operation
cost + 1 extractelement + 1 insertelement). This partially fixes
inflating the cost of scalarized operations since every operation is
scalarized and free. I don't think we want any cost asociated with
scalarization, but for now insertelement is still counted. I'm not sure
if we should pretend that insertelement is also free, or add a way
to compute a custom scalarization cost.

llvm-svn: 254438
2015-12-01 19:08:39 +00:00
Tom Stellard 38b7cbe3e0 AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now dead
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15050

llvm-svn: 254427
2015-12-01 17:45:22 +00:00
Tom Stellard ff63c25753 AMDGPU: Use the default strings for data emission directives
Summary:
This makes the assembly output look nicer and there is no reason to
have custom strings for these.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D14671

llvm-svn: 254426
2015-12-01 17:45:17 +00:00
Sanjay Patel 60216f6943 [x86] add a convenience method to check for FMA capability; NFCI
llvm-svn: 254425
2015-12-01 17:27:55 +00:00
Elena Demikhovsky 0d0692d854 AVX-512: fixed asm string of vsqrtss
(vvsqrtss was generated before)

llvm-svn: 254411
2015-12-01 12:43:46 +00:00
Hrvoje Varga e51b0e13f3 [mips][microMIPS] Implement RECIP.fmt, RINT.fmt, ROUND.L.fmt, ROUND.W.fmt, SEL.fmt, SELEQZ.fmt, SELNEQZ.fmt and CLASS.fmt
Differential Revision: http://reviews.llvm.org/D13885

llvm-svn: 254405
2015-12-01 11:59:21 +00:00
Yury Gribov d7dbb66eb8 Introduce new @llvm.get.dynamic.area.offset.i{32, 64} intrinsics.
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.

Patch by Max Ostapenko.

Differential Revision: http://reviews.llvm.org/D14983

llvm-svn: 254404
2015-12-01 11:40:55 +00:00
Oliver Stannard a34e47066e [AArch64] Add ARMv8.2-A Statistical Profiling Extension
The Statistical Profiling Extension is an optional extension to
ARMv8.2-A. Since it is an optional extension, I have added the
FeatureSPE subtarget feature to control it. The assembler-visible parts
of this extension are the new "psb csync" instruction, which is
equivalent to "hint #17", and a number of system registers.

Differential Revision: http://reviews.llvm.org/D15021

llvm-svn: 254401
2015-12-01 10:48:51 +00:00
Oliver Stannard 4667071574 [ARM] Add ARMv8.2-A to TargetParser
Add ARMv8.2-A to TargetParser, so that it can be used by the clang
command-line options and the .arch directive.

Most testing of this will be done in clang, checking that the
command-line options that this enables work.

Differential Revision: http://reviews.llvm.org/D15037

llvm-svn: 254400
2015-12-01 10:33:56 +00:00
Oliver Stannard 8addbf4350 [ARM] Add subtarget features for ARMv8.2-A
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature.
There is also one large, optional feature, which adds 16-bit floating
point versions of all existing floating-point instructions (VFP and
SIMD), this is represented by the FeatureFullFP16 subtarget feature.

Differential Revision: http://reviews.llvm.org/D15036

llvm-svn: 254399
2015-12-01 10:23:06 +00:00
Craig Topper c458c7c6c9 [X86] Fix patterns for memory forms of FP FSUBR and FDIVR. They need to have memory on the left hand side of the fsub/fdiv operations in their patterns.
Not sure how to test this. I noticed by inspection in the isel tables where the same pattern tried to produce DIV and DIVR or SUB and SUBR.

llvm-svn: 254388
2015-12-01 06:13:16 +00:00
Craig Topper 271f9ded44 [X86] Use range-based for loops. NFC
llvm-svn: 254387
2015-12-01 06:13:15 +00:00
Craig Topper ba894c3c0d [X86] Use array_lengthof instead of calculating manually. Also change index types to size_t to match.
llvm-svn: 254386
2015-12-01 06:13:13 +00:00
Craig Topper ddc76f2bed [Hexagon] Use std::begin() and std::end() instead of doing the same manually. NFC
llvm-svn: 254385
2015-12-01 06:13:10 +00:00
Craig Topper d824f5f0d9 [Hexagon] Use array_lengthof and const correct and type correct the array and array size. NFC
llvm-svn: 254384
2015-12-01 06:13:08 +00:00
Craig Topper 6261e1b94d Use array_lengthof instead of manually calculating it. NFC
llvm-svn: 254383
2015-12-01 06:13:06 +00:00
Craig Topper 3da000c07f [Hexagon] Use ArrayRef to avoid needing to calculate an array size. Interestingly the original code may have had a bug because it was passing the byte size of a uint16_t array instead of the number of entries.
llvm-svn: 254382
2015-12-01 06:13:04 +00:00
Craig Topper 8072081b63 [ARM] Use range-based for loops to avoid the need for calculating an array size that I would have otherwise cconverted to array_lengthof. NFC
llvm-svn: 254381
2015-12-01 06:13:01 +00:00
Cong Hou d97c100dc4 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
(This is the second attempt to submit this patch. The first caused two assertion
 failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)

The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254377
2015-12-01 05:29:22 +00:00
Hans Wennborg 1dbaf67537 Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces."
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."

Asserts were firing in Chromium builds. See PR25687.

llvm-svn: 254366
2015-12-01 03:49:42 +00:00
Matt Arsenault 456fdfcdc2 Squelch unused variable warning in SIRegisterInfo.cpp.
Patch by Justin Lebar

llvm-svn: 254362
2015-12-01 02:14:33 +00:00
Cong Hou fa1917c673 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254348
2015-12-01 00:02:51 +00:00
Simon Pilgrim db26b3ddfa [X86][FMA4] Prefer FMA4 to FMA
We currently output FMA instructions on targets which support both FMA4 + FMA (i.e. later Bulldozer CPUS bdver2/bdver3/bdver4).

This patch flips this so FMA4 is preferred; this is for several reasons:

1 - FMA4 is non-destructive reducing the need for mov instructions.
2 - Its more straighforward to commute and fold inputs (although the recent work on FMA has reduced this difference).
3 - All supported targets have FMA4 performance equal or better to FMA - Piledriver (bdver2) in particular has half the throughput when executing FMA instructions.

Its looks like no future AMD processor lines will support FMA4 after the Bulldozer series so we're not causing problems for later CPUs.

Differential Revision: http://reviews.llvm.org/D14997

llvm-svn: 254339
2015-11-30 22:22:06 +00:00
Matt Arsenault ada6cf1b22 AMDGPU: Fix unused function
llvm-svn: 254333
2015-11-30 21:32:10 +00:00
Matt Arsenault 41003af292 AMDGPU: Error if too many user SGPRs used
llvm-svn: 254332
2015-11-30 21:16:07 +00:00
Matt Arsenault 26f8f3db39 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

llvm-svn: 254331
2015-11-30 21:16:03 +00:00
Matt Arsenault ac234b604d AMDGPU: Rename enums to be consistent with HSA code object terminology
llvm-svn: 254330
2015-11-30 21:15:57 +00:00
Matt Arsenault 0e3d38937e AMDGPU: Remove SIPrepareScratchRegs
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.

The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.

Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.

The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.

llvm-svn: 254329
2015-11-30 21:15:53 +00:00
Matt Arsenault ff6da2fe89 AMDGPU: Use assert zext for workgroup sizes
llvm-svn: 254328
2015-11-30 21:15:45 +00:00
Quentin Colombet cdad10f333 [ARM] For old thumb ISA like v4t, we cannot use PC directly in pop.
Fix the epilogue emission to account for that.

llvm-svn: 254325
2015-11-30 20:37:58 +00:00
David Majnemer bf4119faf6 [X86] Add RIP to GR64_TCW64
The MachineVerifier wants to check that the register operands of an
instruction belong to the instruction's register class.  RIP-relative
control flow instructions violated this by referencing RIP.  While this
was fixed for SysV, it was never fixed for Win64.

llvm-svn: 254315
2015-11-30 19:04:19 +00:00
Kit Barton f4ce2f3a9e Enable shrink wrapping for PPC64
Re-enable shrink wrapping for PPC64 Little Endian.

One minor modification to PPCFrameLowering::findScratchRegister was necessary to handle fall-thru blocks (blocks with no terminator) correctly.

Tested with all LLVM test, clang tests, and the self-hosting build, with no problems found.

PHabricator: http://reviews.llvm.org/D14778
llvm-svn: 254314
2015-11-30 18:59:41 +00:00
Dan Gohman 96029f7880 [WebAssembly] Fix a few minor compiler warnings. NFC.
llvm-svn: 254311
2015-11-30 18:42:08 +00:00
Sanjay Patel 239be1fb0d fix formatting; NFC
llvm-svn: 254310
2015-11-30 17:52:02 +00:00
Colin LeMahieu e6241798c9 [Hexagon] NFC Reordering headers.
llvm-svn: 254307
2015-11-30 17:32:34 +00:00
Matt Arsenault ea03cf2fa1 AMDGPU: Don't reserve SCRATCH_PTR input register
This hasn't been doing anything since using relocations was added.

llvm-svn: 254304
2015-11-30 15:46:47 +00:00
Aaron Ballman 33c95f08b0 Silencing a 32-bit to 64-bit implicit conversion warning; NFC.
llvm-svn: 254302
2015-11-30 14:52:33 +00:00
Hrvoje Varga c03957f049 [mips][microMIPS] Implement LBUX, LHX, LWX, MAQ_S[A].W.PHL, MAQ_S[A].W.PHR, MFHI, MFLO, MTHI and MTLO instructions
Differential Revision: http://reviews.llvm.org/D14436

llvm-svn: 254297
2015-11-30 12:58:39 +00:00
Zoran Jovanovic a887b36167 [mips][microMIPS] Fix issue with offset operand of BALC and BC instructions
Value of offset operand for microMIPS BALC and BC instructions is currently shifted 2 bits, but it should be 1 bit.
Differential Revision: http://reviews.llvm.org/D14770

llvm-svn: 254296
2015-11-30 12:56:18 +00:00
Zlatko Buljan 56f3b0e410 [mips][microMIPS] Implement PRECR.QB.PH, PRECR_SRA[_R].PH.W, PRECRQ.PH.W, PRECRQ.QB.PH, PRECRQU_S.QB.PH and PRECRQ_RS.PH.W instructions
Differential Revision: http://reviews.llvm.org/D14605

llvm-svn: 254291
2015-11-30 08:37:38 +00:00
Craig Topper 27e2912fa8 Revert r254279 "[X86] Use ArrayRef. NFC". It seems to have upset an MSVC build bot.
llvm-svn: 254280
2015-11-30 02:28:19 +00:00
Craig Topper b84f39865f [X86] Use ArrayRef. NFC
llvm-svn: 254279
2015-11-30 02:08:05 +00:00
Craig Topper aad5f11e5f [AVX512] The vpermi2 instructions require an integer vector for the index vector. This is reflected correctly in the intrinsics, but was not refelected in the isel patterns.
For the floating point types, this requires adding a bitcast to the index vector when its passed through to the output.

llvm-svn: 254277
2015-11-30 00:13:24 +00:00
Craig Topper fbde7aa13a [X86] Remove duplicate entries from intrinsics tables and add asserts to verify there are no others.
llvm-svn: 254274
2015-11-29 23:18:32 +00:00
Dan Gohman 9551a44d0c [WebAssembly] Delete an obsolete TODO comment.
llvm-svn: 254272
2015-11-29 23:09:41 +00:00
Dan Gohman 174b2d83ee [WebAssembly] Set several MCInstrDesc flags.
llvm-svn: 254271
2015-11-29 22:59:19 +00:00
Craig Topper ecae476e4c [X86] int_x86_avx2_permps and X86ISD::VPERMV should take an integer vector for its shuffle indices.
llvm-svn: 254269
2015-11-29 22:53:22 +00:00
Dan Gohman 5237b3991d [WebAssembly] Delete unused functions. NFC.
llvm-svn: 254268
2015-11-29 22:48:57 +00:00
Dan Gohman 7a6b9825ce [WebAssembly] Minor clang-format and selected clang-tidy cleanups. NFC.
llvm-svn: 254267
2015-11-29 22:32:02 +00:00
Simon Pilgrim 88aa627c0b [X86][SSE] Added support for lowering to ADDSUBPS/ADDSUBPD with commuted inputs
We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allow us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.

llvm-svn: 254259
2015-11-29 16:41:04 +00:00
Igor Breger e293e83f5d AVX512:Implemented encoding for the vmovq.s instruction.
Differential Revision: http://reviews.llvm.org/D14810

llvm-svn: 254248
2015-11-29 07:41:26 +00:00
Renato Golin 5dbc8a5283 Revert "[ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM."
This reverts commit r254201 and r254202, as it broke test-suite,
self-hosting and sanitizer tests on ARM buildbots.

llvm-svn: 254234
2015-11-28 17:23:46 +00:00
Jonas Paulsson f12b925bb1 [Stack realignment] Handling of aligned allocas.
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.

It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.

SystemZ benefits from this, due to its stack frame layout.

New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.

Review and help from Ulrich Weigand and Hal Finkel.

llvm-svn: 254227
2015-11-28 11:02:32 +00:00
Artyom Skrobov f01a59f9fb Follow-up fix for r254201
llvm-svn: 254202
2015-11-27 16:20:34 +00:00
Artyom Skrobov b955b90509 [ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM.
Summary:
Since this build attribute corresponds to a whole module, and
different functions in a module may differ in the optimizations
enabled for them, this attribute is emitted after all functions,
and only in the case that the optimization goals for all
functions match.

Reviewers: logan, hans

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14934

llvm-svn: 254201
2015-11-27 15:30:51 +00:00
Oliver Stannard b25914e03f [AArch64] Add ARMv8.2-A FP16 scalar instructions
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Most of these instructions are the same as the 32- and 64-bit versions,
but with the type field (bits 23-22) set to 0b11. Previously the top bit
of the size field was always 0, so the instruction classes only provided
a 1-bit size field, which I have widened to 2 bits.

Differential Revision: http://reviews.llvm.org/D15014

llvm-svn: 254198
2015-11-27 13:04:48 +00:00
Craig Topper e38c57a4b8 [X86] Pair a NoVLX with HasAVX512 to match the others and remove a unique predicate check in the isel tables. NFC
llvm-svn: 254191
2015-11-27 05:44:02 +00:00
Craig Topper a47576f297 [X86] Now that X86VPermt2 is used in all the avx512_perm_t_sizes just hardcode it into the patterns instead of passing as an argument. NFC
llvm-svn: 254177
2015-11-26 20:21:29 +00:00
Craig Topper 05858f52fe [X86] Merge X86VPermt2Fp and X86VPermt2Int back together by weakening them just enough. The SDTCisSameSizeAs introduced in r254138 helps here.
llvm-svn: 254176
2015-11-26 20:02:01 +00:00
Craig Topper 0009656335 [X86] Split ISD node for Vfpclass and Vfpclasss so that we can write strong type constraints for each that don't cause ambiguous isel.
llvm-svn: 254172
2015-11-26 19:41:34 +00:00
Craig Topper ff2f14731a [X86] Revert part of r254167 to recover bots.
llvm-svn: 254169
2015-11-26 19:13:05 +00:00
Krzysztof Parzyszek 08ff8883fd [Hexagon] Lowering of V60/HVX vector types
llvm-svn: 254168
2015-11-26 18:38:27 +00:00
Craig Topper 9d1deb4b72 [X86] Strengthen more type constraints to reduce isel table size.
llvm-svn: 254167
2015-11-26 18:31:19 +00:00
Krzysztof Parzyszek 4eb6d4d1f2 [Hexagon] Hexagon V60 HVX intrinsic defintions
Author: Ron Lieberman <ronl@codeaurora.org>
llvm-svn: 254165
2015-11-26 16:54:33 +00:00
Daniel Sanders daa4b6fbd9 [mips][ias] Range check uimm5 operands and fix several bugs this revealed.
Summary:
The bugs were:
* append, prepend, and balign were not tested
* balign takes a uimm2 not a uimm5.
* drotr32 was correctly implemented with a uimm5 but the tests expected
  '52' to be valid.
* li/la were implemented with a uimm5 instead of simm32. simm32 isn't
  completely correct either but I'll fix that when I get to simm32.

A notable omission are some of the shift instructions. Several of these
have been implemented using a single uimm6 instruction (rather than two
uimm5 instructions and a CodeGen-only uimm6 pseudo). These will be updated
in the uimm6 patch.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14712

llvm-svn: 254164
2015-11-26 16:35:41 +00:00
Oliver Stannard 64c167db7a [AArch64] Add ARMv8.2-A new AT instruction variants
ARMv8.2-A adds new variants of the "at" (address translate) system
instruction, which take the PSTATE.PAN bit (added in ARMv8.1-A). These
are a required part of ARMv8.2-A, so no additional subtarget features
are required.

Differential Revision: http://reviews.llvm.org/D15018

llvm-svn: 254159
2015-11-26 15:34:44 +00:00
Martell Malone d12292480a ARM: address WOA unsigned division overflow crash
Building on r253865 the crash is not limited to signed overflows.

Disable custom handling of unsigned 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit unsigned integer overflow.

llvm-svn: 254158
2015-11-26 15:34:03 +00:00
Oliver Stannard 911ea20f07 [AArch64] Add ARMv8.2-A UAO PSTATE bit
ARMv8.2-A adds a new PSTATE bit, PSTATE.UAO, which allows the LDTR/STTR
instructions to behave the same as LDR/STR with respect to execute-only
pages at higher privilege levels. New variants of the MSR/MRS
instructions are added to allow reading and writing this bit. It is a
required part of ARMv8.2-A, so no additional subtarget features are
required.

Differential Revision: http://reviews.llvm.org/D15020

llvm-svn: 254157
2015-11-26 15:32:30 +00:00
Oliver Stannard 1a81cc9f43 [AArch64] Add ARMv8.2-A persistent memory instruction
ARMv8.2-A adds the "dc cvap" instruction, which is a system instruction
that cleans caches to the point of persistence (for systems that have
persistent memory). It is a required part of ARMv8.2-A, so no additional
subtarget features are required.

Differential Revision: http://reviews.llvm.org/D15016

llvm-svn: 254156
2015-11-26 15:28:47 +00:00
Oliver Stannard 48b43741d0 [AArch64] Add ARMv8.2-A ID_A64MMFR2_EL1 register
ARMv8.2-A adds a new ID register, ID_A64MMFR2_EL1, which behaves in the
same way as ID_A64MMFR0_EL1 and ID_A64MMFR1_EL1. It is a required part
of ARMv8.2-A, so no additional subtarget features are required.

Differential Revision: http://reviews.llvm.org/D15017

llvm-svn: 254155
2015-11-26 15:26:10 +00:00
Oliver Stannard 7cc0c4e675 [AArch64] Add subtarget features for ARMv8.2-A
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature. There
is also one large, optional feature, which adds 16-bit floating point
versions of all existing floating-point instructions (VFP and SIMD),
this is represented by the FeatureFullFP16 subtarget feature.

Differential Revision: http://reviews.llvm.org/D15013

llvm-svn: 254154
2015-11-26 15:23:32 +00:00
Craig Topper a3ac738725 [X86] Strengthen more type constraints to reduce isel table size.
llvm-svn: 254142
2015-11-26 07:58:20 +00:00
Vyacheslav Klochkov ed865dfcc5 X86-FMA3: Improved/enabled the memory folding optimization for scalar loads
generated for _mm_losd_s{s,d}() intrinsics and used in scalar FMAs generated 
for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}().

Reviewer: David Kreitzer
Differential Revision: http://reviews.llvm.org/D14762

llvm-svn: 254140
2015-11-26 07:45:30 +00:00
Craig Topper 4c175cdc8e [X86] Strengthen the type constraints on X86psadbw and X86dbpsadbw to reduce some of the type checks in the isel matching tables.
llvm-svn: 254139
2015-11-26 07:02:21 +00:00
Krzysztof Parzyszek 195dc8d0db [Hexagon] HVX vector register classes and more isel patterns
llvm-svn: 254132
2015-11-26 04:33:11 +00:00
Tom Stellard 48f29f21ee AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsic
Summary:
This returns a pointer to the dispatch packet, which can be used to load
information about the kernel dispach.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D14898

llvm-svn: 254116
2015-11-26 00:43:29 +00:00
Dan Gohman a774d719a0 [WebAssembly] Fix inline asm support for i64 operands.
llvm-svn: 254106
2015-11-25 22:28:50 +00:00
Dan Gohman d9b4218831 [WebAssembly] Fold setne and seteq comparisons into selects.
llvm-svn: 254104
2015-11-25 22:13:48 +00:00
Krzysztof Parzyszek 70a134d29f [Hexagon] Treat transfers of FP immediates are pseudo instructions
This is a temporary fix to address ICE on 2005-10-21-longlonggtu.ll.
The proper fix will be to use A2_tfrsi, but it will need more work to
teach all users of A2_tfrsi to also expect a floating-point operand.

llvm-svn: 254099
2015-11-25 21:40:03 +00:00
Dan Gohman 5941bde03c [WebAssembly] Add some comments. NFC.
llvm-svn: 254096
2015-11-25 21:32:06 +00:00
Marek Olsak 7ed6b2f414 AMDGPU/SI: select S_ABS_I32 when possible (v2)
v2: added more tests, moved the SALU->VALU conversion to a separate function

It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.

llvm-svn: 254095
2015-11-25 21:22:45 +00:00
Dan Gohman 80e34e0a18 [WebAssembly] Fix WebAssembly register numbering for registers added late.
If virtual registers are created late, mappings to WebAssembly
registers need to be added explicitly. This patch adds a function
to do so and teaches WebAssemblyPeephole to use it. This fixes
an out-of-bounds access on the WARegs vector.

llvm-svn: 254094
2015-11-25 21:13:02 +00:00
Matt Arsenault 49affb8462 AMDGPU: Check feature attributes in SIMachineFunctionInfo
llvm-svn: 254091
2015-11-25 20:55:12 +00:00
Krzysztof Parzyszek 207c13f254 Add hexagonv55 and hexagonv60 as recognized CPUs, make v60 the default
llvm-svn: 254089
2015-11-25 20:30:59 +00:00
Matt Arsenault 61001bbc03 AMDGPU: Make v2i64/v2f64 legal types.
They can be loaded and stored, so count them as legal. This is
mostly to fix a number of common cases for load/store merging.

llvm-svn: 254086
2015-11-25 19:58:34 +00:00
Artyom Skrobov 314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Dan Gohman fb3e0594e4 [WebAssembly] Use a physical register to describe ARGUMENT liveness.
Instead of trying to move ARGUMENT instructions back up to the top after
they've been scheduled or sunk down, use a fake physical register to
create a liveness constraint that prevents ARGUMENT instructions from
moving down in the first place. This is still not entirely ideal, however
it is more robust than letting them move and moving them back.

llvm-svn: 254084
2015-11-25 19:36:19 +00:00
Dan Gohman 9c54d3b4c6 [WebAssembly] Clean up several FIXME comments.
llvm-svn: 254079
2015-11-25 18:13:18 +00:00
Dan Gohman 81719f8555 [WebAssembly] Support for register stackifying with load and store instructions.
llvm-svn: 254076
2015-11-25 16:55:01 +00:00
Dan Gohman 2c8fe6a428 [WebAssembly] Codegen support for ISD::ExternalSymbol
llvm-svn: 254075
2015-11-25 16:44:29 +00:00
Dan Gohman fd4a88c376 [WebAssembly] Add 'final' to some classes. NFC.
llvm-svn: 254073
2015-11-25 16:29:24 +00:00
Dan Gohman 04c0401f28 [WebAssembly] Whitespace consistency. NFC.
llvm-svn: 254071
2015-11-25 16:26:14 +00:00
Sanjay Patel 25150784ae fix typo; NFC
llvm-svn: 254069
2015-11-25 15:33:36 +00:00
Hal Finkel 005f840959 [PowerPC] Don't generate mfocrf on the e500mc
The e500mc does not actually support the mfocrf instruction; update the
processor definitions to reflect that fact.

Patch by Tom Rix (with some test-case cleanup by me).

llvm-svn: 254064
2015-11-25 10:14:31 +00:00
Elena Demikhovsky f07df9fcac AVX-512: Fixed a bug in VPERMT2* intrinsic.
It was wrong order of operands (from intrinsic to DAG node).
I added more strict type specification for instruction selection.

Differential Revision: http://reviews.llvm.org/D14942

llvm-svn: 254059
2015-11-25 08:17:56 +00:00
Hans Wennborg e412b71f95 Revert r253528: "[X86] Enable shrink-wrapping by default."
This caused PR25607 and also caused Chromium to crash on start-up.

(Also had to update test/CodeGen/X86/avx-splat.ll, which was committed
after shrink wrapping was enabled.)

llvm-svn: 254044
2015-11-25 00:05:13 +00:00
Kaelyn Takata d0955312d9 Fix an asan error where NumElements > 32 for at least one case in
test/CodeGen/X86/avg.ll.

llvm-svn: 254043
2015-11-25 00:03:29 +00:00
Simon Pilgrim 1b4fecb098 [X86][FMA] Optimize FNEG(FMA) Patterns
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.

Fix for PR24364

Differential Revision: http://reviews.llvm.org/D14906

llvm-svn: 254016
2015-11-24 20:31:46 +00:00
Cong Hou db6220f84d [X86] Fix several issues related to X86's psadbw instruction.
This patch fixes the following issues:

1. Fix the return type of X86psadbw: it should not be the same type of inputs.
   For vNi8 inputs the output should be vMi64, where M = N/8.
2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly.
3. Fix the definiton of PSADBW, VPSADBW, and VPSADBWY accordingly.
4. Adjust the return type when building a DAG node of X86ISD::PSADBW type.
5. Update related tests.


Differential revision: http://reviews.llvm.org/D14897

llvm-svn: 254010
2015-11-24 19:51:26 +00:00
Sanjay Patel a0d354541d [x86] remove duplicate movq instruction defs (PR25554)
We had duplicated definitions for the same hardware '[v]movq' instructions. For example with SSE:

  def MOVZQI2PQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
                     "mov{d|q}\t{$src, $dst|$dst, $src}", // X86-64 only
                     [(set VR128:$dst, (v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))))],
                     IIC_SSE_MOVDQ>;

  def MOV64toPQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
                     "mov{d|q}\t{$src, $dst|$dst, $src}",
                     [(set VR128:$dst, (v2i64 (scalar_to_vector GR64:$src)))],
                     IIC_SSE_MOVDQ>, Sched<[WriteMove]>;

As shown in the test case and PR25554:
https://llvm.org/bugs/show_bug.cgi?id=25554

This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction.
This patch deletes one pair of these defs.
Sadly, this won't fix the original test case in the bug report. Something else is still broken.

Differential Revision: http://reviews.llvm.org/D14941

llvm-svn: 253988
2015-11-24 15:44:35 +00:00
Krzysztof Parzyszek aa93575b7e [Hexagon] Add missing include of <cctype>
Lack thereof breaks Windows builds due to the use of std::isspace
in HexagonInstrInfo.cpp.

llvm-svn: 253987
2015-11-24 15:11:13 +00:00
Krzysztof Parzyszek b9a1c3a32c [Hexagon] Bring HexagonInstrInfo up to date
llvm-svn: 253986
2015-11-24 14:55:26 +00:00
Matt Arsenault ff05da806c AMDGPU: Split LDS vector loads
If properly aligned this could allow using ds_read_b64.

llvm-svn: 253975
2015-11-24 12:18:54 +00:00
Matt Arsenault 4d801cd357 AMDGPU: Split x8 and x16 vector loads instead of scalarize
The one regression in the builtin tests is in the read2 test which now
(again) has many extra copies, but this should be solved once the pass
is replaced with a DAG combine.

llvm-svn: 253974
2015-11-24 12:05:03 +00:00
Cong Hou 1938f2eb98 Let SelectionDAG start to use probability-based interface to add successors.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.


Differential revision: http://reviews.llvm.org/D14361

llvm-svn: 253965
2015-11-24 08:51:23 +00:00
Cong Hou bed60d35ed [X86][SSE] Detect AVG pattern during instruction combine for SSE2/AVX2/AVX512BW.
This patch detects the AVG pattern in vectorized code, which is simply
c = (a + b + 1) / 2, where a, b, and c have the same type which are vectors of
either unsigned i8 or unsigned i16. In the IR, i8/i16 will be promoted to
i32 before any arithmetic operations. The following IR shows such an example:

%1 = zext <N x i8> %a to <N x i32>
%2 = zext <N x i8> %b to <N x i32>
%3 = add nuw nsw <N x i32> %1, <i32 1 x N>
%4 = add nuw nsw <N x i32> %3, %2
%5 = lshr <N x i32> %N, <i32 1 x N>
%6 = trunc <N x i32> %5 to <N x i8>

and with this patch it will be converted to a X86ISD::AVG instruction.

The pattern recognition is done when combining instructions just before type
legalization during instruction selection. We do it here because after type
legalization, it is much more difficult to do pattern recognition based
on many instructions that are doing type conversions. Therefore, for
target-specific instructions (like X86ISD::AVG), we need to take care of type
legalization by ourselves. However, as X86ISD::AVG behaves similarly to
ISD::ADD, I am wondering if there is a way to legalize operands and result
types of X86ISD::AVG together with ISD::ADD. It seems that the current design
doesn't support this idea.

Tests are added for SSE2, AVX2, and AVX512BW and both i8 and i16 types of
variant vector sizes.


Differential revision: http://reviews.llvm.org/D14761

llvm-svn: 253952
2015-11-24 05:44:19 +00:00
Dan Gohman 192dddc595 [WebAssembly] Don't print the types of memory_size and grow_memory
This matches the current spec, for now.

llvm-svn: 253931
2015-11-23 22:37:29 +00:00
Andy Ayers 9f7501896e findDeadCallerSavedReg needs to pay attention to calling convention
Caller saved regs differ between SysV and Win64. Use the tail call available set to scavenge from.

Refactor register info to create new helper to get at tail call GPRs. Added a new test case for windows. Fixed up a number of X64 tests since now RCX is preferred over RDX on SysV.

Differential Revision: http://reviews.llvm.org/D14878

llvm-svn: 253927
2015-11-23 22:17:44 +00:00
Dan Gohman 2f16f25391 [WebAssembly] Don't special-case call operand order.
With the '=' suffix now indicating which operands are output operands, it's
no longer as important to distinguish between a call's inputs and its outputs
using operand ordering, so we can go back to printing them in the normal order.

llvm-svn: 253925
2015-11-23 22:04:06 +00:00
Dan Gohman 700515fa92 [WebAssembly] Suffix output operands with '='.
This distinguishes input operands from output operands. This is something of
a syntactic experiment to see whether the mild amount of clutter this adds is
outweighed by the extra information it conveys to the reader.

llvm-svn: 253922
2015-11-23 21:55:57 +00:00
Dan Gohman 7054ac1b8b [WebAssembly] Model the return value of store instructions in wasm.
llvm-svn: 253916
2015-11-23 21:16:35 +00:00
Dan Gohman aa0a4bd05b [WebAssembly] Don't use set_local instructions explicitly.
The current approach to using get_local and set_local is to use them
implicitly, as register uses and defs. Introduce new copy instructions
which are themselves no-ops except for the get_local and set_local
that they imply, so that we use get_local and set_local consistently.

llvm-svn: 253905
2015-11-23 19:30:43 +00:00
Dan Gohman f6857223c9 [WebAssembly] Always print loop end labels
WebAssembly is currently using labels to end scopes, so for example a
loop scope looks like this:

BB0_0:
  loop BB0_1
  ...
BB0_1:

with BB0_0 being the label of the first block not in the loop. This
requires that the label be printed even when it's only reachable via
fallthrough. To arrange this, insert a no-op LOOP_END instruction in
such cases at the end of the loop.

llvm-svn: 253901
2015-11-23 19:12:37 +00:00
Dan Gohman e425c32224 [WebAssembly] Remove incomplete MCCodeEmitter bits.
These are parts of a separate patch that I accidentally included in r253878.

llvm-svn: 253892
2015-11-23 18:00:04 +00:00
Dan Gohman 53828fd777 [WebAssembly] Emit .param, .result, and .local through MC.
This eliminates one of the main remaining uses of EmitRawText.

llvm-svn: 253878
2015-11-23 16:50:18 +00:00
Dan Gohman 3280793234 [WebAssembly] Use dominator information to improve BLOCK placement
Always starting blocks at the top of their containing loops works, but creates
unnecessarily deep nesting because it makes all blocks in a loop overlap.
Refine the BLOCK placement algorithm to start blocks at nearest common
dominating points instead, which significantly shrinks them and reduces
overlapping.

llvm-svn: 253876
2015-11-23 16:19:56 +00:00
Daniel Sanders 2b561336d9 [mips] .ent and .end should also set the type and size of the symbol respectively.
Reviewers: vkalintiris

Subscribers: llvm-commits, seanbruno, emaste, vkalintiris, dsanders

Differential Revision: http://reviews.llvm.org/D14221

llvm-svn: 253875
2015-11-23 16:08:03 +00:00
Krzysztof Parzyszek 29d23f9f4c [Hexagon] Update instruction formats
llvm-svn: 253867
2015-11-23 14:09:26 +00:00
Martell Malone a6b867eb0d ARM: address WoA division overflow crash
Disable custom handling of signed 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit integer overflow crashes.

llvm-svn: 253865
2015-11-23 13:11:39 +00:00
Craig Topper 2241dfd2dc [Mips] Remove an unnecessary wrapping of a predicate with std::ptr_fun. NFC
llvm-svn: 253855
2015-11-23 07:19:06 +00:00
Elena Demikhovsky 0fd11526e2 AVX-512: Optimized INSERT_SUBVECTOR for i1 vector types
ISERT_SUBVECTOR for i1 vectors may be done with shifts, when we insert into the lower part, or into the upper part, on into all-zero vector.
CONCAT_VECTORS uses ISERT_SUBVECTOR.

Differential Revision: http://reviews.llvm.org/D14815

llvm-svn: 253819
2015-11-22 13:57:38 +00:00
Sanjay Patel 8066d906f1 fix formatting; NFC
llvm-svn: 253802
2015-11-22 00:03:16 +00:00
Krzysztof Parzyszek b46557292c Hexagon V60/HVX DFA scheduler support
Extended DFA tablegen to:
  - added "-debug-only dfa-emitter" support to llvm-tblgen

  - defined CVI_PIPE* resources for the V60 vector coprocessor

  - allow specification of multiple required resources
    - supports ANDs of ORs
    - e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
           (SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)

  - added support for combo resources
    - allows specifying ORs of ANDs
    - e.g. [CVI_XLSHF, CVI_MPY01] means:
           (CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)

  - increased DFA input size from 32-bit to 64-bit
    - allows for a maximum of 4 AND'ed terms of 16 resources

  - supported expressions now include:

    expression     => term [AND term] [AND term] [AND term]
    term           => resource [OR resource]*
    resource       => one_resource | combo_resource
    combo_resource => (one_resource [AND one_resource]*)

Author: Dan Palermo <dpalermo@codeaurora.org>

kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.

Reapply the previous patch, this time without circular dependencies.

llvm-svn: 253793
2015-11-21 20:00:45 +00:00
Krzysztof Parzyszek 4ca21fc1aa Revert r253790: it breaks all builds for some reason.
llvm-svn: 253791
2015-11-21 17:38:33 +00:00
Krzysztof Parzyszek 220a9bc018 Hexagon V60/HVX DFA scheduler support
Extended DFA tablegen to:
  - added "-debug-only dfa-emitter" support to llvm-tblgen

  - defined CVI_PIPE* resources for the V60 vector coprocessor

  - allow specification of multiple required resources
    - supports ANDs of ORs
    - e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
           (SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)

  - added support for combo resources
    - allows specifying ORs of ANDs
    - e.g. [CVI_XLSHF, CVI_MPY01] means:
           (CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)

  - increased DFA input size from 32-bit to 64-bit
    - allows for a maximum of 4 AND'ed terms of 16 resources

  - supported expressions now include:

    expression     => term [AND term] [AND term] [AND term]
    term           => resource [OR resource]*
    resource       => one_resource | combo_resource
    combo_resource => (one_resource [AND one_resource]*)

Author: Dan Palermo <dpalermo@codeaurora.org>

kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.

llvm-svn: 253790
2015-11-21 17:23:52 +00:00
Simon Pilgrim d5a154424b [X86][AVX512] Added AVX512 VMOVLHPS/VMOVHLPS shuffle decode comments.
llvm-svn: 253777
2015-11-21 13:04:42 +00:00
Simon Pilgrim 96cbce61b2 [X86][SSE] Legal XMM Register Class ordering for SSE1
It turns out we have a number of places that just grab the first type attached to a register class for various reasons. This is fine unless for some reason that type isn't legal on the current target, such as for SSE1 which doesn't support v16i8/v8i16/v4i32/v2i64 - all of which were included before 4f32 in the class.

Given that this is such a rare situation I've just re-ordered the types and placed the float types first.

Fix for PR16133

Differential Revision: http://reviews.llvm.org/D14787

llvm-svn: 253773
2015-11-21 12:38:34 +00:00
Matthias Braun 5a1857b6eb ARMLoadStoreOptimizer: Cleanup isMemoryOp(); NFC
llvm-svn: 253757
2015-11-21 02:09:49 +00:00
Vinicius Tinti 67cf33d9ab Test commit
llvm-svn: 253737
2015-11-20 23:20:12 +00:00
Eric Christopher 25bf4a8617 Power8 and later support fusing addis/addi and addis/ld instruction
pairs that use the same register to execute as a single instruction.
No Functional Change

Patch by Kyle Butt!

llvm-svn: 253724
2015-11-20 22:38:20 +00:00
Jun Bum Lim 80ec0d3f5a [AArch64]Merge narrow zero stores to a wider store
This change merges adjacent zero stores into a wider single store.
For example :
  strh wzr, [x0]
  strh wzr, [x0, #2]
becomes
  str wzr, [x0]

This will fix PR25410.

llvm-svn: 253711
2015-11-20 21:14:07 +00:00
Eric Christopher c180836722 Weak non-function symbols were being accessed directly, which is
incorrect, as the chosen representative of the weak symbol may not live
with the code in question. Always indirect the access through the TOC
instead.

Patch by Kyle Butt!

llvm-svn: 253708
2015-11-20 20:51:31 +00:00
Krzysztof Parzyszek 6c5ca95814 [Hexagon] Fix the return value from HexagonGenInsert::runOnMachineFunction
llvm-svn: 253705
2015-11-20 20:46:23 +00:00
Artyom Skrobov 91f339ab3f Handle ARMv6-J as an alias, instead of fake architecture
Summary:
This follows D14577 to treat ARMv6-J as an alias for ARMv6,
instead of an architecture in its own right.

The functional change is that the default CPU when targeting ARMv6-J
changes from arm1136j-s to arm1136jf-s, which is currently used as
the default CPU for ARMv6; both are, in fact, ARMv6-J CPUs.

The J-bit (Jazelle support) is irrelevant to LLVM, and it doesn't
affect code generation, attributes, optimizations, or anything else,
apart from selecting the default CPU.

Reviewers: rengolin, logan, compnerd

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14755

llvm-svn: 253675
2015-11-20 16:46:09 +00:00
Tilmann Scheller 4cd1d51a4d [Hexagon] Remove redundant assignment.
Identified by the Clang static analyzer.

llvm-svn: 253664
2015-11-20 13:27:30 +00:00
Daniel Sanders b700203c8b Partially revert r253662: some unrelated work was accidentally committed with it.
Sorry.

llvm-svn: 253663
2015-11-20 13:16:35 +00:00
Daniel Sanders be9db3c00a Revert the revert 253497 and 253539 - These commits aren't the cause of the clang-cmake-mips failures.
Sorry for the noise.

llvm-svn: 253662
2015-11-20 13:13:53 +00:00
Tilmann Scheller bfd7ce01ea [Hexagon] Remove redundant local variable.
Identified by the Clang static analyzer.

llvm-svn: 253660
2015-11-20 12:10:17 +00:00
Hrvoje Varga b65518c15c [mips][microMIPS] Implement MUL[_S].PH, MULEQ_S.W.PHL, MULEQ_S.W.PHR, MULEU_S.PH.QBL, MULEU_S.PH.QBR, MULQ_RS.PH, MULQ_RS.W, MULQ_S.PH and MULQ_S.W instructions
Differential Revision: http://reviews.llvm.org/D14280

llvm-svn: 253651
2015-11-20 07:14:52 +00:00
Dan Gohman d9625276a7 [WebAssembly] Remove the AsmPrinter code for printing physical registers.
WebAssembly does not have physical registers, so even if LLVM uses physical
registers like SP, they'll need to be lowered to virtual registers before
AsmPrinter time.

llvm-svn: 253644
2015-11-20 03:13:31 +00:00
Dan Gohman dfa81d8e22 [WebAssembly] Add a few open tasks to the target README.txt.
llvm-svn: 253643
2015-11-20 03:08:27 +00:00
Dan Gohman bb7ce8e408 [WebAssembly] Rename SWITCH to TABLESWITCH to match the current wording in the spec.
llvm-svn: 253642
2015-11-20 03:02:49 +00:00
Dan Gohman 2dfc3b8be5 [WebAssembly] Remove done items from the README.txt.
llvm-svn: 253640
2015-11-20 02:51:38 +00:00
Dan Gohman 7bafa0eaef [WebAssembly] Add asserts that the expression stack is used in stack order.
llvm-svn: 253638
2015-11-20 02:33:24 +00:00
Dan Gohman b0992dafb3 [WebAssemby] Enforce FIFO ordering for instructions using stackified registers.
llvm-svn: 253634
2015-11-20 02:19:12 +00:00
Eric Christopher eb027124af Split the argument unscheduling loop in the WebAssembly register
coloring pass. Turn the logic into "look for an insert point and
then move things past the insert point".

No functional change intended.

llvm-svn: 253626
2015-11-20 00:34:54 +00:00
Eric Christopher 8c3dbcab1d Fix a [-Werror,-Wcovered-switch-default] warning by removing the
unnecessary default case.

llvm-svn: 253621
2015-11-19 23:45:42 +00:00
Dan Gohman 3192ddfeba [WebAssembly] Implement isCheapToSpeculateCtlz and isCheapToSpeculateCttz.
This unbreaks test/CodeGen/WebAssembly/i32.ll and
test/CodeGen/WebAssembly/i64.ll after r224899.

llvm-svn: 253617
2015-11-19 23:04:59 +00:00
Simon Pilgrim a9912617c8 [X86][SSE4A] Fix issue with EXTRQI shuffles not starting at the correct start index.
Found during stress testing.

llvm-svn: 253611
2015-11-19 22:13:56 +00:00
Reid Kleckner ebee6129cd Fix UMRs in Mips disassembler on invalid instruction streams
The Insn and Size local variables were used without initialization.

llvm-svn: 253607
2015-11-19 21:51:55 +00:00
Simon Pilgrim ae0140d6ec [X86] Use existing MachineInstrBuilder::addDisp to create offseted pointer. NFC.
Minor code duplication tidyup to D13988

llvm-svn: 253606
2015-11-19 21:50:57 +00:00
Jun Bum Lim c12c2790e1 [AArch64] Refactoring aarch64-ldst-opt. NCF.
Summary :
 * Rename isSmallTypeLdMerge() to isNarrowLoad().
 * Rename NumSmallTypeMerged to NumNarrowTypePromoted.
 * Use Subtarget defined as a member variable.

llvm-svn: 253587
2015-11-19 18:41:27 +00:00
Jun Bum Lim 4c35ccac91 [AArch64]Extend merging narrow loads into a wider load
This change extends r251438 to handle more narrow load promotions
including byte type, unscaled, and signed. For example, this change will
convert :
  ldursh w1, [x0, #-2]
  ldurh  w2, [x0, #-4]
into
  ldur  w2, [x0, #-4]
  asr   w1, w2, #16
  and   w2, w2, #0xffff

llvm-svn: 253577
2015-11-19 17:21:41 +00:00
Hans Wennborg dcc2500452 X86: More efficient legalization of wide integer compares
In particular, this makes the code for 64-bit compares on 32-bit targets
much more efficient.

Example:

  define i32 @test_slt(i64 %a, i64 %b) {
  entry:
    %cmp = icmp slt i64 %a, %b
    br i1 %cmp, label %bb1, label %bb2
  bb1:
    ret i32 1
  bb2:
    ret i32 2
  }

Before this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          setae   %al
          cmpl    16(%esp), %ecx
          setge   %cl
          je      .LBB2_2
          movb    %cl, %al
  .LBB2_2:
          testb   %al, %al
          jne     .LBB2_4
          movl    $1, %eax
          retl
  .LBB2_4:
          movl    $2, %eax
          retl

After this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          sbbl    16(%esp), %ecx
          jge     .LBB1_2
          movl    $1, %eax
          retl
  .LBB1_2:
          movl    $2, %eax
          retl

Differential Revision: http://reviews.llvm.org/D14496

llvm-svn: 253572
2015-11-19 16:35:08 +00:00
Zoran Jovanovic 00f998b440 [mips] Expansion of ROL and ROR macros
Author: obucina

Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D10611

llvm-svn: 253564
2015-11-19 14:15:03 +00:00
Elena Demikhovsky 7c2c9fd243 AVX-512: Fixed COPY_TO_REGCLASS for mask registers
Copying one mask register to another under BW should be done with kmovq instruction, otherwise we can loose some bits.
Copying 8 bits under DQ may be done with kmovb.

Differential Revision: http://reviews.llvm.org/D14812

llvm-svn: 253563
2015-11-19 13:13:00 +00:00
Simon Pilgrim 846b64e17a [X86][AVX] Fix lowering of X86ISD::VZEXT_MOVL for 128-bit -> 256-bit extension
The lowering patterns for X86ISD::VZEXT_MOVL for 128-bit to 256-bit vectors were just copying the lower xmm instead of actually masking off the first scalar using a blend.

Fix for PR25320.

Differential Revision: http://reviews.llvm.org/D14151

llvm-svn: 253561
2015-11-19 12:18:37 +00:00
Alexey Bataev b7b82bf33e Alternative to long nops for X86 CPUs, by Andrey Turetsky
Make X86AsmBackend generate smarter nops instead of a bunch of 0x90 for code alignment for CPUs which don't support long nop instructions.
Differential Revision: http://reviews.llvm.org/D14178

llvm-svn: 253557
2015-11-19 11:44:35 +00:00
Igor Breger 1f78296869 AVX512: Implemented encoding, intrinsics and DAG lowering for VMOVDDUP instructions.
Differential Revision: http://reviews.llvm.org/D14702

llvm-svn: 253548
2015-11-19 08:26:56 +00:00
Igor Breger 4424aaa28e AVX512: Implemented encoding for the vmovss.s and vmovsd.s instructions.
Differential Revision: http://reviews.llvm.org/D14771

llvm-svn: 253547
2015-11-19 07:58:33 +00:00
Igor Breger 81b79de54c AVX512: Implemented encoding for the follow instructions.
vmovapd.s, vmovaps.s, vmovdqa32.s, vmovdqa64.s, vmovdqu16.s, vmovdqu32.s, vmovdqu64.s, vmovdqu8.s, vmovupd.s, vmovups.s

Differential Revision: http://reviews.llvm.org/D14768

llvm-svn: 253546
2015-11-19 07:43:43 +00:00
Elena Demikhovsky 1ca72e1846 Pointers in Masked Load, Store, Gather, Scatter intrinsics
The masked intrinsics support all integer and floating point data types. I added the pointer type to this list.
Added tests for CodeGen and for Loop Vectorizer.
Updated the Language Reference.

Differential Revision: http://reviews.llvm.org/D14150

llvm-svn: 253544
2015-11-19 07:17:16 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Quentin Colombet 46d5c71135 [X86] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14156

rdar://problem/21118279

llvm-svn: 253528
2015-11-19 00:38:00 +00:00
Quentin Colombet f6645cce91 [AArch64] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14360

rdar://problem/20820748

llvm-svn: 253520
2015-11-18 23:12:20 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Tim Northover 747ae9a7de ARM: make sure backend is consistent about exception handling method.
It turns out we decide whether to use SjLj exceptions or some alternative in
two separate places in the backend, and they disagreed with each other. This
led to inconsistent code and is generally a terrible idea.

So make them consistent and add an assert that they *do* match (unfortunately
MCAsmInfo isn't available in opt, so it can't be used to initialise the CodeGen
version directly).

llvm-svn: 253502
2015-11-18 21:10:39 +00:00
Matthew Simpson 343af07aa9 [Aarch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign and zero extensions. The
costs were determined by counting the number of shift instructions generated
without context for each new extension.

Differential Revision: http://reviews.llvm.org/D14730

llvm-svn: 253482
2015-11-18 18:03:06 +00:00
Dan Gohman 94ef41ff1d [WebAssembly] Add more whitespace characters to prettify the assembly output.
llvm-svn: 253472
2015-11-18 17:05:35 +00:00
Dan Gohman 1f29c68042 [WebAssembly] Add some spaces to the assembly output to vertically align operands.
llvm-svn: 253468
2015-11-18 16:25:38 +00:00
Dan Gohman 4ba4816b97 [WebAssembly] Enable register coloring and register stackifying.
This also takes the push/pop syntax another step forward, introducing stack
slot numbers to make it easier to see how expressions are connected. For
example, the value pushed in $push7 is popped in $pop7.

And, this begins an experiment with making get_local and set_local implicit
when an operation directly uses or defines a register. This greatly reduces
clutter. If this experiment succeeds, it may make sense to do this for
const instructions as well.

And, this introduces more special code for ARGUMENTS; hopefully this code
will soon be obviated by proper support for live-in virtual registers.

llvm-svn: 253465
2015-11-18 16:12:01 +00:00
Asaf Badouh 0d957b8b09 [X86][AVX512CD] add mask broadcast intrinsics
Differential Revision: http://reviews.llvm.org/D14573

llvm-svn: 253450
2015-11-18 09:42:45 +00:00
Igor Breger 5574730454 AVX512: Implemented encoding for vpextrw.s instruction.
Differential Revision: http://reviews.llvm.org/D14766

llvm-svn: 253447
2015-11-18 08:46:16 +00:00
Hrvoje Varga 78409019d9 [mips][microMIPS] Implement DPS.W.PH, DPSQ_S.W.PH, DPSQ_SA.L.W, DPSQX_S.W.PH, DPSQX_SA.W.PH, DPSU.H.QBL, DPSU.H.QBR and DPSX.W.PH instructions
Differential Revision: http://reviews.llvm.org/D14058

llvm-svn: 253443
2015-11-18 07:41:35 +00:00
Craig Topper 66059c9f4d Replace dyn_cast with isa in places that weren't using the returned value for more than a boolean check. NFC.
llvm-svn: 253441
2015-11-18 07:07:59 +00:00
Rafael Espindola 449711cb36 Stop producing .data.rel sections.
If a section is rw, it is irrelevant if the dynamic linker will write to
it or not.

It looks like llvm implemented this because gcc was doing it. It looks
like gcc implemented this in the hope that it would put all the
relocated items close together and speed up the dynamic linker.

There are two problem with this:
* It doesn't work. Both bfd and gold will map .data.rel to .data and
  concatenate the input sections in the order they are seen.
* If we want a feature like that, it can be implemented directly in the
  linker since it knowns where the dynamic relocations are.

llvm-svn: 253436
2015-11-18 06:02:15 +00:00
Quentin Colombet 8cb95b8e51 [ARM] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14357

rdar://problem/21942589

llvm-svn: 253411
2015-11-18 00:40:54 +00:00
Simon Pilgrim 2da4178737 [X86][AVX512] Added AVX512 SHUFP*/VPERMILP* shuffle decode comments.
llvm-svn: 253396
2015-11-17 23:29:49 +00:00
Simon Pilgrim 8483df6e24 [X86][AVX512] Added support for AVX512 UNPCK shuffle decode comments.
llvm-svn: 253391
2015-11-17 22:35:45 +00:00
Reid Kleckner c20276d0b2 [WinEH] Move WinEHFuncInfo from MachineModuleInfo to MachineFunction
Summary:
Now that there is a one-to-one mapping from MachineFunction to
WinEHFuncInfo, we don't need to use a DenseMap to select the right
WinEHFuncInfo for the current funclet.

The main challenge here is that X86WinEHStatePass is an IR pass that
doesn't have access to the MachineFunction. I gave it its own
WinEHFuncInfo object that it uses to calculate state numbers, which it
then throws away. As long as nobody creates or removes EH pads between
this pass and SDAG construction, we will get the same state numbers.

The other thing X86WinEHStatePass does is to mark the EH registration
node. Instead of communicating which alloca was the registration through
WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic.  This
intrinsic generates no code and simply marks the alloca in use.

Reviewers: JCTremoulet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14668

llvm-svn: 253378
2015-11-17 21:10:25 +00:00
Charlie Turner 7968b981bf [ARM] Don't pessimize i32 vselect.
The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad
scalarization that is still happening there.

I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements.

From my benchmarks, I saw these improvements in A57 (T32)
spec.cpu2000.ref.177_mesa 5.95%
lnt.SingleSource/Benchmarks/Shootout/strcat 12.93%
lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89%

I also measured A57 A32, A53 T32 and A9 T32 and found no performance regressions. I see much bigger wins in third-party benchmarks with this change

Differential Revision: http://reviews.llvm.org/D14743

llvm-svn: 253349
2015-11-17 17:25:15 +00:00
Ahmed Bougacha 88ddeae8bd [AArch64] Promote f16 SELECT_CC CC operands when op is legal.
SELECT_CC has the nasty property of having operands with unrelated
types. So if you do something like:

  f32 = select_cc f16, f16, f32, f32, cc

You'd only look for the action for <select_cc, f32>, but never f16.
If the types are all legal, but the op isn't (as for f16 on AArch64,
or for f128 on x86_64/AArch64?), then you get into trouble.
For f128, we have softenSetCCOperands to handle this case.

Similarly, for f16, we can directly promote the CC operands.

llvm-svn: 253344
2015-11-17 16:45:40 +00:00
Bradley Smith 982a8888b8 [ARM] Default to ARMv4t in favour of adding Other to ARMArch
llvm-svn: 253335
2015-11-17 13:38:29 +00:00
Charlie Turner b4613c6973 [ARM] Match VABDL from log2 shuffles.
Differential Revision: http://reviews.llvm.org/D14664

llvm-svn: 253334
2015-11-17 13:21:35 +00:00
Zlatko Buljan 72a7f9c1f5 [mips][microMIPS] Implement EXTP, EXTPDP, EXTPDPV, EXTPV, EXTR[_RS].W, EXTR_S.H, EXTRV[_RS].W and EXTRV_S.H instructions
Differential Revision: http://reviews.llvm.org/D14174

llvm-svn: 253332
2015-11-17 12:54:15 +00:00
Bradley Smith 4320205484 [ARM] Properly initialize ARMArch in the ARM subtarget
llvm-svn: 253331
2015-11-17 11:57:33 +00:00
Zlatko Buljan 246b21f66a [mips][microMIPS] Implement SUBQ[_S].PH, SUBQ_S.W, SUBQH[_R].PH, SUBQH[_R].W, SUBU[_S].PH, SUBU[_S].QB and SUBUH[_R].QB instructions
Differential Revision: http://reviews.llvm.org/D14114

llvm-svn: 253329
2015-11-17 10:11:22 +00:00
Oliver Stannard 9be59af3ab [Assembler] Make fatal assembler errors non-fatal
Currently, if the assembler encounters an error after parsing (such as an
out-of-range fixup), it reports this as a fatal error, and so stops after the
first error. However, for most of these there is an obvious way to recover
after emitting the error, such as emitting the fixup with a value of zero. This
means that we can report on all of the errors in a file, not just the first
one. MCContext::reportError records the fact that an error was encountered, so
we won't actually emit an object file with the incorrect contents.

Differential Revision: http://reviews.llvm.org/D14717

llvm-svn: 253328
2015-11-17 10:00:43 +00:00
Zlatko Buljan 3e0588d033 [mips][microMIPS] Implement PRECEQ.W.PHL, PRECEQ.W.PHR, PRECEQU.PH.QBL, PRECEQU.PH.QBLA, PRECEQU.PH.QBR, PRECEQU.PH.QBRA, PRECEU.PH.QBL, PRECEU.PH.QBLA, PRECEU.PH.QBR and PRECEU.PH.QBRA instructions
Differential Revision: http://reviews.llvm.org/D14279

llvm-svn: 253326
2015-11-17 09:43:29 +00:00
Jay Foad b64f0a5a1a Fix typos in comments.
llvm-svn: 253324
2015-11-17 08:54:53 +00:00
Rafael Espindola 65e4902156 Drop prelink support.
The way prelink used to work was

* The compiler decides if a given section only has relocations that
are know to point to the same DSO. If so, it names it
.data.rel.ro.local<something>.
* The static linker puts all of these together.
* The prelinker program assigns addresses to each library and resolves
the local relocations.

There are many problems with this:
* It is incompatible with address space randomization.
* The information passed by the compiler is redundant. The linker
knows if a given relocation is in the same DSO or not. If could sort
by that if so desired.
* There are newer ways of speeding up DSO (gnu hash for example).
* Even if we want to implement this again in the compiler, the previous
  implementation is pretty broken. It talks about relocations that are
  "resolved by the static linker". If they are resolved, there are none
  left for the prelinker. What one needs to track is if an expression
  will require only dynamic relocations that point to the same DSO.

At this point it looks like the prelinker is an historical curiosity.
For example, fedora has retired it because it failed to build for two
releases
(http://pkgs.fedoraproject.org/cgit/prelink.git/commit/?id=eb43100a8331d91c801ee3dcdb0a0bb9babfdc1f)

This patch removes support for it. That is, it stops printing the
".local" sections.

llvm-svn: 253280
2015-11-17 00:51:23 +00:00
Derek Schuff 71e8169ea8 [WebAssembly] Fix printing of global operands
This was regressed in r252656 which wasn't quite NFC. Instead of using a
custom instruction as before, use a pattern to select CONST_I32 for the
global addrs.

Differential Revision: http://reviews.llvm.org/D14587

llvm-svn: 253276
2015-11-17 00:20:44 +00:00
Simon Pilgrim 13d3a20ad7 [X86][SSE] Merged BLEND shuffle decode comments. NFC.
Now that we can recognise different vector sizes.

llvm-svn: 253268
2015-11-16 23:03:18 +00:00
Simon Pilgrim b9ada27052 [X86][SSE] Merged ALIGNR/SLLDQ/SRLDQ shuffle decode comments. NFC.
Now that we can recognise different vector sizes - will make future AVX512 additions easier.

llvm-svn: 253266
2015-11-16 22:54:41 +00:00
Simon Pilgrim 5883a73f18 [X86][SSE] Merged SHUF/PERM shuffle decode comments. NFC.
Now that we can recognise different vector sizes - will make future AVX512 additions easier.

llvm-svn: 253260
2015-11-16 22:39:27 +00:00
Simon Pilgrim 66e43ee289 [X86][SSE] Merged UNPCK shuffle decode comments. NFC.
Now that we can recognise different vector sizes - will make future AVX512 additions easier.

llvm-svn: 253258
2015-11-16 22:21:10 +00:00
Derek Schuff 46e3316888 [WebAssembly] Fix function return type printing
Summary:
Previously return type information for a function was derived from
return dag nodes. But this didn't work for dags with != return node. So
instead compute it directly from the LLVM function as is done for imports.

Differential Revision: http://reviews.llvm.org/D14593

llvm-svn: 253251
2015-11-16 21:12:41 +00:00
Derek Schuff 4ed4778419 [WebAssembly] Reverse the order of operands for br_if
Summary: This is to match the new version in the spec

Reviewers: sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D14519

llvm-svn: 253249
2015-11-16 21:04:51 +00:00
Kit Barton 9c432ae111 Find available scratch register to use in function prologue and epilogue as part of shrink wrapping.
Phabricator: http://reviews.llvm.org/D13955
llvm-svn: 253247
2015-11-16 20:22:15 +00:00
Reid Kleckner c397b26790 [WinEH] Don't let UnwindHelp alias the return address
On top of that, don't bother allocating and initializing UnwindHelp if
we don't have any funclets. Currently we always use RBP as our frame
pointer when funclets are present, so this change makes it impossible to
come here without any fixed stack objects.

Fixes PR25533.

llvm-svn: 253245
2015-11-16 18:47:25 +00:00
Reid Kleckner 4255b04e7b Use the subtarget reference that we already have
llvm-svn: 253244
2015-11-16 18:47:12 +00:00
Vasileios Kalintiris 88faf6d697 [mips] Disable code generation through FastISel for MIPS32R6.
Reviewers: dsanders

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14708

llvm-svn: 253225
2015-11-16 17:05:01 +00:00
Petr Pavlu a770379524 [ARM] Prevent use of a value pointed by end() iterator when placing a jump table
Function ARMConstantIslands::doInitialJumpTablePlacement() iterates over all
basic blocks in a machine function. It calls `MI = MBB.getLastNonDebugInstr()`
to get the last instruction in each block and then uses MI->getOpcode() to
decide what to do. If getLastNonDebugInstr() returns MBB.end() (for example,
when the block does not contain any instructions) then calling getOpcode() on
this value is incorrect. Avoid this problem by checking the result of
getLastNonDebugInstr().

Differential Revision: http://reviews.llvm.org/D14694

llvm-svn: 253222
2015-11-16 16:41:13 +00:00
Oliver Stannard 9327a7575b [ARM,AArch64] Store source location of asm constant pool entries
Storing the source location of the expression that created a constant pool
entry allows us to emit better error messages if we later discover that the
expression cannot be represented by a relocation.

Differential Revision: http://reviews.llvm.org/D14646

llvm-svn: 253220
2015-11-16 16:25:47 +00:00
Oliver Stannard 09be060606 [ARM,AArch64] Store source location for values in assembly files
The MCValue class can store a SMLoc to allow better error messages to be
emitted if an error is detected after parsing. The ARM and AArch64 assembly
parsers were not setting this, so error messages did not have source
information.

Differential Revision: http://reviews.llvm.org/D14645

llvm-svn: 253219
2015-11-16 16:22:47 +00:00
Dan Gohman 1462faad35 [WebAssembly] Prototype passes for register coloring and register stackifying.
These passes are not yet enabled by default.

llvm-svn: 253217
2015-11-16 16:18:28 +00:00
Artyom Skrobov f187a65f99 Handle ARMv6KZ naming
Summary:
* ARMv6KZ is the "canonical" name, given in the ARMARM
* ARMv6Z is an "official abbreviation" for it, mentioned in the ARMARM
* ARMv6ZK is a popular misspelling, which we should support as an alias.

The patch corrects the handling of the names.

Functional changes:
* ARMv6Z no longer treated as an architecture in its own right
* ARMv6ZK renamed to ARMv6KZ, accepting ARMv6ZK as an alias
* arm1176jz-s and arm1176jzf-s recognized as ARMv6ZK, instead of ARMv6K
* default ARMv6K CPU changed to arm1176j-s

Reviewers: rengolin, logan, compnerd

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14568

llvm-svn: 253206
2015-11-16 14:05:32 +00:00
Bradley Smith 323fee105d [ARM] Introduce subtarget features per ARM architecture.
This allows for accurate architecture targeting as well as removing
duplicate information (hardcoded feature strings) from MCTargetDesc.

llvm-svn: 253196
2015-11-16 11:10:19 +00:00
James Molloy 2018091e87 Properly check if a CMPZ node is in fact comparing against zero
This was left implicit and never ever checked, which means we could have a CMPZ against some non-zero value and we were carrying on with BFI conversion regardless.

Caught by Oliver Stannard using csmith; regression test added.

llvm-svn: 253195
2015-11-16 10:49:25 +00:00
Oliver Stannard db9081bf89 [AArch64] ldr= pseudo-instruction silently ignored if register invalid
The AArch64 assembler was silently ignoring instructions like this:
  ldr foo, =bar

AArch64AsmParser::parseOperand was returning true as the parse failed, but was
not calling AArch64AsmParser::Error to report this to the user, so the
instruction was ignored without printing an error message.

Differential Revision: http://reviews.llvm.org/D14651

llvm-svn: 253193
2015-11-16 10:25:19 +00:00
Igor Breger 24cab0fa06 AVX512: Implemented encoding and intrinsics for VMOVSHDUP/VMOVSLDUP instructions.
Differential Revision: http://reviews.llvm.org/D14322

llvm-svn: 253185
2015-11-16 07:22:00 +00:00
Dan Gohman 1031d4a8c3 [WebAssembly] Use tabs instead of spaces in assembly output.
This seems to be the most popular convention among the other backends.

llvm-svn: 253172
2015-11-15 15:34:19 +00:00
Simon Pilgrim cbba348ae7 [X86][SSE] Tidyup with implicit SDValue bool check. NFC.
llvm-svn: 253171
2015-11-15 14:57:07 +00:00
Igor Breger 3ff8ef9eb7 Revert r253160.
It broke layering violation. Reproducible with BUILD_SHARED_LIBS=ON.

llvm-svn: 253163
2015-11-15 12:19:11 +00:00
Igor Breger aa40ddd3ba AVX512: Implemented encoding and intrinsics for VMOVSHDUP/VMOVSLDUP instructions.
Differential Revision: http://reviews.llvm.org/D14322

llvm-svn: 253160
2015-11-15 07:23:13 +00:00
Dan Gohman 5219ecf068 [WebAssembly] Minor code simplification. NFC.
llvm-svn: 253150
2015-11-14 23:28:15 +00:00
Dan Gohman 8ad045c1d1 [WebAssembly] Support signext, zeroext, and several other function attributes.
llvm-svn: 253148
2015-11-14 23:15:41 +00:00
Akira Hatanaka b11ef0897c Reduce the size of MCRelaxableFragment.
MCRelaxableFragment previously kept a copy of MCSubtargetInfo and
MCInst to enable re-encoding the MCInst later during relaxation. A copy
of MCSubtargetInfo (instead of a reference or pointer) was needed
because the feature bits could be modified by the parser.

This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment
with a constant reference to MCSubtargetInfo. The copies of
MCSubtargetInfo are kept in MCContext, and the target parsers are now
responsible for asking MCContext to provide a copy whenever the feature
bits of MCSubtargetInfo have to be toggled.
 
With this patch, I saw a 4% reduction in peak memory usage when I
compiled verify-uselistorder.lto.bc using llc.

rdar://problem/21736951

Differential Revision: http://reviews.llvm.org/D14346

llvm-svn: 253127
2015-11-14 06:35:56 +00:00
Akira Hatanaka bd9fc28444 [MCTargetAsmParser] Move the member varialbes that reference
MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a
member function getSTI.

This is done in preparation for making changes to shrink the size of
MCRelaxableFragment. (see http://reviews.llvm.org/D14346).

llvm-svn: 253124
2015-11-14 05:20:05 +00:00
Eric Christopher 57a6e1321f Add MMX to the 3dnow enum and propagate changes around. This makes
it somewhat more consistent with how the feature is used.

llvm-svn: 253122
2015-11-14 03:04:00 +00:00
Justin Bogner fff708db92 AArch64: Default AArch64Subtarget::ReserveX18 to true on darwin
Darwin reserves x18, so it's never ABI compliant to generate code that
uses it. Set the default value based on the OS part of the triple
rather than forcing front-ends to set the +reserve-x18 target feature
in order to build correct code for Darwin.

This will make r243310 redundant, so I'll revert that shortly.

llvm-svn: 253102
2015-11-13 23:05:46 +00:00
Colin LeMahieu 655489433c [Hexagon] Fixing memory leak during relaxation by allocating MCInst in MCContext.
llvm-svn: 253090
2015-11-13 21:45:50 +00:00
Reid Kleckner 75b4be9a11 [WinEH] Fix ESP management with 32-bit __CxxFrameHandler3
The C++ EH personality automatically restores ESP from the C++ EH
registration node after a catchret. I mistakenly thought it was like
SEH, which does not restore ESP.

It makes sense for C++ EH to differ from SEH here because SEH does not
use funclets for catches, and does not allow catching inside of finally.
C++ EH may need to unwind through multiple catch funclets and eventually
catchret to some outer funclet. Therefore, the runtime has to keep track
of which ESP to use with catchret, rather than having the compiler
reload it manually.

llvm-svn: 253084
2015-11-13 21:27:00 +00:00
Dan Gohman dd0071f440 [WebAssembly] Rename the Const instructions to be upper-case too.
llvm-svn: 253072
2015-11-13 20:27:45 +00:00
Dan Gohman f433324290 [WebAssembly] Rename memory intrinsics to be upper-case, following convention. NFC.
llvm-svn: 253070
2015-11-13 20:19:11 +00:00
Cong Hou ef4074bac2 [X86][SSE] Combine UNPCKL with vector_shuffle into UNPCKH to save one instruction for sext from v16i8 to v16i16 and v8i16 to v8i32.
This patch is enabling combining UNPCKL with vector_shuffle that moves the upper
half of a vector into the lower half, into a UNPCKH instruction. For example:

t2: v16i8 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t1, undef:v16i8
t3: v16i8 = X86ISD::UNPCKL undef:v16i8, t2

will be combined to:

t3: v16i8 = X86ISD::UNPCKH undef:v16i8, t1


Differential revision: http://reviews.llvm.org/D14399

llvm-svn: 253067
2015-11-13 19:47:43 +00:00
Reid Kleckner 94b57065c6 [WinEH] Make UnwindHelp a fixed stack object allocated after XMM CSRs
Now the offset of UnwindHelp in our EH tables and the offset that we
store to in the prologue agree.

llvm-svn: 253059
2015-11-13 19:06:01 +00:00
Colin LeMahieu f0af6e5243 [Hexagon] Factoring bundle creation in to a utility function.
llvm-svn: 253056
2015-11-13 17:42:46 +00:00
Tom Stellard afd6e2f3c3 AMDGPU: Add stony support
Patch by: Alex Deucher

llvm-svn: 253053
2015-11-13 17:06:32 +00:00
James Molloy b564098c62 [ARM] Replace ARMISD::RBIT with ISD::BITREVERSE
ISD::BITREVERSE matches "rbit" completely, so remove ARMISD::RBIT and mark ISD::BITREVERSE as legal, adding a test for lowering.

llvm-svn: 253047
2015-11-13 16:05:22 +00:00
Zlatko Buljan 32fb5c40d2 [mips][microMIPS] Implement SHRA[_R].PH, SHRAV[_R].PH, SHRAV[_R].QB, SHRAV_R.W, SHRA_R.W, SHRL.PH, SHRL.QB, SHRLV.PH and SHRLV.QB instructions
Differential Revision: http://reviews.llvm.org/D14010

llvm-svn: 253041
2015-11-13 13:14:25 +00:00
Ulrich Weigand 19d24d2699 [SystemZ] Simplify boolean conditional return statements
Use clang-tidy to simplify conditonal return statements.

Author: LegalizeAdulthood
Differential Revision: http://reviews.llvm.org/D9986

llvm-svn: 253038
2015-11-13 13:00:27 +00:00
Colin LeMahieu b3c97271e3 [Hexagon] Fixing leak in padEndloop by allocating in MCContext.
llvm-svn: 253019
2015-11-13 07:58:06 +00:00
Dan Gohman f19ed56288 [WebAssembly] Inline asm support.
llvm-svn: 252997
2015-11-13 01:42:29 +00:00
Colin LeMahieu 8bb168b160 [Hexagon] Adding relaxation functionality to backend and test.
llvm-svn: 252989
2015-11-13 01:12:25 +00:00
Dan Gohman bc58a7bad0 [WebAssembly] Un-mangle the conversion instruction names.
This arranges the types in the LLVM instruction names in the same order that
they appear in the WebAssembly opcode names, and eliminates
double-underscores.

llvm-svn: 252988
2015-11-13 00:50:04 +00:00
Dan Gohman 231244c304 [WebAssembly] Rename BR_IF_ to BR_IF
With MC-based instruction printing, we no longer need instruction names to
mangle in hints about how they should be printed.

llvm-svn: 252987
2015-11-13 00:46:31 +00:00
Dan Gohman c9dd057e3c [WebAssembly] Remove unneeded TODO items. NFC.
llvm-svn: 252985
2015-11-13 00:41:25 +00:00
Dan Gohman b1daa3aec7 [WebAssembly] Tidy up and update a TODO item. NFC.
llvm-svn: 252984
2015-11-13 00:40:37 +00:00
Joseph Tremoulet 149c433bcc [WinEH] Find root frame correctly in CLR funclets
Summary:
The value that the CoreCLR personality passes to a funclet for the
establisher frame may be the root function's frame or may be the parent
funclet's (mostly empty) frame in the case of nested funclets.  Each
funclet stores a pointer to the root frame in its own (mostly empty)
frame, as does the root function itself.  All frames allocate this slot at
the same offset, measured from the post-prolog stack pointer, so that the
same sequence can accept any ancestor as an establisher frame parameter
value, and so that a single offset can be reported to the GC, which also
looks at this slot.

This change allocate the slot when processing function entry, and records
its frame index on the WinEHFuncInfo object, then inserts the code to
set/copy it during prolog emission.


Reviewers: majnemer, AndyAyers, pgavlin, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14614

llvm-svn: 252983
2015-11-13 00:39:23 +00:00
Dan Gohman 058fce5435 [WebAssembly] Introduce a new pseudo-operand for unused expression results.
llvm-svn: 252975
2015-11-13 00:21:05 +00:00
Vyacheslav Klochkov cbc56baae6 X86-FMA3: Implemented commute transformations FMA*_Int instructions.
It made it possible to apply the memory folding optimization for the 2nd
operand of FMA*_Int instructions.

Reviewer: Quentin Colombet
Differential Revision: http://reviews.llvm.org/D14550

llvm-svn: 252973
2015-11-13 00:07:35 +00:00
Tom Stellard 0967c91e0c Revert "Remove unnecessary call to getAllocatableRegClass"
This reverts commit r252565.

This also includes the revert of the commit mentioned below in order to
avoid breaking tests in AMDGPU:

Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64"

This reverts commit r252674.

llvm-svn: 252956
2015-11-12 21:43:25 +00:00
Vyacheslav Klochkov 1ff9cbdfc0 My first/test commit. Removed a trailing whitespace.
llvm-svn: 252940
2015-11-12 20:11:57 +00:00
Benjamin Kramer 7c576d8bcf [Hexagon] Allocate MCInst in the MCContext to avoid leaking it.
Found by leaksanitizer.

llvm-svn: 252931
2015-11-12 19:30:40 +00:00
David Blaikie b0311c590d Roll an expression into an assert to fix -Wunused-variable in a -Asserts build
llvm-svn: 252925
2015-11-12 19:07:43 +00:00
Dan Gohman cf4748f180 [WebAssembly] Reapply r252858, with svn add for the new file.
Switch to MC for instruction printing.

This encompasses several changes which are all interconnected:
 - Use the MC framework for printing almost all instructions.
 - AsmStrings are now live.
 - This introduces an indirection between LLVM vregs and WebAssembly registers,
   and a new pass, WebAssemblyRegNumbering, for computing a basic the mapping.
   This addresses some basic issues with argument registers and unused registers.
 - The way ARGUMENT instructions are handled no longer generates redundant
   get_local+set_local for every argument.

This also changes the assembly syntax somewhat; most notably, MC's printing
does not use sigils on label names, so those are no longer present, and
push/pop now have a sigil to keep them unambiguous.

The usage of set_local/get_local/$push/$pop will continue to evolve
significantly. This patch is just one step of a larger change.

llvm-svn: 252910
2015-11-12 17:04:33 +00:00
Michael Zuckerman fd3fe9e45a [x86] translating "fp" (floating point) instructions from {fadd,fdiv,fmul,fsub,fsubr,fdivr} to {faddp,fdivp,fmulp,fsubp,fsubrp,fdivrp}
LLVM Missing the following instructions: fadd\fdiv\fmul\fsub\fsubr\fdivr.
GAS and MS supporting this instruction and lowering them in to a faddp\fdivp\fmulp\fsubp\fsubrp\fdivrp instructions.

Differential Revision: http://reviews.llvm.org/D14217

llvm-svn: 252908
2015-11-12 16:58:51 +00:00
Artyom Skrobov 2c2f378f8a Cull non-standard variants of ARM architectures (NFC)
Summary:
This patch changes ARMV5, ARMV5E, ARMV6SM, ARMV6HL, ARMV7, ARMV7L,
ARMV7HL, ARMV7EM to be treated as aliases for the corresponding
standard architectures, instead of as actual architectures.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14577

llvm-svn: 252903
2015-11-12 15:51:41 +00:00
Hans Wennborg 7384a2de02 Revert r252858: "[WebAssembly] Switch to MC for instruction printing."
It broke the CMake build:

"Cannot find source file: WebAssemblyRegNumbering.cpp"

llvm-svn: 252897
2015-11-12 14:37:56 +00:00
Vasileios Kalintiris 48e0256ed6 Re-apply "[mips] Use correct frame register for DWARF info when dynamically realigning the stack.""
r252219 reversed the direction of subprogram -> function edge. Fixed the
IR to account for this.

llvm-svn: 252895
2015-11-12 14:11:43 +00:00
James Molloy 8e99e97f2a [ARM] CMOV->BFI combining: handle both senses of CMPZ
I completely misunderstood what ARMISD::CMPZ means. It's not "compare equal to zero", it's "compare, only setting the zero/Z flag". It can either be equal-to-zero or not-equal-to-zero, and we weren't checking what sense it was.

If it's equal-to-zero, we can swap the operands around and pretend like it is not-equal-to-zero, which is both a bug fix and lets us handle more cases.

llvm-svn: 252891
2015-11-12 13:49:17 +00:00
Renato Golin 93064025bd Revert "[ARM] Enable shrink-wrapping by default."
This reverts commit r252825, as it broke ASAN on ARM. Investigating...

llvm-svn: 252889
2015-11-12 13:34:50 +00:00
Daniel Sanders 9f6ad49740 Implement .reloc (constant offset only) with support for R_MIPS_NONE and R_MIPS_32.
Summary:
Support for R_MIPS_NONE allows us to parse MIPS16's usage of .reloc.
R_MIPS_32 was included to be able to better test the directive.

Targets can add their relocations by overriding MCAsmBackend::getFixupKind().

Subscribers: grosbach, rafael, majnemer, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D13659

llvm-svn: 252888
2015-11-12 13:33:00 +00:00
Zlatko Buljan 797c2aec6b [mips][microMIPS] Implement LWM16, SB16, SH16, SW16, SWSP and SWM16 instructions
Differential Revision: http://reviews.llvm.org/D11406

llvm-svn: 252885
2015-11-12 13:21:33 +00:00
Vasileios Kalintiris d38860610d Revert "[mips] Use correct frame register for DWARF info when dynamically realigning the stack."
This reverts commit r252882. LLParser complains for invalid field 'function'
in DISubprogram.

llvm-svn: 252884
2015-11-12 13:19:11 +00:00
Vasileios Kalintiris 352eb55baf [mips] Use correct frame register for DWARF info when dynamically realigning the stack.
Summary:
This patch overrides TargetFrameLowering::getFrameIndexReference() in order to
specify the correct register when the function needs dynamic stack realignment.
The values returned from this function are used in order to create DW_AT_locations
for DWARF info. These locations would use the wrong registers as it's been
reported in PR25028.

Reviewers: dsanders

Subscribers: dean, llvm-commits

Differential Revision: http://reviews.llvm.org/D13511

llvm-svn: 252882
2015-11-12 13:04:16 +00:00
Dylan McKay c498ba3a3e Add AVR backend skeleton
This adds part of the target info code, and adds modifications to
the build scripts so that AVR is recognized a supported, experimental
backend.

It does not include any AVR-specific code, just the bare sources required
for a backend to exist.

From D14039.

llvm-svn: 252865
2015-11-12 09:26:44 +00:00
Dan Gohman 9dd55a8065 [WebAssembly] Switch to MC for instruction printing.
This encompasses several changes which are all interconnected:
 - Use the MC framework for printing almost all instructions.
 - AsmStrings are now live.
 - This introduces an indirection between LLVM vregs and WebAssembly registers,
   and a new pass, WebAssemblyRegNumbering, for computing a basic the mapping.
   This addresses some basic issues with argument registers and unused registers.
 - The way ARGUMENT instructions are handled no longer generates redundant
   get_local+set_local for every argument.

This also changes the assembly syntax somewhat; most notably, MC's printing
use sigils on label names, so those are no longer present, and push/pop now
have a sigil to keep them unambiguous.

The usage of set_local/get_local/$push/$pop will continue to evolve
significantly. This patch is just one step of a larger change.

llvm-svn: 252858
2015-11-12 06:10:03 +00:00
Manman Ren 3f2b9c18e2 [TLS on Darwin] use a different mask for tls calls on x86-64.
Calls involved in thread-local variable lookup save more registers
than normal calls.

rdar://problem/23073171

llvm-svn: 252837
2015-11-12 00:54:04 +00:00
Quentin Colombet 10f9813528 [ARM] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14357

rdar://problem/21942589

llvm-svn: 252825
2015-11-11 23:31:46 +00:00
Joseph Tremoulet 9f467353a5 [WinEH] Only generate UnwindHelp slot for MSVCXX
Summary: Other personalities don't use this special frame slot.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14580

llvm-svn: 252778
2015-11-11 19:21:09 +00:00
Sanjay Patel f740129198 [MIPS] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
MIPS32 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any MIPS32
implementation.

The net result of allowing this speculation for the regression tests in this patch is
that we get this code:

ctlz:
  jr  $ra
  clz  $2, $4

cttz:
  addiu  $1, $4, -1
  not  $2, $4
  and  $1, $2, $1
  clz  $1, $1
  addiu  $2, $zero, 32
  jr  $ra
  subu  $2, $2, $1

Instead of:

ctlz:
  beqz  $4, $BB0_2
  addiu  $2, $zero, 32
  clz  $2, $4
$BB0_2:
  jr  $ra
  nop

cttz:
  beqz  $4, $BB1_2
  addiu  $2, $zero, 32
  addiu  $1, $4, -1
  not  $2, $4
  and  $1, $2, $1
  clz  $1, $1
  addiu  $2, $zero, 32
  subu  $2, $2, $1
$BB1_2:
  jr  $ra
  nop

See D14469 for the larger motivation.

Differential Revision: http://reviews.llvm.org/D14500

llvm-svn: 252755
2015-11-11 17:24:56 +00:00
Diego Novillo 0767ae5896 Properly fix unused variable in disable-assert builds.
I missed the side-effects of ParseBFI in my previous attempt (r252748).
Thanks dblaikie for the suggestion of adding a void use of the unused
variable instead.

llvm-svn: 252751
2015-11-11 16:39:22 +00:00
Diego Novillo 29f88a2460 Remove unused variable in disable-assert builds. NFC.
llvm-svn: 252748
2015-11-11 16:14:52 +00:00
Douglas Katzman a14039764b Visibly fail if attempting to encode register AH,BH,CH,DH in a REX-prefixed instruction.
Differential Revision: http://reviews.llvm.org/D13316
Fixes PR25003

llvm-svn: 252743
2015-11-11 15:51:16 +00:00
James Molloy ce12c92f66 [ARM] Combine BFIs together
If we have a chain of BFIs, we may be able to combine several together into one merged BFI. We can do this if the "from" bits from one BFI OR'd with the "from" bits from the other BFI form a contiguous range, and the same with the "to" bits.

llvm-svn: 252740
2015-11-11 15:40:40 +00:00
Aaron Ballman 107bb0d193 Silencing nine warnings for "enumeral and non-enumeral type in conditional expression"; NFC.
llvm-svn: 252728
2015-11-11 13:44:06 +00:00
Michael Kuperstein 12982a816c [X86] Replace LEAs with INC/DEC when profitable
If possible and profitable, replace lea %reg, 1(%reg) and lea %reg, -1(%reg) with inc %reg and dec %reg respectively.

Patch by: anton.nadolsky@intel.com
Differential Revision: http://reviews.llvm.org/D14059

llvm-svn: 252722
2015-11-11 11:44:31 +00:00
Craig Topper b24a58e28f [X86] Fix feature flags on some MMX register instructions that really were introduced with SSE or SSE2.
llvm-svn: 252709
2015-11-11 07:29:25 +00:00
Craig Topper 700a1a23d7 [X86] Remove redundant MMX isel patterns.
llvm-svn: 252708
2015-11-11 07:29:22 +00:00
Dan Gohman 754cd11d90 [WebAssembly] Support non-legal argument and return types.
llvm-svn: 252687
2015-11-11 01:33:02 +00:00
Ahmed Bougacha 4a85643907 [MC] Use LShr for constant evaluation of ">>" on non-arm64 darwin.
Follow-up to r235963: this matches other assemblers and is less
unexpected (e.g. PR23227).

llvm-svn: 252681
2015-11-11 00:51:36 +00:00
Matt Arsenault 8246d4aead AMDGPU: Print more fields in comments
llvm-svn: 252677
2015-11-11 00:27:46 +00:00
Matt Arsenault 61cb6fa848 AMDGPU: Remove dead code
llvm-svn: 252675
2015-11-11 00:01:36 +00:00
Matt Arsenault 6690d7de39 AMDGPU: Set isAllocatable = 0 on VS_32/VS_64
llvm-svn: 252674
2015-11-11 00:01:32 +00:00
Reid Kleckner 7f84a939ed [WinEH] Insert the MBB for EH_RESTORE after the catchret
Inserting it before the target block could be bad, we might already have
a fallthrough edge to it.

llvm-svn: 252670
2015-11-10 23:22:20 +00:00
Dan Gohman 16d314d300 [WebAssembly] Remove special cases for things that are no longer special. NFC.
llvm-svn: 252656
2015-11-10 21:48:21 +00:00
Bill Schmidt 3c44c6f189 Add PPCMIPeephole.cpp to CMakeLists.txt
llvm-svn: 252654
2015-11-10 21:43:45 +00:00
Dan Gohman b84ae9bb38 [WebAssembly] Support for floating point min and max.
llvm-svn: 252653
2015-11-10 21:40:21 +00:00
Bill Schmidt 34af5e1c76 [PowerPC] Add an MI SSA peephole pass.
This patch adds a pass for doing PowerPC peephole optimizations at the
MI level while the code is still in SSA form.  This allows for easy
modifications to the instructions while depending on a subsequent pass
of DCE.  Both passes are very fast due to the characteristics of SSA.

At this time, the only peepholes added are for cleaning up various
redundancies involving the XXPERMDI instruction.  However, I would
expect this will be a useful place to add more peepholes for
inefficiencies generated during instruction selection.  The pass is
placed after VSX swap optimization, as it is best to let that pass
remove unnecessary swaps before performing any remaining clean-ups.

The utility of these clean-ups are demonstrated by changes to four
existing test cases, all of which now have tighter expected code
generation.  I've also added Eric Schweiz's bugpoint-reduced test from
PR25157, for which we now generate tight code.  One other test started
failing for me, and I've fixed it
(test/Transforms/PlaceSafepoints/finite-loops.ll) as well; this is not
related to my changes, and I'm not sure why it works before and not
after.  The problem is that the CHECK-NOT: of "statepoint" from test1
fails because of the "statepoint" in test2, and so forth.  Adding a
CHECK-LABEL in between keeps the different occurrences of that string
properly scoped.

llvm-svn: 252651
2015-11-10 21:38:26 +00:00
Sanjay Patel af1b48bfdc [ARM] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2
implementation.

The net result of allowing this speculation for the regression tests in this patch is
that we get this code:

ctlz:               
  clz  r0, r0
  bx  lr
cttz:              
  rbit  r0, r0
  clz  r0, r0
  bx  lr

Instead of:

ctlz:    
  cmp  r0, #0
  moveq  r0, #32
  clzne  r0, r0
  bx  lr
cttz:     
  cmp   r0, #0
  moveq  r0, #32
  rbitne  r0, r0
  clzne  r0, r0
  bx  lr

This will help solve a general speculation/despeculation problem noted in PR24818:
https://llvm.org/bugs/show_bug.cgi?id=24818

Differential Revision: http://reviews.llvm.org/D14469

llvm-svn: 252639
2015-11-10 19:24:31 +00:00
Sanjay Patel 241c31fb64 [AArch64] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
AArch64 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any AArch64
implementation.

The net result of allowing this speculation for the regression tests in this
patch is that we get this code:

ctlz:
  clz  w0, w0
  ret

cttz:
  rbit  w8, w0
  clz  w0, w8
  ret

Instead of:

ctlz:
  cbz  w0, .LBB0_2
  clz  w0, w0
  ret
.LBB0_2:
  orr  w0, wzr, #0x20
  ret

cttz:
  cbz  w0, .LBB1_2
  rbit  w8, w0
  clz  w0, w8
  ret
.LBB1_2:
  orr  w0, wzr, #0x20
  ret

See D14469 for the larger motivation.

Differential Revision: http://reviews.llvm.org/D14505

llvm-svn: 252625
2015-11-10 18:11:37 +00:00
Michael Kuperstein a01a5ee72f [X86] Do not try to custom-lower sitofp/fptosi in soft-float mode
Differential Revision: http://reviews.llvm.org/D14495

llvm-svn: 252621
2015-11-10 17:37:49 +00:00
James Molloy 9d55f19cfa Reapply "[ARM] Combine CMOV into BFI where possible"
Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh!

Original commit message:

If we have a CMOV, OR and AND combination such as:
  if (x & CN)
      y |= CM;

And:
  * CN is a single bit;
    * All bits covered by CM are known zero in y;

Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).

llvm-svn: 252606
2015-11-10 14:22:05 +00:00
Tilmann Scheller 990a8d88c8 [PowerPC] Remove redundant code.
The local variable Hi is never being read.

Issue identified by the Clang static analyzer.

llvm-svn: 252600
2015-11-10 12:29:37 +00:00
Oliver Stannard d414c99b9c [AArch64] Fix halfword load merging for big-endian targets
For big-endian targets, when we merge two halfword loads into a word load, the
order of the halfwords in the loaded value is reversed compared to
little-endian, so the load-store optimiser needs to swap the destination
registers.

This does not affect merging of two word loads, as we use ldp, which treats the
memory as two separate 32-bit words.

llvm-svn: 252597
2015-11-10 11:04:18 +00:00
Igor Breger b6b27af46a AVX512 : Implemented encoding and DAG lowering for VMOVHPS/PD and VMOVLPS/PD instructions.
Differential Revision: http://reviews.llvm.org/D14492

llvm-svn: 252592
2015-11-10 07:09:07 +00:00
David Blaikie 578a31fe0a Remove another variable unused in -Asserts build
llvm-svn: 252582
2015-11-10 04:10:04 +00:00
David Blaikie e35168f008 Remove some unused variables to clean up the -Werror build
llvm-svn: 252580
2015-11-10 03:16:28 +00:00
Colin LeMahieu 3c7ecf9af1 [Hexagon] Adding instruction aliases and tests.
llvm-svn: 252579
2015-11-10 01:58:26 +00:00
Andy Ayers 809cbe9ea0 Support for emitting inline stack probes
For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages
between the current stack limit and the desired new stack pointer location. This implements support for
the inline expansion on x64.

For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call
is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications
that arise when introducing new machine basic blocks during prolog and epilog creation.

Added a new test case, modified an existing one to exclude non-x64 coreclr (for now).

Add test case

Fix tests

llvm-svn: 252578
2015-11-10 01:50:49 +00:00
Colin LeMahieu 13cc3ab785 [Hexagon] Fixing compound register printing and reenabling more tests.
llvm-svn: 252574
2015-11-10 00:51:56 +00:00
Tim Northover 339c83e27f AArch64: add experimental support for address tagging.
AArch64 has the ability to use the top 8-bits of an "address" for extra
information, with the memory subsystem automatically masking them off for loads
and stores. When that's happening, we can sometimes skip masks on memory
operations in the compiler.

However, this requires the host OS and support stack to preserve those bits so
it can't be enabled everywhere. In principle iOS 8.0 and above do take the
required precautions and but we'll put it under a flag for now.

llvm-svn: 252573
2015-11-10 00:44:23 +00:00
Derek Schuff ffa143ce81 [WebAssembly] Support 'unreachable' expression
Lower LLVM's 'unreachable' terminator to ISD::TRAP, and lower ISD::TRAP to
wasm's 'unreachable' expression.

WebAssembly type-checks expressions, but a noreturn function with a
return type that doesn't match the context will cause a check
failure. So we lower LLVM 'unreachable' to ISD::TRAP and then lower that
to WebAssembly's 'unreachable' expression, which typechecks in any
context and causes a trap if executed.

Differential Revision: http://reviews.llvm.org/D14515

llvm-svn: 252566
2015-11-10 00:30:57 +00:00
Colin LeMahieu b7a5f9fc29 [Hexagon] Fixing store instructions and reenabling a few more tests.
llvm-svn: 252561
2015-11-10 00:22:00 +00:00
Akira Hatanaka 3bfc3e2d2a [ARM] Handle t2ADDri in ARMAsmPrinter::EmitUnwindingInstruction.
This fixes a bug in ARMAsmPrinter::EmitUnwindingInstruction where
llvm_unreachable was reached because t2ADDri wasn't handled.

Test case provided by Tim Northover.

rdar://problem/23270609

http://reviews.llvm.org/D14518

llvm-svn: 252557
2015-11-10 00:10:41 +00:00
Colin LeMahieu 8ab7e8e1b5 [Hexagon] Fixing load instruction parsing and reenabling tests.
llvm-svn: 252555
2015-11-10 00:02:27 +00:00
Reid Kleckner 420f0542cc [WinEH] Remove isBarrier from instructions that do not return
Fixes machine verification failures with David's latest EH change.

llvm-svn: 252541
2015-11-09 23:34:42 +00:00
Sanjay Patel 533c10c651 add a SelectionDAG method to check if no common bits are set in two nodes; NFCI
This was suggested in:
http://reviews.llvm.org/D13956

and is a follow-on to:
http://reviews.llvm.org/rL252515
http://reviews.llvm.org/rL252519

This lets us remove logically equivalent/duplicated code from DAGCombiner and X86ISelDAGToDAG.

A corresponding function for IR instructions already exists in ValueTracking.

llvm-svn: 252539
2015-11-09 23:31:38 +00:00
David Majnemer 2652b75700 [WinEH] Don't emit CATCHRET from visitCatchPad
Instead, emit a CATCHPAD node which will get selected to a target
specific sequence.

llvm-svn: 252528
2015-11-09 23:07:48 +00:00
Sanjay Patel 32538d6811 [x86] try harder to match bitwise 'or' into an LEA
The motivation for this patch starts with the epic fail example in PR18007:
https://llvm.org/bugs/show_bug.cgi?id=18007

...unfortunately, this patch makes no difference for that case, but it solves some
simpler cases. We'll get there some day. :)

The current 'or' matching code was using computeKnownBits() via 
isBaseWithConstantOffset() -> MaskedValueIsZero(), but that's an unnecessarily limited use. 
We can do more by copying the logic in ValueTracking's haveNoCommonBitsSet(), so we can 
treat the 'or' as if it was an 'add'.

There's a TODO comment here because we should lift the bit-checking logic into a helper
function, so it's not duplicated in DAGCombiner.

An example of the better LEA matching:

leal (%rdi,%rdi), %eax
andl $1, %esi
orl %esi, %eax

Becomes:

andl $1, %esi
leal (%rsi,%rdi,2), %eax

Differential Revision: http://reviews.llvm.org/D13956

llvm-svn: 252515
2015-11-09 21:16:49 +00:00
Colin LeMahieu 9d851f0435 [Hexagon] Separating statement to match what clang-format would do.
llvm-svn: 252513
2015-11-09 21:06:28 +00:00
Reid Kleckner 64b003f05d [WinEH] Tweak funclet prologue/epilogue insertion to pass verifier
For some reason we'd never run MachineVerifier on WinEH code, and you
explicitly have to ask for it with llc. I added it to a few test cases
to get some coverage.

Fixes PR25461.

llvm-svn: 252512
2015-11-09 21:04:00 +00:00
Reid Kleckner 390191dacc [Hexagon] Fix -Wmicrosoft-enum-value warning with explicit enum type
llvm-svn: 252505
2015-11-09 19:44:38 +00:00
Sanjay Patel 776e59b0fe don't repeat function names in comments; NFC
llvm-svn: 252502
2015-11-09 19:18:26 +00:00
Charlie Turner 90dafb1b6d [AArch64] Add UABDL patterns for log2 shuffle.
Summary:
This matches the sum-of-absdiff patterns emitted by the vectoriser using log2 shuffles.

Relies on D14207 to be able to match the `extract_subvector(..., 0)`

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14208

llvm-svn: 252465
2015-11-09 13:10:52 +00:00
Charlie Turner 7b7b06f737 [AArch64] Handle extract_subvector(..., 0) in ISel.
Summary:
Lowering this pattern early to an `EXTRACT_SUBREG` was making it impossible to match larger patterns in tblgen that use `extract_subvector(..., 0)` as part of the their input pattern.

It seems like there will exist somewhere a better way of specifying this pattern over all relevant register value types, but I didn't manage to find it.

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14207

llvm-svn: 252464
2015-11-09 12:45:11 +00:00
Renato Golin 6d435f12f0 [EABI] Add LLVM support for -meabi flag
"GCC requires the freestanding environment provide memcpy, memmove, memset
and memcmp": https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Standards.html

Hence in GNUEABI targets LLVM should not convert 'memops' to their equivalent
'__aeabi_memops'. This convertion violates GCC contract.

The -meabi flag controls whether or not LLVM will modify 'memops' in GNUEABI
targets.

Without -meabi: use the triple default EABI.
With -meabi=default: use the triple default EABI.
With -meabi=gnu: use 'memops'.
With -meabi=4 or -meabi=5: use '__aeabi_memops'.
With -meabi set to an unknown value: same as -meabi=default.

Patch by Vinicius Tinti.

llvm-svn: 252462
2015-11-09 12:40:30 +00:00
Renato Golin 1d8a2c952f Revert "[ARM] Combine CMOV into BFI where possible"
This reverts commit r252057, as it broke ARM self-hosting buildbots, probably
due to a code-gen fault.

llvm-svn: 252460
2015-11-09 12:19:10 +00:00
Colin LeMahieu 9ea507edc7 [Hexagon] Adding override to methods.
llvm-svn: 252453
2015-11-09 07:10:24 +00:00
Colin LeMahieu 775d7ad677 [Hexagon] Fixing warnings.
llvm-svn: 252448
2015-11-09 05:47:56 +00:00
Colin LeMahieu a1adb51e6b [Hexagon] Removing extra gen line.
llvm-svn: 252447
2015-11-09 05:31:39 +00:00
Colin LeMahieu 892f54f408 [Hexagon] Maybe the makefile?
llvm-svn: 252446
2015-11-09 05:16:08 +00:00
Colin LeMahieu d5537bf219 [Hexagon] Adding LLVMBuild.txt reference to HexagonAsmParser.
llvm-svn: 252444
2015-11-09 04:31:02 +00:00
Colin LeMahieu 7cd0892729 [Hexagon] Enabling ASM parsing on Hexagon backend and adding instruction parsing tests. General updating of the code emission.
llvm-svn: 252443
2015-11-09 04:07:48 +00:00
Colin LeMahieu 8a0453e23a [AsmParser] Backends can parameterize ASM tokenization.
llvm-svn: 252439
2015-11-09 00:31:07 +00:00
Hal Finkel f046f72efa [PowerPC] Fix LoopPreIncPrep not to depend on SCEV constant simplifications
Under most circumstances, if SCEV can simplify X-Y to a constant, then it can
also simplify Y-X to a constant. However, there is no guarantee that this is
always true, and concensus is not to consider that a correctness bug in SCEV
(although it is undesirable).

PPCLoopPreIncPrep gathers pointers used to access memory (via loads, stores and
prefetches) into buckets, where in each bucket the relative pointer offsets are
constant. We used to keep each bucket as a multimap, where SCEV's subtraction
operation was used to define the ordering predicate. Instead, use a fixed SCEV
base expression for each bucket, record the constant offsets from that base
expression, and adjust it later, if desirable, once all pointers have been
collected.

Doing it this way should be more compile-time efficient than the previous
scheme (in addition to making the implementation less sensitive to SCEV
simplification quirks).

Fixes PR25170.

llvm-svn: 252417
2015-11-08 08:04:40 +00:00
David Majnemer e35244cf63 [WinEH] Update PHIs of CATCHRET successors
The TailDuplication machine pass ran across a malformed CFG: a PHI node
referred it's predecessor's predecessor instead of it's predecessor.
This occurred because we split the edge in X86ISelLowering when we
processed the CATCHRET but forgot to do something about the PHI nodes.

This fixes PR25444.

llvm-svn: 252413
2015-11-08 02:36:00 +00:00
Nico Weber 00406472e8 Try to fix build more -- like r252392 but for WebAssembly.
llvm-svn: 252394
2015-11-07 02:47:31 +00:00
Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
Duncan P. N. Exon Smith 83c4b68720 ADT: Remove last implicit ilist iterator conversions, NFC
Some implicit ilist iterator conversions have crept back into Analysis,
Transforms, Hexagon, and llvm-stress.  This removes them.

I'll commit a patch immediately after this to disallow them (in a
separate patch so that it's easy to revert if necessary).

llvm-svn: 252371
2015-11-07 00:01:16 +00:00
Ahmed Bougacha cf49b523a0 [AArch64][FastISel] Don't even try to select vector icmps.
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fallback to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).

Give up, let SDAG agree with itself on a vNi1 legalization strategy.

llvm-svn: 252364
2015-11-06 23:16:53 +00:00
Ahmed Bougacha b49eb3ab4b [X86] Fold (trunc (i32 (zextload i16))) into vbroadcast.
When matching non-LSB-extracting truncating broadcasts, we now insert
the necessary SRL. If the scalar resulted from a load, the SRL will be
folded into it, creating a narrower, offset, load.

However, i16 loads aren't Desirable, so we get i16->i32 zextloads.
We already catch i16 aextloads; catch these as well.

llvm-svn: 252363
2015-11-06 23:16:48 +00:00
Ahmed Bougacha 05a0514b12 [X86] SRL non-LSB extracts when folding to truncating broadcasts.
Now that we recognize this, we can support it instead of bailing out.
That is, we can fold:
  (v8i16 (shufflevector
    (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
    <1,1,...,1>))
into:
  (v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))

llvm-svn: 252362
2015-11-06 23:16:43 +00:00
Ahmed Bougacha 68614a36d1 [X86] Don't fold non-LSB extracts into truncating broadcasts.
We used to incorrectly assume that the offset we're extracting from
was a multiple of the element size. So, we'd fold:
  (v8i16 (shufflevector
    (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
    <1,1,...,1>))
into:
  (v8i16 (vbroadcast (i16 (trunc Y))))
whereas we should have extracted the higher bits from X.

Instead, bail out if the assumption doesn't hold.

llvm-svn: 252361
2015-11-06 23:16:38 +00:00
Tom Stellard 41b7e63040 AMDGPU/SI: Refactor VOP[12C] tablegen definitions
Summary:
Pass the VOPProfile object all the through to *_m multiclasses.  This will
allow us to do more simplifications in the future.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13437

llvm-svn: 252339
2015-11-06 20:56:18 +00:00
Andrew Kaylor 4731bea3e5 Improved the operands commute transformation for X86-FMA3 instructions.
All 3 operands of FMA3 instructions are commutable now.

Patch by Slava Klochkov

Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab).

Differential Revision: http://reviews.llvm.org/D13269

llvm-svn: 252335
2015-11-06 19:47:25 +00:00
Dan Gohman 4b96d8d1ff [WebAssembly] Make expression-stack pushing explicit
Modelling of the expression stack is evolving. This patch takes another
step by making pushes explicit.

Differential Revision: http://reviews.llvm.org/D14338

llvm-svn: 252334
2015-11-06 19:45:01 +00:00
Matt Arsenault f59e538937 AMDGPU: Cleanup includes
llvm-svn: 252328
2015-11-06 18:23:00 +00:00
Matt Arsenault 0c90e9501e AMDGPU: Create emergency stack slots during frame lowering
Test has a bogus verifier error which will be fixed by later commits.

llvm-svn: 252327
2015-11-06 18:17:45 +00:00
Matt Arsenault 08f14de244 AMDGPU: Remove unused scratch resource operands
The SGPR spill pseudos don't actually use them.

llvm-svn: 252324
2015-11-06 18:07:53 +00:00
Matt Arsenault 3931948bb6 AMDGPU: Add pass to detect used kernel features
Mark kernels that use certain features that require user
SGPRs to support with kernel attributes. We need to know
before instruction selection begins because it impacts
the kernel calling convention lowering.

For now this only detects the workitem intrinsics.

llvm-svn: 252323
2015-11-06 18:01:57 +00:00
Matt Arsenault 4dc7a5a5c6 AMDGPU: Fix hardcoded alignment of spill.
Instead of forcing 4 alignment when spilled, set register class
alignments.

llvm-svn: 252322
2015-11-06 17:54:47 +00:00
Matt Arsenault 623e6fd466 AMDGPU: Hack for VS_32 register pressure
For some reason VS_32 ends up factoring into the pressure heuristics
even though we should never see a virtual register with this class.

When SGPRs are reserved for register spilling, this for some reason
triggers reg-crit scheduling.

Setting isAllocatable = 0 may help with this since that seems to remove
it from the default implementation's generated table.

llvm-svn: 252321
2015-11-06 17:54:43 +00:00
Reid Kleckner b8fd162fc5 [WinEH] Mark funclet entries and exits as clobbering all registers
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.

Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer

Subscribers: MatzeB, llvm-commits

Differential Revision: http://reviews.llvm.org/D14407

llvm-svn: 252318
2015-11-06 17:06:38 +00:00
Jun Bum Lim 22fe15ee86 [AArch64]Enable the narrow ld promotion only on profitable microarchitectures
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only in cortex-a57 on which performance benefits were verified.

llvm-svn: 252316
2015-11-06 16:27:47 +00:00
Daniel Sanders 5762a4f9d1 [mips][ias] Range check uimm4 operands and fixed a bug this revealed.
Summary:
The bug was that the sldi instructions have immediate widths dependant on
their element size. So sldi.d has a 1-bit immediate and sldi.b has a 4-bit
immediate. All of these were using 4-bit immediates previously.

Reviewers: vkalintiris

Subscribers: llvm-commits, atanasyan, dsanders

Differential Revision: http://reviews.llvm.org/D14018

llvm-svn: 252297
2015-11-06 12:41:43 +00:00
Daniel Sanders 38ce0f629c [mips][ias] Range check uimm3 operands.
Summary:

Reviewers: vkalintiris

Subscribers: atanasyan, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14016

llvm-svn: 252296
2015-11-06 12:31:27 +00:00
Daniel Sanders ea4f653d18 [mips][ias] Range check uimm2 operands and fix a bug this revealed.
Summary:
The bug was that the MIPS32R6/MIPS64R6/microMIPS32R6 versions of LSA and DLSA
(unlike the MSA version) failed to account for the off-by-one encoding of the
immediate. The range is actually 1..4 rather than 0..3.

Reviewers: vkalintiris

Subscribers: atanasyan, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14015

llvm-svn: 252295
2015-11-06 12:22:31 +00:00
Daniel Sanders 52da7af4d2 [mips][ias] Range check uimmz operands.
Reviewers: vkalintiris

Subscribers: dsanders, atanasyan, llvm-commits

Differential Revision: http://reviews.llvm.org/D14013

llvm-svn: 252294
2015-11-06 12:11:03 +00:00
Vasileios Kalintiris b04672cade [mips] Define patterns for the atomic_{load,store}_{8,16,32,64} nodes.
Summary:
Without these patterns we would generate a complete LL/SC sequence.
This would be problematic for memory regions marked as WRITE-only or
READ-only, as the instructions LL/SC would read/write to the protected
memory regions correspondingly.

Reviewers: dsanders

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14397

llvm-svn: 252293
2015-11-06 12:07:20 +00:00
Tom Stellard 1e1b05db24 AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13804

llvm-svn: 252291
2015-11-06 11:45:14 +00:00
Reid Kleckner 51460c139e [WinEH] Split EH_RESTORE out of CATCHRET for 32-bit EH
This adds the EH_RESTORE x86 pseudo instr, which is responsible for
restoring the stack pointers: EBP and ESP, and ESI if stack realignment
is involved. We only need this on 32-bit x86, because on x64 the runtime
restores CSRs for us.

Previously we had to keep the CATCHRET instruction around during SEH so
that we could convince X86FrameLowering to restore our frame pointers.
Now we can split these instructions earlier.

This was confusing, because we had a return instruction which wasn't
really a return and was ultimately going to be removed by
X86FrameLowering. This change also simplifies X86FrameLowering, which
really shouldn't be building new MBBs.

No observable functional change currently, but with the new register
mask stuff in D14407, CATCHRET will become a register allocator barrier,
and our existing tests rely on us having reasonable register allocation
around SEH.

llvm-svn: 252266
2015-11-06 01:49:05 +00:00
Tim Northover 775aaeb765 Remove windows line endings introduced by r252177. NFC.
llvm-svn: 252217
2015-11-05 21:54:58 +00:00
Reid Kleckner 6ddae31045 [WinEH] Fix funclet prologues with stack realignment
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.

While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.

llvm-svn: 252210
2015-11-05 21:09:49 +00:00
Dan Gohman b9ce5a8b6c [WebAssembly] Fix copypasta.
Noticed by dschff in http://reviews.llvm.org/rL252203

llvm-svn: 252208
2015-11-05 20:59:49 +00:00
Dan Gohman da7f428a4a [WebAssembly] Rename Immediate instructions to Const.
This more closely reflects the naming convention in the spec.

llvm-svn: 252204
2015-11-05 20:44:29 +00:00
Dan Gohman af29bd4fd4 [WebAssembly] Add AsmString strings for most instructions.
Mangling type information into MachineInstr opcode names was a temporary
measure, and it's starting to get hairy. At the same time, the MC instruction
printer wants to use AsmString strings for printing. This patch takes the
first step, starting the process of adding AsmStrings for instructions.

llvm-svn: 252203
2015-11-05 20:42:30 +00:00
Dan Gohman d7ffb919c1 [WebAssembly] Update wasm builtin functions to match spec changes.
The page_size operator has been removed from the spec, and the resize_memory
operator has been changed to grow_memory.

llvm-svn: 252202
2015-11-05 20:16:59 +00:00
Sanjay Patel 387e66e79f replace MachineCombinerPattern namespace and enum with enum class; NFCI
Also, remove an enum hack where enum values were used as indexes into an array.

We may want to make this a real class to allow pattern-based queries/customization (D13417).

llvm-svn: 252196
2015-11-05 19:34:57 +00:00
Dan Gohman e9361d58ff [WebAssembly] Add WebAssemblyMCInstLower.cpp.
This isn't used yet; it's just a start towards eventually using MC to
do instruction printing, and eventually binary encoding.

llvm-svn: 252194
2015-11-05 19:28:16 +00:00
Oleg Ranevskyy 057c5a6b2b [DebugInfo] Fix ARM/AArch64 prologue_end position. Related to D11268.
Summary:
This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it.

D11268 is quite old and has merge conflicts against the current trunk.
This request 
 - rebases D11268 onto the new trunk;
 - resolves the merge conflicts;
 - fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct.

Reviewers: echristo, rengolin, kubabrecka

Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl

Differential Revision: http://reviews.llvm.org/D14338

llvm-svn: 252177
2015-11-05 17:50:17 +00:00
Petar Jovanovic 99fba3c141 Add cfi instr for CFA calculation when movpc is expanded to call and pop
This fixes the issue of wrong CFA calculation in the following case:

0x08048400 <+0>:	push   %ebx
0x08048401 <+1>:	sub    $0x8,%esp
0x08048404 <+4>:	**call   0x8048409 <test+9>**
0x08048409 <+9>:	**pop    %eax**
0x0804840a <+10>:	add    $0x1bf7,%eax
0x08048410 <+16>:	mov    %eax,%ebx
0x08048412 <+18>:	call   0x80483f0 <bar>
0x08048417 <+23>:	add    $0x8,%esp
0x0804841a <+26>:	pop    %ebx
0x0804841b <+27>:	ret

The highlighted instructions are a product of movpc instruction. The call
instruction changes the stack pointer, and pop instruction restores its
value. However, the rule for computing CFA is not updated and is wrong on
the pop instruction. So, e.g. backtrace in gdb does not work when on the pop
instruction. This adds cfi instructions for both call and pop instructions.

cfi_adjust_cfa_offset** instruction is used with the appropriate offset for
setting the rules to calculate CFA correctly.

Patch by Violeta Vukobrat.

Differential Revision: http://reviews.llvm.org/D14021

llvm-svn: 252176
2015-11-05 17:19:59 +00:00
Derek Schuff 8a76b04a63 [WebAssembly] Rename ior operator to or to match the spec
Summary: The spec uses "or" for inclusive-or and "xor" for exclusive-or

Reviewers: sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D14362

llvm-svn: 252174
2015-11-05 17:08:11 +00:00
James Molloy bef6e43107 [ARM] Compute known bits for ARMISD::CMOV
We can conservatively know that CMOV's known bits are the intersection of known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities.

I tried hard to create a testcase for this and failed - we have to sufficiently confuse DAG.computeKnownBits which can see through all the cheap tricks I tried to narrow my larger testcase down :(

This code is actually exercised in CodeGen/ARM/bfi.ll, there's just no functional difference because DAG.computeKnownBits gets the right answer in that case.

llvm-svn: 252168
2015-11-05 15:21:58 +00:00
Asaf Badouh f99c054ebc revert rev. 252153 due to build failure on ubuntu
[X86][AVX512] add comi with Sae

llvm-svn: 252154
2015-11-05 08:55:54 +00:00
Asaf Badouh 7fdabf0a35 [X86][AVX512] add comi with Sae
add builtin_ia32_vcomisd and builtin_ia32_vcomisd

Differential Revision: http://reviews.llvm.org/D14331

llvm-svn: 252153
2015-11-05 08:45:06 +00:00
Asaf Badouh a8209d92cc [X86][AVX512] small bugfix in VPBROADCASTM
VPBROADCASTMW2D and VPBROADCASTMB2Q

Differential Revision: http://reviews.llvm.org/D14335

llvm-svn: 252151
2015-11-05 08:08:21 +00:00
Matt Arsenault 5b22dfa65d AMDGPU: Also track whether SGPRs were spilled
llvm-svn: 252145
2015-11-05 05:27:10 +00:00
Matt Arsenault d41c0dbff0 AMDGPU: Print number user SGPRs
This doesn't quite match how SC prints it, which doesn't put it in a
comment.

llvm-svn: 252144
2015-11-05 05:27:07 +00:00
Matt Arsenault 68802d3177 AMDGPU: Disallow s[102:103] on VI in assembler
llvm-svn: 252142
2015-11-05 03:11:27 +00:00
Matt Arsenault a40450cba2 AMDGPU: Fix assert when legalizing atomic operands
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.

This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.

llvm-svn: 252140
2015-11-05 02:46:56 +00:00
Matt Arsenault bed42a7320 AMDGPU: Make addr64 atomic operand order consistent
vaddr comes before srsrc in every other MUBUF instruction,
and is the order it is printed.

llvm-svn: 252139
2015-11-05 02:46:53 +00:00
Joseph Tremoulet 6afccf6120 [WinEH] Fix establisher param reg in CLR funclets
Summary:
The CLR's personality routine passes the pointer to the establisher frame
in RCX, not RDX.

Reviewers: pgavlin, majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14343

llvm-svn: 252135
2015-11-05 02:20:07 +00:00
Rafael Espindola e61a902371 Go back to producing relocations for out of range symbols.
This brings back the behavior from before r252090 for out of range symbols.

Should bring some arm bots back.

llvm-svn: 252119
2015-11-05 01:10:15 +00:00
Matt Arsenault 6c2e200d38 AMDGPU: Fix typo
llvm-svn: 252116
2015-11-05 01:03:08 +00:00
Rafael Espindola 49b8548903 Slightly saner handling of thumb branches.
The generic infrastructure already did a lot of work to decide if the
fixup value is know or not. It doesn't make sense to reimplement a very
basic case: same fragment.

llvm-svn: 252090
2015-11-04 23:00:39 +00:00
Quentin Colombet 421723cdd8 [x86] Teach the shrink-wrapping hooks to do the proper thing with Win64.
Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.

Fixes PR24193.

llvm-svn: 252088
2015-11-04 22:37:28 +00:00
Simon Pilgrim f669d381f9 Warning fix.
llvm-svn: 252078
2015-11-04 21:27:22 +00:00
Simon Pilgrim 7e6606f4f1 [X86][SSE] Add general memory folding for (V)INSERTPS instruction
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.

The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.

This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.

It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.

Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll

Differential Revision: http://reviews.llvm.org/D13988

llvm-svn: 252074
2015-11-04 20:48:09 +00:00
Sanjoy Das b11b440f8e [IR] Add bounds checking to paramHasAttr
Summary:
This is intended to make a later change simpler.

Note: adding this bounds checking required fixing `X86FastISel`.  As
far I can tell I've preserved original behavior but a careful review
will be appreciated.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14304

llvm-svn: 252073
2015-11-04 20:33:45 +00:00
Andrew Kaylor e41a8c4182 Created new X86 FMA3 opcodes (FMA*_Int) that are used now for lowering of scalar FMA intrinsics.
Patch by Slava Klochkov 

The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result.

Reviewers: Quentin Colombet, Elena Demikhovsky

Differential Revision: http://reviews.llvm.org/D13710

llvm-svn: 252060
2015-11-04 18:10:41 +00:00
James Molloy e7d679cf4c [ARM] Combine CMOV into BFI where possible
If we have a CMOV, OR and AND combination such as:
  if (x & CN)
    y |= CM;

And:
  * CN is a single bit;
  * All bits covered by CM are known zero in y;

Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).

llvm-svn: 252057
2015-11-04 16:55:07 +00:00
Michael Kuperstein a3b79dd783 [ELF] elfiamcu triple should imply e_machine == EM_IAMCU
Differential Revision: http://reviews.llvm.org/D14109

llvm-svn: 252043
2015-11-04 11:21:50 +00:00
Michael Kuperstein b34de72269 [X86] DAGCombine should not introduce FILD in soft-float mode
The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp 
directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode.

llvm-svn: 252042
2015-11-04 11:17:53 +00:00
Peter Collingbourne 94d778697a CodeGen, Target: Move Mach-O-specific symbol name logic to Mach-O lowering.
A profile of an LTO link of Chrome revealed that we were spending some
~30-50% of execution time in the function Constant::getRelocationInfo(),
which is called from TargetLoweringObjectFile::getKindForGlobal() and in turn
from TargetMachine::getNameWithPrefix().

It turns out that we only need the result of getKindForGlobal() when
targeting Mach-O, so this change moves the relevant part of the logic to
TargetLoweringObjectFileMachO.

NFCI.

Differential Revision: http://reviews.llvm.org/D14168

llvm-svn: 252014
2015-11-03 23:40:03 +00:00
Matt Arsenault aac9b49325 AMDGPU: Make flat_scratch name consistent
The printed name and the parsed assembler names weren't the same.
I'm not sure which name SC prints these as, but I think it's this one.

llvm-svn: 252010
2015-11-03 22:50:34 +00:00
Matt Arsenault 967c2f5dee AMDGPU: Fix asserts on invalid register ranges
If the requested SGPR was not actually aligned, it was
accepted and rounded down instead of rejected.

Also fix an assert if the range is an invalid size.

llvm-svn: 252009
2015-11-03 22:50:32 +00:00
Matt Arsenault 3473c72aab AMDGPU: Fix off by one error in register parsing
If trying to use one past the end, this would assert.

llvm-svn: 252008
2015-11-03 22:50:27 +00:00
Derek Schuff b44d4d350e Align whitespace
llvm-svn: 252003
2015-11-03 22:40:43 +00:00
Derek Schuff 6b5c6da760 [WebAssembly] Support wasm select operator
Summary:
Add support for wasm's select operator, and lower LLVM's select DAG node
to it.

Reviewers: sunfish

Subscribers: dschuff, llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D14295

llvm-svn: 252002
2015-11-03 22:40:40 +00:00
Matt Arsenault e8ed13d946 AMDGPU: s[102:103] is unavailable on VI
llvm-svn: 252000
2015-11-03 22:39:52 +00:00
Matt Arsenault 192b282bf3 AMDGPU: Define correct number of SGPRs
There are actually 104 so 2 were missing.

More assembler tests with high register number tuples
will be included in later patches.

llvm-svn: 251999
2015-11-03 22:39:50 +00:00
Matt Arsenault 6c0674112a AMDGPU: Make findUsedSGPR more readable
Add more comments etc.

llvm-svn: 251996
2015-11-03 22:30:15 +00:00
Matt Arsenault 782c03bb7e AMDGPU: Initialize SIFixSGPRCopies so -print-after works
llvm-svn: 251995
2015-11-03 22:30:13 +00:00
Matt Arsenault d9d659aa23 AMDGPU: Alphabetize includes
llvm-svn: 251994
2015-11-03 22:30:08 +00:00
Simon Pilgrim e88dc04c48 [X86][XOP] Add support for the matching of the VPCMOV bit select instruction
XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) )

This patch adds tablegen pattern matching for this instruction.

Differential Revision: http://reviews.llvm.org/D8841

llvm-svn: 251975
2015-11-03 20:27:01 +00:00
Michael Kuperstein 73dc85293f [X86] Generate .cfi_adjust_cfa_offset correctly when pushing arguments
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.

Darwin does not support this well, so don't use pushes whenever it
would be required.

Differential Revision: http://reviews.llvm.org/D13767

llvm-svn: 251904
2015-11-03 08:17:25 +00:00
Igor Breger 4ec5abffae AVX512: add encoding tests for vmovq/d instructions.
llvm-svn: 251903
2015-11-03 07:30:17 +00:00
Matthias Braun f538e133cc Fix build problme introduced in r251883
llvm-svn: 251888
2015-11-03 02:19:07 +00:00
Matthias Braun 93563e7032 ScheduleDAGInstrs: Remove IsPostRA flag; NFC
ScheduleDAGInstrs doesn't behave differently before or after register
allocation. It was only used in a method of MachineSchedulerBase which
behaved differently in MachineScheduler/PostMachineScheduler. Change
this to let MachineScheduler/PostMachineScheduler just pass in a
parameter to that function.

The order of the LiveIntervals* and bool RemoveKillFlags paramters have
been switched to make out-of-tree code fail instead of unintentionally
passing a value intended for the IsPostRA flag to the (previously
following and default initialized) RemoveKillFlags.

Differential Revision: http://reviews.llvm.org/D14245

llvm-svn: 251883
2015-11-03 01:53:29 +00:00
Colin LeMahieu 160f73e36f [Hexagon] Fixing mistaken case fallthrough.
llvm-svn: 251867
2015-11-03 00:21:19 +00:00
Matt Arsenault f1aebbf33a AMDGPU: Stop assuming vreg for build_vector
This was causing a variety of test failures when v2i64
is added as a legal type.

SIFixSGPRCopies should correctly handle the case of vector inputs
to a scalar reg_sequence, so this isn't necessary anymore. This
was hiding some deficiencies in how reg_sequence is handled later,
but this shouldn't be a problem anymore since the register class
copy of a reg_sequence is now done before the reg_sequence.

llvm-svn: 251860
2015-11-02 23:30:48 +00:00
Derek Schuff 43e96c4feb [WebAssembly] Make WebAssemblyCodeGen depend on WebAssemblyAsmPrinter
llvm-svn: 251859
2015-11-02 23:23:16 +00:00
Matt Arsenault d48da14269 AMDGPU: Error on graphics shaders with HSA
I've found myself pointlessly debugging problems from running
graphics tests with an HSA triple a few times, so stop this from
happening again.

llvm-svn: 251858
2015-11-02 23:23:02 +00:00
Matt Arsenault 0de924b76d AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE
Make the REG_SEQUENCE be a VGPR, and do the register class
copy first.

llvm-svn: 251855
2015-11-02 23:15:42 +00:00
Bill Schmidt 8ed7cec170 [PPC64LE] Properly initialize instr-info in PPCVSXSwapRemoval pass
Replace some hacky code with the proper way to get at this data.

No functional change.

llvm-svn: 251848
2015-11-02 22:43:57 +00:00
Tim Northover 155103ec18 WatchOS: update default CPU for triple after t2dsp -> dsp rename
llvm-svn: 251814
2015-11-02 18:21:07 +00:00
Nemanja Ivanovic be5f0c04f1 Fix for bootstrap bug introduced in r244921
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. It turns out that the new code path taken due to
legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a
micro optimization to change a load followed by a scalar_to_vector into a
load and splat instruction on PPC.

llvm-svn: 251798
2015-11-02 14:01:11 +00:00
Igor Breger fa798a9dbb AVX512: Implemented encoding and intrinsics for VBROADCASTI32x2 and VBROADCASTF32x2 instructions.
Differential Revision: http://reviews.llvm.org/D14216

llvm-svn: 251781
2015-11-02 07:39:36 +00:00
Craig Topper 45e83b8ba7 [X86] Remove assertions that check for valid scale values on scatter/gather intrinsics. Nothing upstream prevented illegal values from getting here.
llvm-svn: 251780
2015-11-02 07:24:40 +00:00
Craig Topper e69eb78510 [X86] Fold 'if' followed by just an llvm_unreachable into an assert.
llvm-svn: 251778
2015-11-02 07:24:34 +00:00
Craig Topper aebab7c03f [X86] Use isa instead of dyn_cast in a bool context. NFC
llvm-svn: 251777
2015-11-02 07:24:32 +00:00
Craig Topper c70af642a2 [X86] Remove some llvm_unreachables after switches that already have an unreachable in their default case.
llvm-svn: 251776
2015-11-02 07:24:30 +00:00
Craig Topper d6a77ca4bb [X86] Remove a 'break' after an llvm_unreachable.
llvm-svn: 251775
2015-11-02 07:24:27 +00:00
Craig Topper d49a41793c [X86] Use cast instead of dyn_cast and a null check marked unreachable.
llvm-svn: 251774
2015-11-02 07:24:25 +00:00
Craig Topper 95ceb5a60a [X86] Use MVT instead of EVT when the type is known to be simple. NFC
llvm-svn: 251772
2015-11-02 05:24:22 +00:00
NAKAMURA Takumi 50df0c2037 Untabify.
llvm-svn: 251769
2015-11-02 01:38:12 +00:00
Elena Demikhovsky db738d9cc3 AVX-512: Optimized SIMD truncate operations for AVX512F set.
Optimized <8 x i32> to <8 x i16>
<4 x i64> to < 4 x i32>
<16 x i16> to <16 x i8>
All these oprtrations use now AVX512F set (KNL). Before this change it was implemented with AVX2 set.


Differential Revision: http://reviews.llvm.org/D14108

llvm-svn: 251764
2015-11-01 11:45:47 +00:00
Craig Topper ec2ea4817e [X86] Replace getScalarType with getVectorElementType when the type is already known to be a vector. This should result in slightly less code. NFC
llvm-svn: 251751
2015-10-31 21:44:52 +00:00
Craig Topper 476be8f94a [X86] Convert to MVT instead of calling EVT functions since we already know the type is simple. NFC
llvm-svn: 251745
2015-10-31 18:14:17 +00:00
Craig Topper 0fec4d8ce7 [X86] Call getScalarSizeInBits() instead of getScalarType().getScalarSizeInBits(). NFC
llvm-svn: 251744
2015-10-31 18:14:15 +00:00
Craig Topper 0e7680da9f [X86] Remove two const references to the return value of a constructor and just use normal object creation syntax. NFC
llvm-svn: 251743
2015-10-31 17:28:02 +00:00
Craig Topper 7b1d3a8a6c [X86] Replace EVT with MVT in some more places. NFC
llvm-svn: 251742
2015-10-31 17:27:59 +00:00
Craig Topper 63c2925b87 [X86] Fix indentation of case statements in switch. NFC
llvm-svn: 251741
2015-10-31 17:27:56 +00:00
Craig Topper 5c8a378f48 [X86] Reduce math for index calculation for inserting and extracting subvectors and elements by exploiting the fact that all supported vector types have a power 2 number of elements.
llvm-svn: 251740
2015-10-31 17:27:52 +00:00
JF Bastien 5789a69435 [WebAssembly] Fix import statement
Summary:
Imports should be generated like (param i32 f32...) not (param i32) (param f32) ...

Author: binji
Reviewers: jfb
Subscribers: jfb, dschuff
llvm-svn: 251714
2015-10-30 16:41:21 +00:00
Craig Topper 9377f01f21 [X86] Use is128BitVector/is256BitVector/is512BitVector in place of getSizeInBits == in some places. NFC
llvm-svn: 251687
2015-10-30 04:31:18 +00:00
Craig Topper 62c3ed0ae3 [X86] Minor formatting fixes. NFC.
llvm-svn: 251686
2015-10-30 04:31:14 +00:00
Craig Topper 9ef327c962 [X86] Use MVT instead of EVT in some places. NFC
Prior to this the compiled code probably had extra checks for extended types that won't ever execute.

llvm-svn: 251682
2015-10-30 03:19:12 +00:00
Simon Pilgrim ca56a72af9 [X86][SSE] Shuffle blends with zero
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.

Differential Revision: http://reviews.llvm.org/D14050

llvm-svn: 251659
2015-10-29 22:11:28 +00:00
Jonas Paulsson 45d5c673ec [SystemZ] Make the CCRegs regclass non-allocatable.
This was discovered to be necessary while running memchr-01.ll with
-verify-machinstrs, because it is not allowed to have a phys reg live
accross block boundaries while on SSA form, if the register is
allocatable (expect in entry block and landing pads).

In this test case, stringRRE pseudos are expanded after isel by adding
a loop block which produces a live out CC register. To make the test
pass, it was also necessary to not say that StringRRELoop pseudo uses
R0L, this is only true for the StringRRE opcode.

-verify-machineinstrs added to memchr-01.ll test.

New test case int-cmp-51.ll to test that MachineCSE can eliminate
an identical compare (which it couldn't do before).

Reviewed by Ulrich Weigand

llvm-svn: 251634
2015-10-29 16:13:55 +00:00
Marek Olsak 6f6d318e16 AMDGPU/SI: handle undef for llvm.SI.packf16
llvm-svn: 251632
2015-10-29 15:29:09 +00:00
Marek Olsak 74d084f466 AMDGPU/SI: use S_OR for fneg (fabs f32)
llvm-svn: 251631
2015-10-29 15:29:05 +00:00
Marek Olsak f924dd6f3c AMDGPU/SI: use S_AND for i1 trunc
llvm-svn: 251630
2015-10-29 15:05:03 +00:00
Zoran Jovanovic 796ed6d937 [mips] wrong opcode for ll/sc instructions on mipsr6 when -integrated-as is used
Summary:
This commit resolves wrong opcodes for ll and sc instructions for r6 architecutres, which were generated in method MipsTargetLowering::emitAtomicBinary.

Author: Jelena.Losic

Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D13593

llvm-svn: 251629
2015-10-29 14:40:19 +00:00
Artyom Skrobov 0ff1ce4038 Recognize that ARM1176JZ[F]-S support TrustZone
Summary:
ARMv6KZ cores were set up incorrectly in ARM.td; also, the SMI mnemonic
(the old name for SMC, as defined in ARMv6KZ) wasn't supported.

Reviewers: jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14154

llvm-svn: 251627
2015-10-29 13:56:19 +00:00
Vasileios Kalintiris 2f412684a9 [mips] Check the register class before replacing materializations of zero with $zero in microMIPS.
Summary:
The microMIPS register class GPRMM16 does not contain the $zero register.
However, MipsSEDAGToDAGISel::replaceUsesWithZeroReg() would replace uses
of the $dst register:

  [d]addiu, $dst, $zero, 0

with the $zero register, without checking for membership in the register
class of the target machine operand.

Reviewers: dsanders

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13984

llvm-svn: 251622
2015-10-29 10:17:16 +00:00
JF Bastien 7b452e2c63 [WebAssembly] Update opcode name format for conversions
Summary:
Conversion opcode name format should be f64.convert_u/i64 not f64_convert_u

Author: s3ththompson
Reviewers: jfb
Subscribers: sunfish, jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D14160

llvm-svn: 251613
2015-10-29 04:10:52 +00:00
Benjamin Kramer 4e4ca38bcf Remove CRLF line endings.
llvm-svn: 251594
2015-10-29 02:33:05 +00:00
Hal Finkel 7d0e34eb33 [PowerPC] Recurse through constants when looking for TLS globals
We cannot form ctr-based loops around function calls, including calls to
__tls_get_addr used for PIC TLS variables. References to such TLS variables,
however, might be buried within constant expressions, and so we need to search
the entire constant expression to be sure that no references to such TLS
variables exist.

Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version
of the patch suggested by Eric in the bug report, and a test case I created.

llvm-svn: 251582
2015-10-28 23:43:00 +00:00
Hal Finkel bdd292ae22 [PowerPC] Don't return unsupported register classes for asm constraints
As a follow-up to r251566, do the same for the other optionally-supported
register classes (mostly for vector registers). Don't return an unavailable
register class (which would cause an assert later), but fail cleanly when
provided an unsupported inline asm constraint.

llvm-svn: 251575
2015-10-28 23:03:45 +00:00
Tim Northover f8e47e4868 ARM: add support for WatchOS's compact unwind information.
llvm-svn: 251573
2015-10-28 22:56:36 +00:00
Tim Northover 8b40366b54 ARM: teach backend about WatchOS and TvOS libcalls.
The most substantial changes are again for watchOS: libcalls are hard-float if
needed and sincos has a different calling convention.

llvm-svn: 251571
2015-10-28 22:51:16 +00:00
Tim Northover e0ccdc6de9 ARM: add backend support for the ABI used in WatchOS
At the LLVM level this ABI is essentially a minimal modification of AAPCS to
support 16-byte alignment for vector types and the stack.

llvm-svn: 251570
2015-10-28 22:46:43 +00:00
Tim Northover 2d4d161519 ARM: support .watchos_version_min and .tvos_version_min.
These MachO file directives are used by linkers and other tools to provide
compatibility information, much like the existing .ios_version_min and
.macosx_version_min.

llvm-svn: 251569
2015-10-28 22:36:05 +00:00
Hal Finkel 34d4149452 [PowerPC] Cleanly reject asm crbit constraint with -crbits
When crbits are disabled, cleanly reject the constraint (return the register
class only to cause an assert later).

llvm-svn: 251566
2015-10-28 22:25:52 +00:00
Cong Hou da4e8aeec6 [X86] A small fix in X86/X86TargetTransformInfo.cpp: check a value type is simple before calling getSimpleVT().
llvm-svn: 251538
2015-10-28 18:15:46 +00:00
Artyom Skrobov b43981076a [ARM] Allow SP in rGPR, starting from ARMv8
Summary:
This patch handles assembly and disassembly, but not codegen, as of yet.

Additionally, it fixes a bug whereby SP and PC as shifted-reg operands
were treated as predictable in ARMv7 Thumb; and it enables the tests
for invalid and unpredictable instructions to run on both ARMv7 and ARMv8.

Reviewers: jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14141

llvm-svn: 251516
2015-10-28 13:58:36 +00:00
Benjamin Kramer 039b10423a Put global classes into the appropriate namespace.
Most of the cases belong into an anonymous namespace. No
functionality change intended.

llvm-svn: 251515
2015-10-28 13:54:36 +00:00
Hrvoje Varga 18148671ee [mips][microMIPS] Implement PAUSE, RDHWR, RDPGPR, SDBBP, SSNOP, SYNC, SYNCI and WAIT instructions
Differential Revision: http://reviews.llvm.org/D12628

llvm-svn: 251510
2015-10-28 11:04:29 +00:00
Craig Topper 93d4a9e117 [X86] Make some for loops over MVTs more explicit (and shorter) by just mentioning all the relevant types in an initializer list. NFC
llvm-svn: 251500
2015-10-28 05:48:32 +00:00
Craig Topper 3a47587c41 Use range-based for loops and use initializer list to remove a small static array. NFC
llvm-svn: 251494
2015-10-28 04:53:27 +00:00
Craig Topper 4b27576001 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.

llvm-svn: 251490
2015-10-28 04:02:12 +00:00
Hal Finkel f4052340a4 [PowerPC] Replace cntlz[.] with cntlzw[.]
cntlz is the old POWER mnemonic. cntlzw is the PowerPC mnemonic.

This change fixes an issue when -no-integrated-as: The opcode cntlz is
unrecognized by gas

Alias the POWER mnemonic cntlz[.] to the PowerPC mnemonic cntlzw[.]
This is done for because the POWER cntlz mnemonic has be used by LLVM for
a very long time. We need to make sure that assembly programs
that are using the cntlz[.] do not break with this change.

Change PowerPC tests to reflect the insn change from cntlz to cntlzw.
Add assembly test to verify cntlz[.] is encoded correctly.

Patch by Tom Rix!

llvm-svn: 251489
2015-10-28 03:26:45 +00:00
Jun Bum Lim c9879ecfbc [AArch64]Merge halfword loads into a 32-bit load
This recommits r250719, which caused a failure in SPEC2000.gcc
because of the incorrect insert point for the new wider load.

Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #ffff

llvm-svn: 251438
2015-10-27 19:16:03 +00:00
Cong Hou 07eeb8001e Create a new interface addSuccessorWithoutWeight(MBB*) in MBB to add successors when optimization is disabled.
When optimization is disabled, edge weights that are stored in MBB won't be used so that we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But that the weight list is empty doesn't mean disabled optimization (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights.

We should discourage using default weights when adding successors, because it is very easy for users to forget update the correct edge weights instead of using default ones (one exception is that the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimizations is disabled.

In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this as for many uses of addSuccessor() whether optimization is disabled or not is not checked. But it can guarantee that if optimization is enabled, then the weight list always has the same size of the successor list.

Differential revision: http://reviews.llvm.org/D13963

llvm-svn: 251429
2015-10-27 17:59:36 +00:00
Asaf Badouh c7cb880669 [X86][AVX512] [X86][AVX512] add convert float to half
convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem).

Differential Revision: http://reviews.llvm.org/D14113

llvm-svn: 251409
2015-10-27 15:37:17 +00:00
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
Michael Kuperstein e1194bdb4f [X86] Make elfiamcu an OS, not an environment.
GNU tools require elfiamcu to take up the entire OS field, so, e.g.
i?86-*-linux-elfiamcu is not considered a legal triple.
Make us compatible.

Differential Revision: http://reviews.llvm.org/D14081

llvm-svn: 251390
2015-10-27 07:23:59 +00:00
Craig Topper ee0c859788 Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.

While there make use of ArrayRef and std::find_if.

llvm-svn: 251382
2015-10-27 04:14:24 +00:00
Sanjay Patel 309c4f93e5 [x86] replace integer logic ops with packed SSE FP logic ops
If we have an operand to a bitwise logic op that's already in
an XMM register and the result is going to be sent to an XMM
register, then use an SSE logic op to avoid moves between the
integer and vector register files.

Related commits:
http://reviews.llvm.org/rL248395
http://reviews.llvm.org/rL248399
http://reviews.llvm.org/rL248404
http://reviews.llvm.org/rL248409
http://reviews.llvm.org/rL248415

This should solve PR22428:
https://llvm.org/bugs/show_bug.cgi?id=22428

llvm-svn: 251378
2015-10-27 01:28:07 +00:00
Daniel Sanders 5bf6eab6b8 [mips][ias] Fold needsExpansion() and expandInstruction() together. NFC.
Summary:
Previously we maintained two separate switch statements that had to be kept in
sync. This patch merges them into a single switch.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14012

llvm-svn: 251369
2015-10-26 23:50:00 +00:00
Sanjay Patel e9b500f722 reorganize logic; NFCI (retry r251349)
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

..and I really hope it's NFC this time!

llvm-svn: 251357
2015-10-26 21:54:14 +00:00
Tim Northover 939f089242 ARM: make sure VFP loads and stores are properly aligned.
Both VLDRS and VLDRD fault if the memory is not 4 byte aligned, which wasn't
really being checked before, leading to faults at runtime.

llvm-svn: 251352
2015-10-26 21:32:53 +00:00
Sanjay Patel f29fed423a revert r251349; it included code for a functional change
llvm-svn: 251350
2015-10-26 21:28:02 +00:00
Sanjay Patel fdf75452e4 reorganize logic; NFCI
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

llvm-svn: 251349
2015-10-26 21:24:09 +00:00
Peter Collingbourne 99fac80db2 ARM/ELF: Restore original (pre-r251322) logic for deciding whether to use GOT.
Unbreaks linking with gold, which cannot resolve direct relocations referring
to global symbols.

llvm-svn: 251342
2015-10-26 20:46:44 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Peter Collingbourne 97aae40880 ARM/ELF: Better codegen for global variable addresses.
In PIC mode we were previously computing global variable addresses (or GOT
entry addresses) by adding the PC, the PC-relative GOT displacement and
the GOT-relative symbol/GOT entry displacement. Because the latter two
displacements are fixed, we ended up performing one more addition than
necessary.

This change causes us to compute addresses using a single PC-relative
displacement, resulting in a shorter code sequence. This reduces code size
by about 4% in a recent build of Chromium for Android.

As a result of this change we no longer need to compute the GOT base address
in the ARM backend, which allows us to remove the Global Base Reg pass and
SDAG lowering for the GOT.

We also now no longer use the GOT when addressing a symbol which is known
to be defined in the same linkage unit. Specifically, the symbol must have
either hidden visibility or a strong definition in the current module in
order to not use the the GOT.

This is a change from the previous behaviour where we would use the GOT to
address externally visible symbols defined in the same module. I think the
only cases where this could matter are cases involving symbol interposition,
but we don't really support that well anyway.

Differential Revision: http://reviews.llvm.org/D13650

llvm-svn: 251322
2015-10-26 18:23:16 +00:00
Jonas Paulsson 83553d0cac [SystemZ] LTGFR use regclass should be GR32, not GR64.
Discovered by testing int-cmp-44.ll with -verify-machineinstrs (added to
test run).

llvm-svn: 251299
2015-10-26 15:03:49 +00:00
Jonas Paulsson 7da3820882 [SystemZ] Also clear kill flag for index reg in splitMove().
Discovered by running fp-move-05.ll with -verify-machineinstrs (added
to test case run).

llvm-svn: 251298
2015-10-26 15:03:41 +00:00
Jonas Paulsson 9525b2c0c8 [SystemZ] Don't forget the CC def op on LTEBRCompare pseudos
Discovered by running fp-cmp-02.ll with -verify-machineinstrs (now added
to test run).

llvm-svn: 251297
2015-10-26 15:03:32 +00:00
Jonas Paulsson dab7407258 [SystemZ] Tie operands in SystemZShorteInst if MI becomes 2-address.
Discovered by testing fp-add-02.ll with -verify-machineinstrs.

Test case updated to always run with -verify-machineinstrs.

llvm-svn: 251296
2015-10-26 15:03:07 +00:00
Vasileios Kalintiris 165121f326 [mips] Check for the correct error message in tests for interrupt attributes.
Instead of XFAIL-ing the tests with the wrong usage of the "interrupt"
attribute, we should check that we emit the correct error messages to
the user.

llvm-svn: 251295
2015-10-26 14:24:30 +00:00
Igor Breger e4ddc3f4cd AVX512: Enabled VPBROADCASTB lowering for v64i8 vectors.
Differential Revision: http://reviews.llvm.org/D13896

llvm-svn: 251287
2015-10-26 13:01:02 +00:00