llvm-project

Commit Graph

Author	SHA1	Message	Date
Jozef Kolek	e8c9d1eaf7	[mips][microMIPS] Implement LBU16, LHU16, LW16, SB16, SH16 and SW16 instructions Differential Revision: http://reviews.llvm.org/D5122 llvm-svn: 222653	2014-11-24 14:39:13 +00:00
Jozef Kolek	1904fa2197	[mips][microMIPS] Implement 16-bit instructions registers including ZERO instead of S0 Implement microMIPS 16-bit instructions register set: $0, $2-$7 and $17. Differential Revision: http://reviews.llvm.org/D5780 llvm-svn: 222652	2014-11-24 14:25:53 +00:00
Aaron Ballman	41580deaf1	Removing a variable that is initialized but never read. The original author has been alerted to the warning, in case this variable is meant to be used. Fixes -Werror builds in the meantime. llvm-svn: 222649	2014-11-24 14:03:16 +00:00
Jozef Kolek	ea22c4cfbb	[mips][microMIPS] Implement disassembler support for 16-bit instructions With the help of new method readInstruction16() two bytes are read and decodeInstruction() is called with DecoderTableMicroMips16, if this fails four bytes are read and decodeInstruction() is called with DecoderTableMicroMips32. Differential Revision: http://reviews.llvm.org/D6149 llvm-svn: 222648	2014-11-24 13:29:59 +00:00
Andrea Di Biagio	23e2cfa834	[X86] Improved target specific combine on VSELECT dag nodes. This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1. On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd. Also, removed a target specific combine that performed a premature lowering of VSELECT nodes to target specific MOVSS/MOVSD nodes. llvm-svn: 222647	2014-11-24 12:23:15 +00:00
Michael Kuperstein	9ef90647b9	[X86] Fixes bug in build_vector v4x32 lowering r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where: 1. A single extracted element is used twice. 2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask. This caused a crash, since the source value for the insertps ends-up uninitialized. Differential Revision: http://reviews.llvm.org/D6377 llvm-svn: 222635	2014-11-23 13:09:06 +00:00
Craig Topper	8c5128bf1b	Add missing override keywords. llvm-svn: 222634	2014-11-23 09:40:13 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
Matt Arsenault	2a495975ed	R600: Fix extloads of i1 on R600/Evergreen llvm-svn: 222631	2014-11-23 02:57:54 +00:00
Matt Arsenault	28638f1e2c	R600: Fix assert on copy of an i1 on pre-SI i1 is not a legal type on Evergreen, so this combine proceeded and tried to produce a bitcast between i1 and i8. llvm-svn: 222630	2014-11-23 02:57:52 +00:00
Simon Pilgrim	a279410ede	Tidied up target triple OS detection. NFC Use Triple::isOS*() helper functions where possible. llvm-svn: 222622	2014-11-22 19:12:10 +00:00
Chandler Carruth	5852b1f4c2	[x86] Teach the vector shuffle yet another step of canonicalization. No functionality changed yet, but this will prevent subsequent patches from having to handle permutations of various interleaved shuffle patterns. llvm-svn: 222614	2014-11-22 09:18:53 +00:00
Joerg Sonnenberger	02b13a8d9b	Fix transformation of add with pc argument to adr for non-immediate arguments. llvm-svn: 222587	2014-11-21 22:39:34 +00:00
Tom Stellard	91c7ef529d	R600/SI: Add an s_mov_b32 to patterns which use the M0RegClass We need to use a s_mov_b32 rather than a copy, so that CSE will eliminate redundant moves to the m0 register. llvm-svn: 222584	2014-11-21 22:31:46 +00:00
Tom Stellard	a99ada528c	R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction This s_mov_b32 will write to a virtual register from the M0Reg class and all the ds instructions now take an extra M0Reg explicit argument. This change is necessary to prevent issues with the scheduler mixing together instructions that expect different values in the m0 registers. llvm-svn: 222583	2014-11-21 22:31:44 +00:00
Tom Stellard	6596ba7933	R600/SI: Add SIFoldOperands pass This pass attempts to fold the source operands of mov and copy instructions into their uses. llvm-svn: 222581	2014-11-21 22:06:37 +00:00
Jozef Kolek	3b8ddb665b	[mips][microMIPS] This patch implements functionality in MIPS delay slot filler such as if delay slot filler have to put NOP instruction into the delay slot of microMIPS BEQ or BNE instruction which uses the register $0, then instead of emitting NOP this instruction is replaced by the corresponding microMIPS compact branch instruction, i.e. BEQZC or BNEZC. Differential Revision: http://reviews.llvm.org/D3566 llvm-svn: 222580	2014-11-21 22:04:35 +00:00
Tom Stellard	fb0011cf1a	R600/SI: Mark s_mov_b32 and s_mov_b64 as rematerializable llvm-svn: 222579	2014-11-21 22:00:16 +00:00
Colin LeMahieu	310991c66f	[Hexagon] Adding sxth instruction. llvm-svn: 222577	2014-11-21 21:54:59 +00:00
Colin LeMahieu	91ffec908f	[Hexagon] Adding sxtb instruction. Renaming some identically named classes that will be removed after converting referencing defs. llvm-svn: 222575	2014-11-21 21:35:52 +00:00
Colin LeMahieu	e88447d8de	[Hexagon] Removing SUB_rr and replacing with A2_sub. llvm-svn: 222571	2014-11-21 21:19:18 +00:00
Sanjay Patel	501890e909	Add a feature flag for slow 32-byte unaligned memory accesses [x86]. This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen for Sandy Bridge and Ivy Bridge. There is no functionality change intended for those chips. Previously, the absence of AVX2 was being used as a proxy to detect this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2 that do not have the 32-byte unaligned access slowdown. Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6355 llvm-svn: 222544	2014-11-21 17:40:04 +00:00
Chandler Carruth	ce5a26b0e7	[x86] Restructure the checking patterns for v16 and v32 avx2 vector shuffle lowering to allow much better blend matching. Specifically, with the new structure the code seems clearer to me and we correctly can hit the cases where merging two 128-bit lanes is a clear win and can be shuffled cheaply afterward. llvm-svn: 222539	2014-11-21 14:53:03 +00:00
Chandler Carruth	6c4d1ea8c4	[x86] Make the previous logic significantly less conservative and get a bunch more improvements. Non-lane-crossing is fine, the key is that lane merging only makes sense for single-input shuffles. Not sure why I got so turned around here. The code all works, I was just using the wrong model for it. This only updates v4 and v8 lowering. The v16 and v32 lowering requires restructuring the entire check sequence. llvm-svn: 222537	2014-11-21 14:33:24 +00:00
Chandler Carruth	d2b19bc867	[x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit lanes. By special casing these we can often either reduce the total number of shuffles significantly or reduce the number of (high latency on Haswell) AVX2 shuffles that potentially cross 128-bit lanes. Even when these don't actually cross lanes, they have much higher latency to support that. Doing two of them and a blend is worse than doing a single insert across the 128-bit lanes to blend and then doing a single interleaved shuffle. While this seems like a narrow case, it kept cropping up on me and the difference is huge as you can see in many of the test cases. I first hit this trying to perfectly fix the interleaving shuffle patterns used by Halide for AVX2. llvm-svn: 222533	2014-11-21 13:56:05 +00:00
Alexey Volkov	fd1731d876	[X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers Differential Revision: http://reviews.llvm.org/D5938 llvm-svn: 222521	2014-11-21 11:19:34 +00:00
NAKAMURA Takumi	7495dc76e8	Add LLVMScalarOpts to LLVMPowerPCCodeGen. llvm-svn: 222516	2014-11-21 09:14:45 +00:00
Hao Liu	44e5d7a131	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510	2014-11-21 06:39:58 +00:00
Craig Topper	61e88f44f9	Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *' llvm-svn: 222509	2014-11-21 05:58:21 +00:00
Hal Finkel	f413be11f0	[PPC] Use SeparateConstOffsetFromGEP This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM, there is a store moved out of the inner loop) and a potential speedup on MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it makes some code look cleaner, and synchronizing the backends in this regard seems like a generally good thing. llvm-svn: 222504	2014-11-21 04:35:51 +00:00
Quentin Colombet	a7439d4483	[X86] Do not custom lower UINT_TO_FP when the target type does not match the custom lowering. <rdar://problem/19026326> llvm-svn: 222489	2014-11-21 00:47:19 +00:00
Reid Kleckner	343c395f11	Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/ Follow up to r221940, where I must not have caught em all. NFC llvm-svn: 222481	2014-11-20 23:51:47 +00:00
Reid Kleckner	357600eab5	Add out of line virtual destructors to all LLVMTargetMachine subclasses These recently all grew a unique_ptr<TargetLoweringObjectFile> member in r221878. When anyone calls a virtual method of a class, clang-cl requires all virtual methods to be semantically valid. This includes the implicit virtual destructor, which triggers instantiation of the unique_ptr destructor, which fails because the type being deleted is incomplete. This is just part of the ongoing saga of PR20337, which is affecting Blink as well. Because the MSVC ABI doesn't have key functions, we end up referencing the vtable and implicit destructor on any virtual call through a class. We don't actually end up emitting the dtor, so it'd be good if we could avoid this unneeded type completion work. llvm-svn: 222480	2014-11-20 23:37:18 +00:00
Mehdi Amini	fee89e43ad	Update Makefile following directory removal in r222466 llvm-svn: 222475	2014-11-20 22:48:24 +00:00
Colin LeMahieu	ff06261aed	[Hexagon] [NFC] Merging InstPrinter directory in to MCTargetDesc since they have a circular dependency. llvm-svn: 222458	2014-11-20 21:56:35 +00:00
Saleem Abdulrasool	2f3b3f3182	X86: use the correct alloca symbol for Windows Itanium Windows itanium targets the MSVCRT, and the stack probe symbol is provided by MSVCRT. This corrects the emission of stack probes on i686-windows-itanium. llvm-svn: 222439	2014-11-20 18:01:26 +00:00
Jyoti Allur	5b9f35220e	[ELF] Prevent ARM ELF object writer from generating deprecated relocation code R_ARM_PLT32 llvm-svn: 222414	2014-11-20 05:58:11 +00:00
Craig Topper	e9891afa46	Fix a typo in a comment. llvm-svn: 222412	2014-11-20 05:22:37 +00:00
Colin LeMahieu	ac00643603	[Hexagon] Adding A2_xor instruction with IR selection pattern and test. llvm-svn: 222399	2014-11-19 23:22:23 +00:00
Colin LeMahieu	21866546ae	[Hexagon] Adding A2_or instruction with IR selection pattern and test. llvm-svn: 222396	2014-11-19 22:58:04 +00:00
Andrea Di Biagio	1b657bfcc8	[X86] Improved lowering of v4x32 build_vector dag nodes. This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes that are known to have at least two non-zero elements. With this patch, a build_vector that performs a blend with zero is converted into a shuffle. This is done to let the shuffle legalizer expand the dag node in a optimal way. For example, if we know that a build_vector performs a blend with zero, we can try to lower it as a movq/blend instead of always selecting an insertps. This patch also improves the logic that lowers a build_vector into a insertps with zero masking. See for example the extra test cases added to test sse41.ll. Differential Revision: http://reviews.llvm.org/D6311 llvm-svn: 222375	2014-11-19 19:34:29 +00:00
Tom Stellard	e0ddfd11ea	R600/SI: Make SIInstrInfo::isOperandLegal() more strict A register operand that has a common sub-class with its instruction's defined register class is not always legal. For example, SReg_32 and M0Reg both have a common sub-class, but we can't use an SReg_32 in instructions that expect a M0Reg. This prevents the llvm.SI.sendmsg.ll test from failing when the fold operand pass is added. llvm-svn: 222368	2014-11-19 16:58:49 +00:00
Zoran Jovanovic	a4c4b5fc01	[mips][micromips] Implement SWM32 and LWM32 instructions Differential Revision: http://reviews.llvm.org/D5519 llvm-svn: 222367	2014-11-19 16:44:02 +00:00
Jozef Kolek	ffeed44190	[mips][microMIPS] Fix opcodes of MFHC1 and MTHC1 instructions. Differential Revision: http://reviews.llvm.org/D6169 llvm-svn: 222355	2014-11-19 13:37:51 +00:00
Jozef Kolek	4d55b4d768	[mips][microMIPS] Implement CodeGen support for 16-bit instruction ADDIUR2. Differential Revision: http://reviews.llvm.org/D5800 llvm-svn: 222352	2014-11-19 13:23:58 +00:00
Jozef Kolek	73f64eac8c	[mips][microMIPS] Implement CodeGen support for ADDIUS5 instruction. Differential Revision: http://reviews.llvm.org/D5799 llvm-svn: 222351	2014-11-19 13:11:09 +00:00
Jozef Kolek	5f95dd2b65	[mips][microMIPS] Implement LWXS instruction. Differential Revision: http://reviews.llvm.org/D5407 llvm-svn: 222348	2014-11-19 11:39:12 +00:00
Jozef Kolek	dc62fc4a8f	[mips][microMIPS] Implement SDBBP and RDHWR instructions. Differential Revision: http://reviews.llvm.org/D5240 llvm-svn: 222347	2014-11-19 11:25:50 +00:00
Simon Pilgrim	3ac3b251a9	[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2 This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain. I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr. Differential Revision: http://reviews.llvm.org/D5699 llvm-svn: 222340	2014-11-19 10:06:49 +00:00
David Blaikie	70573dcd9f	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
Hao Liu	2aa06a989d	[AArch64] Disable useAA for Cortex-A57. Using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores. Also I find enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved and benefical for ooo cores, we can enable it again. llvm-svn: 222333	2014-11-19 06:48:56 +00:00
Hao Liu	fd46bea46a	[AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend. SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE and LICM. By enabling these passes we can have better address calculations and generate a better addressing mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57. Reviewed in http://reviews.llvm.org/D5864. llvm-svn: 222331	2014-11-19 06:39:53 +00:00
David Blaikie	5106ce7897	Remove StringMap::GetOrCreateValue in favor of StringMap::insert Having two ways to do this doesn't seem terribly helpful and consistently using the insert version (which we already has) seems like it'll make the code easier to understand to anyone working with standard data structures. (I also updated many references to the Entry's key and value to use first() and second instead of getKey{Data,Length,} and get/setValue - for similar consistency) Also removes the GetOrCreateValue functions so there's less surface area to StringMap to fix/improve/change/accommodate move semantics, etc. llvm-svn: 222319	2014-11-19 05:49:42 +00:00
Weiming Zhao	7a2d15678e	[Aarch64] Customer lowering of CTPOP to SIMD should check for NEON availability llvm-svn: 222292	2014-11-19 00:29:14 +00:00
Matt Arsenault	c09cc3c5b0	R600/SI: Implement areMemAccessesTriviallyDisjoint This partially makes up for not having address spaces used for alias analysis in some simple cases. This is not yet enabled by default so shouldn't change anything yet. llvm-svn: 222286	2014-11-19 00:01:31 +00:00
Matt Arsenault	9a072c19ae	R600/SI: Set hasSideEffects = 0 on load and store instructions. Assuming unmodeled side effects interferes with some scheduling opportunities. Don't put it in the base class of DS instructions since there are a few weird effecting, non load/store instructions there. llvm-svn: 222285	2014-11-18 23:57:33 +00:00
Simon Pilgrim	9c1e4123f8	[X86][AVX] 256-bit vector stack unaligned load/stores identification Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills. This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills. This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll Differential Revision: http://reviews.llvm.org/D6252 llvm-svn: 222281	2014-11-18 23:38:19 +00:00
Colin LeMahieu	44fd1c8bdf	[Hexagon] Adding A2_and instruction. llvm-svn: 222274	2014-11-18 22:45:47 +00:00
Chad Rosier	c250881838	[FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic shift-right for booleans (i1). Arithmetic shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222272	2014-11-18 22:41:49 +00:00
Chad Rosier	e16d16ae41	[FastISel][AArch64] Also allow folding of sign-/zero-extend and logical shift-right for booleans (i1). Logical shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222270	2014-11-18 22:38:42 +00:00
Colin LeMahieu	38765e6d89	[Hexagon] Adding A2_sub instruction Renaming test files. llvm-svn: 222263	2014-11-18 21:51:51 +00:00
Juergen Ributzka	cdda930843	[FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts." Shifts also perform sign-/zero-extends to larger types, which requires us to emit an integer extend instead of a simple COPY. Related to PR21594. llvm-svn: 222257	2014-11-18 21:20:17 +00:00
Matt Arsenault	162c1010bd	R600/SI: Move SIFixSGPRCopies to inst selector passes This should expose more of the actually used VALU instructions to the machine optimization passes. This also should help getting i1 handling into a better state. For not entirly understood reasons, this fixes the split-scalar-i64-add.ll test where a 64-bit add would only partially be moved to the VALU resulting in use of undefined VCC. llvm-svn: 222256	2014-11-18 21:06:58 +00:00
Juergen Ributzka	7a7c4684e4	[AArch64] Don't optimize all compare instructions. "optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add instructions when the flags are not used anymore. This conversion is valid for most instructions, but not all. Some instructions that don't set the flags (e.g. sub with immediate) can set the SP, whereas the flag setting version uses the same encoding for the "zero" register. Update the code to also check for the return register before performing the optimization to make sure that a cmp doesn't suddenly turn into a sub that sets the stack pointer. I don't have a test case for this, because it isn't easy to trigger. llvm-svn: 222255	2014-11-18 21:02:40 +00:00
Tom Stellard	f0a2107c6b	R600/SI: Make sure resource descriptors are always stored in SGPRs llvm-svn: 222253	2014-11-18 20:39:39 +00:00
Colin LeMahieu	efa74e0280	[Hexagon] Converting from ADD_rr to A2_add which has encoding bits. Adding test to show correct instruction selection and encoding. llvm-svn: 222249	2014-11-18 20:28:11 +00:00
Juergen Ributzka	4328fd94b0	[FastISel][AArch64] Fix shift-immediate emission for "zero" shifts. This change emits a COPY for a shift-immediate with a "zero" shift value. This fixes PR21594 where we emitted a shift instruction with an incorrect immediate operand. llvm-svn: 222247	2014-11-18 19:58:59 +00:00
Jozef Kolek	52e84e99a1	Test commit to verify that commit access works. llvm-svn: 222244	2014-11-18 19:20:34 +00:00
Reid Kleckner	d970702ab3	Revert "ADT: correctly report isMSVCEnvironment for windows itanium" This reverts commit r222180. llvm-svn: 222188	2014-11-17 22:55:59 +00:00
Saleem Abdulrasool	76f2c77070	ADT: correctly report isMSVCEnvironment for windows itanium The itanium environment on Windows uses MSVC and is a MSVC environment. Report this correctly. llvm-svn: 222180	2014-11-17 22:13:26 +00:00
Matt Arsenault	7480a0e163	R600/SI: Don't copy flags when extracting subreg This was resulting in use of a register after a kill. For some reason this showed up as a problem in many tests when moving the SIFixSGPRCopies pass closer to instruction selection. llvm-svn: 222175	2014-11-17 21:11:37 +00:00
Matt Arsenault	6f679785f4	R600/SI: Assume SIFixSGPRCopies makes changes I'm not sure if this was breaking anything. llvm-svn: 222174	2014-11-17 21:11:34 +00:00
Alexey Volkov	7de210bd52	[X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUs Differential Revision: http://reviews.llvm.org/D5934 llvm-svn: 222141	2014-11-17 16:17:51 +00:00
Oliver Stannard	970b0d576c	[Thumb1] Re-write emitThumbRegPlusImmediate This was motivated by a bug which caused code like this to be miscompiled: declare void @take_ptr(i8) define void @test() { %addr1.32 = alloca i8 %addr2.32 = alloca i32, i32 1028 call void @take_ptr(i8 %addr1) ret void } This was emitting the following assembly to get the value of %addr1: add r0, sp, #1020 add r0, r0, #8 However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this could not be assembled. The generated object file contained this, resulting in r0 holding SP+8 rather tha SP+1028: add r0, sp, #1020 add r0, sp, #8 This function looked like it could have caused miscompilations for other combinations of registers and offsets (though I don't think it is currently called with these), and the heuristic it used did not match the emitted code in all cases. llvm-svn: 222125	2014-11-17 11:18:10 +00:00
Craig Topper	7f416c8acb	Convert some EVTs to MVTs where only a SimpleValueType is needed. llvm-svn: 222109	2014-11-16 21:17:18 +00:00
Craig Topper	949d50bc71	[x86] Remove two redundant isel patterns. They equivalent already exists in the instruction pattern. llvm-svn: 222094	2014-11-16 09:24:16 +00:00
Simon Pilgrim	6d675f4e35	[X86][SSE] Improve legal SHUFP and PSHUFD shuffle matching Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input. As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector. Differential Revision: http://reviews.llvm.org/D6287 llvm-svn: 222087	2014-11-15 21:13:05 +00:00
Matt Arsenault	36094d788a	R600: Permute operands when selecting legacy min/max This gets the correct NaN behavior based on the compare type the hardware uses. This now passes the new piglit test I have for this on SI. Add stricter tests for the operand order. llvm-svn: 222079	2014-11-15 05:02:57 +00:00
Tom Stellard	83171b32ed	R600: Fix 64-bit integer division This fixes a failure in one of the oclconform tests. Patch by: Jan Vesely llvm-svn: 222073	2014-11-15 01:07:57 +00:00
Tom Stellard	bf69d76106	R600: Factor i64 UDIVREM lowering into its own fuction This is so it could potentially be used by SI. However, the current implementation does not always produce correct results, so the IntegerDivisionPass is being used instead. llvm-svn: 222072	2014-11-15 01:07:53 +00:00
Reid Kleckner	c2291f3905	Rename EH related stuff to be more precise Summary: The current "WinEH" exception handling type is more about Itanium-style LSDA tables layered on top of the Windows native unwind info format instead of .eh_frame tables or EHABI unwind info. Use the name "ItaniumWinEH" to better reflect the hybrid nature of the design. Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions, since the LSDA is part of the Itanium C++ ABI document, and not the DWARF standard. Reviewers: echristo Subscribers: llvm-commits, compnerd Differential Revision: http://reviews.llvm.org/D6279 llvm-svn: 222062	2014-11-14 23:31:07 +00:00
Tim Northover	603d316517	ARM: refactor .cfi_def_cfa_offset emission. We use to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable. It's easier to just add any stack-adjusting instruction to a list and emit them together. llvm-svn: 222057	2014-11-14 22:45:33 +00:00
Tim Northover	9d2d218f49	ARM: correctly calculate the offset of FP in its push. When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring. llvm-svn: 222056	2014-11-14 22:45:31 +00:00
Tom Stellard	e63d5ed2f9	R600/SI: Mark s_movk_i32 as rematerializable llvm-svn: 222037	2014-11-14 20:43:28 +00:00
Tom Stellard	bdd567d86d	R600/SI: Fix spilling of m0 register If we have spilled the value of the m0 register, then we need to restore it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't write to m0. v_readlane_b32 can't write to m0, so llvm-svn: 222036	2014-11-14 20:43:26 +00:00
Matt Arsenault	cc3c2b3946	R600/SI: Combine min3/max3 instructions llvm-svn: 222032	2014-11-14 20:08:52 +00:00
Matt Arsenault	72858935f7	R600/SI: Fix verifier error from a branch on IMPLICIT_DEF SIILowerI1Copies wasn't correctly handling this case. llvm-svn: 222020	2014-11-14 18:43:41 +00:00
Matt Arsenault	6ad34266e3	Fix unused variable warning without asserts llvm-svn: 222017	2014-11-14 18:40:49 +00:00
Matt Arsenault	d28a7fde32	R600/SI: Match integer min / max instructions llvm-svn: 222015	2014-11-14 18:30:06 +00:00
Matt Arsenault	94812216ef	R600/SI: Use S_BFE_I64 for 64-bit sext_inreg llvm-svn: 222012	2014-11-14 18:18:16 +00:00
Cameron McInally	04400449c5	[AVX512] Add 512b masked integer shift by immediate patterns. llvm-svn: 222002	2014-11-14 15:43:00 +00:00
Tom Stellard	9dec074399	R600/SI: Fix assembly names for exec_hi and exec_lo llvm-svn: 221995	2014-11-14 14:08:04 +00:00
Tom Stellard	9d7ddd516e	R600/SI: Start implementing an assembler This was done using the Sparc and PowerPC AsmParsers as guides. So far it is very simple and only supports sopp instructions. llvm-svn: 221994	2014-11-14 14:08:00 +00:00
Bill Schmidt	7674692961	[PowerPC] Add VSX builtins for vec_div This patch adds builtin support for xvdivdp and xvdivsp, along with a test case. Straightforward stuff. There's a companion patch for Clang. llvm-svn: 221983	2014-11-14 12:10:40 +00:00
Matt Arsenault	21c938e14b	R600/SI: Make constant array static llvm-svn: 221965	2014-11-14 02:21:58 +00:00
Tim Northover	d3be12a6c7	X86: use getConstant rather than getTargetConstant behind BUILD_VECTOR. getTargetConstant should only be used when you can guarantee the instruction selected will be able to cope with the raw value. BUILD_VECTOR is rather too generic for this so we should use getConstant instead. In that case, an instruction can still consume the constant, but if it doesn't it'll be materialised through its own round of ISel. Should fix PR21352. llvm-svn: 221961	2014-11-14 01:30:14 +00:00
Reid Kleckner	d378174d54	Fix build of Mips code with MSVC by using our macro instead of __attribute__((unused)) directly llvm-svn: 221956	2014-11-14 00:39:33 +00:00
Reed Kotler	d5c4196cb6	First stage of call lowering for Mips fast-isel Summary: This has most of what is needed for mips fast-isel call lowering for O32. What is missing I will add on the next patch because this patch is already too large. It should not be doing anything wrong but it will punt on some cases that it is basically capable of doing. The mechanism is there for parameters to be passed on the stack but I have not enabled it because it serves as a way for now to prevent some of the strange cases of O32 register passing that I have not fully checked yet and have some issues. The Mips O32 abi rules are very complicated as far how data is passed in floating and integer registers. However there is a way to think about this all very simply and this implementation reflects that. Basically, the ABI rules are written as if everything is passed on the stack and aligned as such. Once that is conceptually done, it is nearly trivial to reassign those locations to registers and then all the complexity disappears. So I have told tablegen that all the data is passed on the stack and during the lowering I fix this by assigning to registers as per the ABI doc. This has been my approach and you can line up what I did with the ABI document and see 1 to 1 what is going on. Test Plan: callabi.ll Reviewers: dsanders Reviewed By: dsanders Subscribers: jholewinski, echristo, ahatanak, llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5714 llvm-svn: 221948	2014-11-13 23:37:45 +00:00
Matt Arsenault	da59f3de45	R600/SI: Fix fmin_legacy / fmax_legacy matching for SI select_cc is expanded on SI, so this was never matched. llvm-svn: 221941	2014-11-13 23:03:09 +00:00
Aditya Nandakumar	3053155652	We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. llvm-svn: 221926	2014-11-13 21:29:21 +00:00
Juergen Ributzka	0af310d052	[FastISel][AArch64] Don't bail during simple GEP instruction selection. The generic FastISel code would bail, because it can't emit a sign-extend for AArch64. This copies the code over and uses AArch64 specific emit functions. This is not ideal and 'computeAddress' should handles this, so it can fold the address computation into the memory operation. I plan to clean up 'computeAddress' anyways, so I will add that in a future commit. Related to rdar://problem/18962471. llvm-svn: 221923	2014-11-13 20:50:44 +00:00
Matt Arsenault	7784992999	R600/SI: Use s_movk_i32 llvm-svn: 221922	2014-11-13 20:44:23 +00:00
Matt Arsenault	1a179e8219	R600/SI: Fix definition for s_cselect_b32 These were directly using the old base instruction class, and specifying the wrong register classes for operands. The operands can be the other special inputs besides SGPRs. The op name was also being directly used for the asm string, so this was printed without any operands. llvm-svn: 221921	2014-11-13 20:23:36 +00:00
Matt Arsenault	6ef66144f3	R600: Fix assert on empty function If a function is just an unreachable, this would hit a "this is not a MachO target" assertion because of setting HasSubsectionViaSymbols. llvm-svn: 221920	2014-11-13 20:07:40 +00:00
Matt Arsenault	cc8d3b8774	R600: Error on initializer for LDS. Also give a proper error for other address spaces. llvm-svn: 221917	2014-11-13 19:56:13 +00:00
Matt Arsenault	1cffa4c191	R600/SI: Get rid of FCLAMP_SI pseudo It's not necessary. Also use complex patterns to allow src modifier usage. llvm-svn: 221916	2014-11-13 19:49:04 +00:00
Matt Arsenault	581a7a6933	R600/SI: Allow commuting with src2_modifiers llvm-svn: 221911	2014-11-13 19:26:50 +00:00
Matt Arsenault	95e48668b6	R600/SI: Allow commuting some 3 op instructions e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c This simplifies matching v_madmk_f32. This looks somewhat surprising, but it appears to be OK to do this. We can commute src0 and src1 in all of these instructions, and that's all that appears to matter. llvm-svn: 221910	2014-11-13 19:26:47 +00:00
Tim Northover	631cc9ce1a	ARM: allow constpool entry to be moved to the user's block in all cases. Normally entries can only move to a lower address, but when that wasn't viable, the user's block was considered anyway. Unfortunately, it went via createNewWater which wasn't designed to handle the case where there's already an island after the block. Unfortunately, the test we have is slow and fragile, and I couldn't reduce it to anything sane even with the @llvm.arm.space intrinsic. The test change here is recreating the previous one after the change. rdar://problem/18545506 llvm-svn: 221905	2014-11-13 17:58:53 +00:00
Tim Northover	ab85dcc7b8	ARM: avoid duplicating branches during constant islands. We were using a naive heuristic to determine whether a basic block already had an unconditional branch at the end. This mostly corresponded to reality (assuming branches got optimised) because there's not much point in a branch to the next block, but could go wrong. llvm-svn: 221904	2014-11-13 17:58:51 +00:00
Tim Northover	650b0ee53b	ARM: add @llvm.arm.space intrinsic for testing ConstantIslands. Creating tests for the ConstantIslands pass is very difficult, since it depends on precise layout details. Having the ability to precisely inject a number of bytes into the stream helps greatly. llvm-svn: 221903	2014-11-13 17:58:48 +00:00
Colin LeMahieu	b6bbeee744	[Hexagon] NFC Renaming reserved identifier. llvm-svn: 221898	2014-11-13 16:36:30 +00:00
Elena Demikhovsky	d5e95b57e0	AVX-512: SINT_TO_FP cost model and some bugfixes Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883	2014-11-13 11:46:16 +00:00
Aditya Nandakumar	a27193297f	This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878	2014-11-13 09:26:31 +00:00
Chandler Carruth	fee91883f4	[x86] Teach the vector shuffle lowering to make a more nuanced decision between splitting a vector into 128-bit lanes and recombining them vs. decomposing things into single-input shuffles and a final blend. This handles a large number of cases in AVX1 where the cross-lane shuffles would be much more expensive to represent even though we end up with a fast blend at the root. Instead, we can do a better job of shuffling in a single lane and then inserting it into the other lanes. This fixes the remaining bits of Halide's regression captured in PR21281 for AVX1. However, the bug persists in AVX2 because I've made this change reasonably conservative. The cases where it makes sense in AVX2 to split into 128-bit lanes are much more rare because we can often do full permutations across all elements of the 256-bit vector. However, the particular test case in PR21281 is an example of one of the rare cases where it is always better to work in a single 128-bit lane. I'm going to try to teach the logic to detect and form the good code even in AVX2 next, but it will need to use a separate heuristic. Finally, there is one pesky regression here where we previously would craftily use vpermilps in AVX1 to shuffle both high and low halves at the same time. We no longer pull that off, and not for any really good reason. Ultimately, I think this is just another missing nuance to the selection heuristic that I'll try to add in afterward, but this change already seems strictly worth doing considering the magnitude of the improvements in common matrix math shuffle patterns. As always, please let me know if this causes a surprising regression for you. llvm-svn: 221861	2014-11-13 04:06:10 +00:00
Chandler Carruth	253dd39a9a	[x86] Don't form overly fragmented blends when splitting and re-combining shuffles because nothing was available in the wider vector type. The key observation (which I've put in the comments for future maintainers) is that at this point, no further combining is really possible. And so even though these shuffles trivially could be combined, we need to actually do that as we produce them when producing them this late in the lowering. This fixes another (huge) part of the Halide vector shuffle regressions. As it happens, this was already well covered by the tests, but I hadn't noticed how bad some of these got. The specific patterns that turn directly into unpckl/h patterns were occurring many times in common vector processing code. There are still more problems here sadly, but trying to incrementally tease them apart and it looks like this is the core of the problem in the splitting logic. There is some chance of regression here, you can see it in the test changes. Specifically, where we stop forming pshufb in some cases, it is possible that pshufb was in fact faster. Intel "says" that pshufb is slower than the instruction sequences replacing it. llvm-svn: 221852	2014-11-13 02:42:08 +00:00
Juergen Ributzka	957a1454cc	[FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value. Optimize selects of i1 in the presence of 'true' and 'false' operands to simple logic operations. This fixes rdar://problem/18960150. llvm-svn: 221848	2014-11-13 00:36:46 +00:00
Juergen Ributzka	424c5fd12f	[FastISel][AArch64] Fold the cmp into the select when possible. This folds the compare emission into the select emission when possible, so we can directly use the flags and don't have to emit a separate compare. Related to rdar://problem/18960150. llvm-svn: 221847	2014-11-13 00:36:43 +00:00
Juergen Ributzka	d1a042abd0	[FastISel][AArch64] Extend 'select' lowering to support also i1 to i16. Related to rdar://problem/18960150. llvm-svn: 221846	2014-11-13 00:36:38 +00:00
Sanjay Patel	f6f7d5d1dd	Expose the number of Newton-Raphson iterations applied to the hardware's reciprocal estimate as a parameter (x86). This is a follow-on to r221706 and r221731 and discussed in more detail in PR21385. This patch also loosens the testcase checking for btver2. We know that the "1.0" will be loaded, but we can't tell exactly when, so replace the CHECK-NEXT specifiers with plain CHECKs. The CHECK-NEXT sequence relied on a quirk of post-RA-scheduling that may change independently of anything in these tests. llvm-svn: 221819	2014-11-12 21:39:01 +00:00
Ahmed Bougacha	55a333d89b	Add fortified (__*_chk) library functions to TLI (NFC) One of them (__memcpy_chk) was already there, the others were checked by comparing function names. Note that the fortified libfuncs are now part of TLI, but are always available, because they aren't generated, only optimized into the non-checking versions. Differential Revision: http://reviews.llvm.org/D6179 llvm-svn: 221817	2014-11-12 21:23:34 +00:00
Cameron McInally	73a6bca32b	[AVX512] Add integer shift by immediate intrinsics. llvm-svn: 221811	2014-11-12 19:58:54 +00:00
Jingyue Wu	a41cf018b8	Fix broken doxygen annotations, NFC llvm-svn: 221801	2014-11-12 18:25:06 +00:00
Jingyue Wu	8a12cea5f1	Disable indvar widening if arithmetics on the wider type are more expensive Summary: Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test was run whether NVPTX was enabled or not. IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. This test is run only when NVPTX is enabled. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221799	2014-11-12 18:09:15 +00:00
Justin Hibbits	a88b605721	Add support for small-model PIC for PowerPC. Summary: Large-model was added first. With the addition of support for multiple PIC models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This generates more optimal code, for shared libraries with less than about 16380 data objects. Test Plan: Test cases added or updated Reviewers: joerg, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, mcrosier, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D5399 llvm-svn: 221791	2014-11-12 15:16:30 +00:00
Zoran Jovanovic	fd888630b5	[mips][micromips] Add predicate 'InMicroMips' at CodeGen patterns for microMIPS instructions Differential Revision: http://reviews.llvm.org/D6198 llvm-svn: 221780	2014-11-12 13:30:10 +00:00
Chandler Carruth	0c922fcec5	[x86] Start improving the matching of unpck instructions based on test cases from Halide folks. This initial step was extracted from a prototype change by Clay Wood to try and address regressions found with Halide and the new vector shuffle lowering. llvm-svn: 221779	2014-11-12 10:05:18 +00:00
Elena Demikhovsky	be8808dc3f	AVX-512: Intrinsics for ERI 3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms. Intrinsics include SAE (Suppres All Exceptions) parameter. http://reviews.llvm.org/D6214 llvm-svn: 221774	2014-11-12 07:31:03 +00:00
Jingyue Wu	a48273390c	Reverts r221772 which fails tests llvm-svn: 221773	2014-11-12 07:19:25 +00:00
Jingyue Wu	635a9b14fa	Disable indvar widening if arithmetics on the wider type are more expensive Summary: IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221772	2014-11-12 06:58:45 +00:00
Bill Schmidt	729547847f	[PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC, which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions. New LLVM intrinsics are provided to represent these four instructions in IntrinsicsPowerPC.td. These are patterned after the similar intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied to the code gen patterns, with additional patterns to allow plain vanilla loads and stores to still generate these instructions. At -O1 and higher the intrinsics are immediately converted to loads and stores in InstCombineCalls.cpp. This will open up more optimization opportunities while still allowing the correct instructions to be generated. (Similar code exists for aligned Altivec loads and stores.) The new intrinsics are added to the code that checks for consecutive loads and stores in PPCISelLowering.cpp, as well as to PPCTargetLowering::getTgtMemIntrinsic(). There's a new test to verify the correct instructions are generated. The loads and stores tend to be reordered, so the test just counts their number. It runs at -O2, as it's not very effective to test this at -O0, when many unnecessary loads and stores are generated. I ended up having to modify vsx-fma-m.ll. It turns out this test case is slightly unreliable, but I don't know a good way to prevent problems with it. The xvmaddmdp instructions read and write the same register, which is one of the multiplicands. Commutativity allows either to be chosen. If the FMAs are reordered differently than expected by the test, the register assignment can be different as a result. Hopefully this doesn't change often. There is a companion patch for Clang. llvm-svn: 221767	2014-11-12 04:19:40 +00:00
Rafael Espindola	7fc5b87480	Pass an ArrayRef to MCDisassembler::getInstruction. With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t> instead of a MemoryObject. Even on X86 there is a maximum size an instruction can have. Given that, it seems way simpler and more efficient to just pass an ArrayRef to the disassembler instead of a MemoryObject and have it do a virtual call every time it wants some extra bytes. llvm-svn: 221751	2014-11-12 02:04:27 +00:00
Rafael Espindola	35a12a85a1	Remove a bit of dead code. Every "real" object file implements this an ptx doesn't use it. llvm-svn: 221746	2014-11-12 01:27:22 +00:00
Sanjay Patel	50fc6ff5e3	Initialize new subtarget feature variable for generating reciprocal estimate instructions. This was missed in r221706. llvm-svn: 221731	2014-11-11 23:13:15 +00:00
Juergen Ributzka	89441b0dd8	[FastISel][AArch64] Add support for fabs intrinsic. Lower the llvm.fabs intrinsic to the 'fabs' MI instruction. This fixes rdar://problem/18946552. llvm-svn: 221729	2014-11-11 23:10:44 +00:00
Duncan P. N. Exon Smith	de36e8040f	Revert "IR: MDNode => Value" Instead, we're going to separate metadata from the Value hierarchy. See PR21532. This reverts commit r221375. This reverts commit r221373. This reverts commit r221359. This reverts commit r221167. This reverts commit r221027. This reverts commit r221024. This reverts commit r221023. This reverts commit r220995. This reverts commit r220994. llvm-svn: 221711	2014-11-11 21:30:22 +00:00
Tom Roeder	eb7a303d1b	Add Forward Control-Flow Integrity. This commit adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails. The default failure function ignores the error and continues. This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site. Also, this removes support for jumptables in ARM, pending further performance analysis and discussion. Review: http://reviews.llvm.org/D4167 llvm-svn: 221708	2014-11-11 21:08:02 +00:00
Sanjay Patel	e2e589288f	Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385). This is a first step for generating SSE rcp instructions for reciprocal calcs when fast-math allows it. This is very similar to the rsqrt optimization enabled in D5658 ( http://reviews.llvm.org/rL220570 ). For now, be conservative and only enable this for AMD btver2 where performance improves significantly both in terms of latency and throughput. We may never enable this codegen for Intel Core* chips because the divider circuits are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21 cycle critical path for the rcp + mul + sub + mul + add estimate. Follow-on patches may allow configuration of the number of Newton-Raphson refinement steps, add AVX512 support, and enable the optimization for more chips. More background here: http://llvm.org/bugs/show_bug.cgi?id=21385 Differential Revision: http://reviews.llvm.org/D6175 llvm-svn: 221706	2014-11-11 20:51:00 +00:00
Bill Schmidt	3d9674cfb1	[PowerPC] Replace foul hackery with real calls to __tls_get_addr My original support for the general dynamic and local dynamic TLS models contained some fairly obtuse hacks to generate calls to __tls_get_addr when lowering a TargetGlobalAddress. Rather than generating real calls, special GET_TLS_ADDR nodes were used to wrap the calls and only reveal them at assembly time. I attempted to provide correct parameter and return values by chaining CopyToReg and CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not fully correct. Problems were seen with two back-to-back stores to TLS variables, where the call sequences ended up overlapping with unhappy results. Additionally, since these weren't real calls, the proper register side effects of a call were not recorded, so clobbered values were kept live across the calls. The proper thing to do is to lower these into calls in the first place. This is relatively straightforward; see the changes to PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp. The changes here are standard call lowering, except that we need to track the fact that these calls will require a relocation. This is done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the TargetGlobalAddress operand that appears earlier in the sequence. The calls to LowerCallTo() eventually find their way to LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(), which calls PrepareCall(). In PrepareCall(), we detect the calls to __tls_get_addr and immediately snag the TargetGlobalTLSAddress with the annotated relocation information. This becomes an extra operand on the call following the callee, which is expected for nodes of type tlscall. We change the call opcode to CALL_TLS for this case. Back in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only, since we require a TOC-restore nop following the call for the 64-bit ABIs. During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td convert the CALL_TLS nodes into BL_TLS nodes, and convert the CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS nodes can now be emitted normally using their patterns and the associated printTLSCall print method. Finally, as a result of these changes, all references to get-tls-addr in its various guises are no longer used, so they have been removed. There are existing TLS tests to verify the changes haven't messed anything up). I've added one new test that verifies that the problem with the original code has been fixed. llvm-svn: 221703	2014-11-11 20:44:09 +00:00
Rafael Espindola	a9c28b68cd	Use a 8 bit immediate when possible. This fixes pr21529. llvm-svn: 221700	2014-11-11 19:46:36 +00:00
Dario Domizioli	e904e85faf	[X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access The ISel lowering for global TLS access in PIC mode was creating a pseudo instruction that is later expanded to a call, but the code was not setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack flag. This caused some functions to be mistakenly recognized as leaf functions, and this in turn affected the decision to eliminate the frame pointer. With the fix, hasCalls is properly set and the leaf frame pointer is correctly preserved. llvm-svn: 221695	2014-11-11 18:44:49 +00:00
Vasileios Kalintiris	b2dd15f8c7	[mips] Add preliminary support for the MIPS II target. Summary: This patch enables code generation for the MIPS II target. Pre-Mips32 targets don't have the MUL instruction, so we add the correspondent pattern that uses the MULT/MFLO combination in order to retrieve the product. This is WIP as we don't support code generation for select nodes due to the lack of conditional-move instructions. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6150 llvm-svn: 221686	2014-11-11 11:43:55 +00:00
Vasileios Kalintiris	8c1c95e95c	[mips] Add hardware register name "hwr_ulr" ($29) The canonical name when printing assembly is still $29. The reason is that GAS does not accept "$hwr_ulr" at the moment. This addresses the comments from r221307, which reverted the original commit r221299. llvm-svn: 221685	2014-11-11 11:22:39 +00:00
Andrea Di Biagio	5fa2e15453	[X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'. This helps the DAGCombiner to identify more opportunities to fold shuffles. llvm-svn: 221684	2014-11-11 11:20:31 +00:00
Vasileios Kalintiris	10b5ba3f6e	Recommit "[mips] Add names and tests for the hardware registers" The original commit r221299 was reverted in r221307. I removed the name "hrw_ulr" ($29) from the original commit because two tests were failing. llvm-svn: 221681	2014-11-11 10:31:31 +00:00
Craig Topper	f655cddb13	Use uint64_t as the type for the X86 TSFlag format enum. Allows removal of the VEXShift hack that was used to access the higher bits of TSFlags. llvm-svn: 221673	2014-11-11 07:32:32 +00:00
Michael Kuperstein	3fe15e498f	[X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Recommitting - This time, with a hopefully working test. Differential Revision: http://reviews.llvm.org/D6128 llvm-svn: 221672	2014-11-11 07:07:40 +00:00
Jingyue Wu	dfd4eb9285	[NVPTX] Remove dead code in NVPTXTargetTransformInfo (NFC) llvm-svn: 221668	2014-11-11 05:24:04 +00:00
Rafael Espindola	961d469445	MCAsmParserExtension has a copy of the MCAsmParser. Use it. Base classes were storing a second copy. llvm-svn: 221667	2014-11-11 05:18:41 +00:00
Quentin Colombet	360460ba64	[X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if AVX2 is available. According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one. Althought this lowering kicks in some SPECs benchmarks, the performance improvement was within the noise. Correctness testing has been done for the whole range of uint32_t with the following program: uint4 v = (uint4) {0,1,2,3}; uint32_t i; //Check correctness over entire range for uint4 -> float4 conversion for( i = 0; i < 1U << (32-2); i++ ) { float4 t = test(v); float4 c = correct(v); if( 0xf != _mm_movemask_ps( t == c )) { printf( "Error @ %vx: %vf vs. %vf\n", v, c, t); return -1; } v += 4; } Where "correct" is the old lowering and "test" the new one. The patch adds a test case for the two custom lowering instruction. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads). rdar://problem/18153096> llvm-svn: 221657	2014-11-11 02:23:47 +00:00
Michael Kuperstein	217e1eec0d	Reverting r221626 due to a too-strict test. llvm-svn: 221629	2014-11-10 21:07:41 +00:00
Juergen Ributzka	ea5870a530	[AArch64][FastISel] Fix kill flags for integer extends. In the case we optimize an integer extend away and replace it directly with the source register, we also have to clear all kill flags at all its uses. This is necessary, because the orignal IR instruction might be trivially dead, but we replaced it with a nop at MI level. llvm-svn: 221628	2014-11-10 21:05:31 +00:00
Michael Kuperstein	3218b942f4	[X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Differential Revision: http://reviews.llvm.org/D6128 llvm-svn: 221626	2014-11-10 20:40:21 +00:00
Jingyue Wu	0c981bd7df	[NVPTX] Add an NVPTX-specific TargetTransformInfo Summary: It currently only implements hasBranchDivergence, and will be extended in later diffs. Split from D6188. Test Plan: make check-all Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D6195 llvm-svn: 221619	2014-11-10 18:38:25 +00:00
Rafael Espindola	4aa6bea7a2	Misc style fixes. NFC. This fixes a few cases of: * Wrong variable name style. * Lines longer than 80 columns. * Repeated names in comments. * clang-format of the above. This make the next patch a lot easier to read. llvm-svn: 221615	2014-11-10 18:11:10 +00:00
Zoran Jovanovic	37bca10148	[mips][microMIPS] Fix issue with delay slot filler and microMIPS Differential Revision: http://reviews.llvm.org/D6193 llvm-svn: 221612	2014-11-10 17:27:56 +00:00
Daniel Sanders	87f9b88bfb	[mips] Fix sret arguments for N32/N64 which were accidentally broken in r221534. llvm-svn: 221604	2014-11-10 15:57:53 +00:00
Matt Arsenault	23755997e4	R600: Remove unused define llvm-svn: 221543	2014-11-07 20:45:00 +00:00
Daniel Sanders	c43cda84ff	[mips] Promote i32 arguments to i64 for the N32/N64 ABI and fix <64-bit structs... Summary: ... and after all that refactoring, it's possible to distinguish softfloat floating point values from integers so this patch no longer breaks softfloat to do it. Remove direct handling of i32's in the N32/N64 ABI by promoting them to i64. This more closely reflects the ABI documentation and also fixes problems with stack arguments on big-endian targets. We now rely on signext/zeroext annotations (already generated by clang) and the Assert[SZ]ext nodes to avoid the introduction of unnecessary sign/zero extends. It was not possible to convert three tests to use signext/zeroext. These tests are bswap.ll, ctlz-v.ll, ctlz-v.ll. It's not possible to put signext on a vector type so we just accept the sign extends here for now. These tests don't pass the vectors the same way clang does (clang puts multiple elements in the same argument, these map 1 element to 1 argument) so we don't need to worry too much about it. With this patch, all known N32/N64 bugs should be fixed and we now pass the first 10,000 tests generated by ABITest.py. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6117 llvm-svn: 221534	2014-11-07 16:54:21 +00:00
Daniel Sanders	b315c8c762	[mips] Removed the remainder of MipsCC. NFC. Summary: One of the calls to AllocateStack (the one in LowerCall) doesn't look like it should be there but it was there before and removing it breaks the frame size calculation. Reviewers: vmedic, theraven Reviewed By: theraven Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6116 llvm-svn: 221529	2014-11-07 15:33:08 +00:00
Daniel Sanders	2c6f4b430b	[mips] Remove MipsCC::reservedArgArea() in favour of MipsABIInfo::GetCalleeAllocdArgSizeInBytes(). NFC. Summary: Reviewers: theraven, vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6115 llvm-svn: 221528	2014-11-07 15:03:53 +00:00
NAKAMURA Takumi	0ebd071450	MipsCCState.h: Use LLVM_DELETED_FUNCTION for msc17. llvm-svn: 221527	2014-11-07 14:56:31 +00:00
Daniel Sanders	0456c15c58	[mips] Move MipsCCState to a separate file and clang-formatted it. Summary: Depends on D6113 Reviewers: theraven, vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6114 llvm-svn: 221525	2014-11-07 14:24:31 +00:00
Daniel Sanders	892cf8af46	[mips] Fix unused variable warnings introduced in r221521 llvm-svn: 221522	2014-11-07 12:43:01 +00:00
Daniel Sanders	d7eba31508	[mips] Remove remaining use of MipsCC::intArgRegs() in favour of MipsABIInfo::GetByValArgRegs() and MipsABIInfo::GetVarArgRegs() Summary: Depends on D6112 Reviewers: theraven, vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6113 llvm-svn: 221521	2014-11-07 12:21:37 +00:00
Daniel Sanders	4f1bedaa47	[mips] Remove MipsCC::getRegVT(). NFC Summary: It's no longer used. Reviewers: vmedic, theraven Reviewed By: theraven Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6112 llvm-svn: 221519	2014-11-07 12:02:59 +00:00
Daniel Sanders	cfad1e3fca	[mips] Remove MipsCC::analyzeCallOperands in favour of CCState::AnalyzeCallOperands. NFC Summary: In addition to the usual f128 workaround, it was also necessary to provide a means of accessing ArgListEntry::IsFixed. Reviewers: theraven, vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6111 llvm-svn: 221518	2014-11-07 11:43:49 +00:00
Daniel Sanders	41a64c407f	[mips] Move SpecialCallingConv to MipsCCState and use it from tablegen-erated code. NFC Summary: In the long run, it should probably become a calling convention in its own right but for now just move it out of MipsISelLowering::analyzeCallOperands() so that we can drop this function in favour of CCState::AnalyzeCallOperands(). Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6085 llvm-svn: 221517	2014-11-07 11:10:48 +00:00
Daniel Sanders	f3096a1c8d	[mips] Removed IsVarArg from MipsISelLowering::analyzeCallOperands(). NFC. Summary: CCState objects already carry this information in their isVarArg() method. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6084 llvm-svn: 221516	2014-11-07 10:45:16 +00:00
Ahmed Bougacha	72001cf287	[AArch64] Keep flags on condition vreg when instantiating a CB branch. Reversing a CB* instruction used to drop the flags on the condition. On the included testcase, this lead to a read from an undefined vreg. Using addOperand keeps the flags, here <undef>. Differential Revision: http://reviews.llvm.org/D6159 llvm-svn: 221507	2014-11-07 02:50:00 +00:00
Simon Pilgrim	615ab8e721	[X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq) Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions. Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables. Differential Revision: http://reviews.llvm.org/D6001 llvm-svn: 221489	2014-11-06 22:15:41 +00:00
Ahmed Bougacha	b5367eeea3	[X86] Add VFMADDSUB cases for the 213->231 custom inserter. Also add tests for vfmadd/vfmsub. llvm-svn: 221488	2014-11-06 22:04:15 +00:00
Ahmed Bougacha	9152361d73	[X86] Add missing FMA3 VFMADDSUB in the emitter. Also reuse the fma4 intrinsic test to cover fma3 instructions too. llvm-svn: 221487	2014-11-06 21:58:11 +00:00
Colin LeMahieu	2c769209a1	[Hexagon] Adding basic Hexagon ELF object emitter. llvm-svn: 221465	2014-11-06 17:05:51 +00:00
Eli Bendersky	799c564236	Clean up NVPTXLowerStructArgs.cpp. NFC * Remove unnecessary const_casts and C-style casts * Simplify attribute access code * Simplify ArrayRef creation * 80-col and clang-format llvm-svn: 221464	2014-11-06 17:05:49 +00:00
Daniel Sanders	2373af3475	[mips] Removed IsSoftFloat from MipsISelLowering::analyzeCallOperands(). NFC Summary: It isn't used anymore. Depends on D6081 Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6083 llvm-svn: 221463	2014-11-06 16:48:57 +00:00
Daniel Sanders	b70e27ca7b	[mips] Removed MipsISelLowering::analyzeFormalArguments() in favour of CCState::AnalyzeFormalArguments() Summary: As with returns, we must be able to identify f128 arguments despite them being lowered away. We do this with a pre-analyze step that builds a vector and then we use this vector from the tablegen-erated code. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6081 llvm-svn: 221461	2014-11-06 16:36:30 +00:00
Andrea Di Biagio	7ecd22ca4a	[X86] When commuting SSE immediate blend, make sure that the new blend mask is a valid imm8. Example: define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) { %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3> ret <4 x i32> %shuffle } Before llc (-mattr=+sse4.1), produced the following assembly instruction: pblendw $4294967103, %xmm1, %xmm0 After pblendw $63, %xmm1, %xmm0 llvm-svn: 221455	2014-11-06 14:36:45 +00:00
Aaron Ballman	e77ffe35bf	Fixing some -Wcast-qual warnings; NFC. llvm-svn: 221454	2014-11-06 14:32:30 +00:00
Toma Tabacu	27cab751ca	[mips] Tolerate the use of the %z inline asm operand modifier with non-immediates. Summary: Currently, we give an error if %z is used with non-immediates, instead of continuing as if the %z isn't there. For example, you use the %z operand modifier along with the "Jr" constraints ("r" makes the operand a register, and "J" makes it an immediate, but only if its value is 0). In this case, you want the compiler to print "$0" if the inline asm input operand turns out to be an immediate zero and you want it to print the register containing the operand, if it's not. We give an error in the latter case, and we shouldn't (GCC also doesn't). Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6023 llvm-svn: 221453	2014-11-06 14:25:42 +00:00
Sasa Stankovic	b38db1eff8	[mips] Add the following MIPS options that control gp-relative addressing of small data items: -mgpopt, -mlocal-sdata, -mextern-sdata. Implement gp-relative addressing for constants. Differential Revision: http://reviews.llvm.org/D4903 llvm-svn: 221450	2014-11-06 13:20:12 +00:00
Toma Tabacu	dde4c464dd	[mips] Improve error/warning messages and testing for the .cpload assembler directive. Summary: Improved warning message when using .cpload inside a reorder section and added an error message for using .cpload with Mips16 enabled. Modified the tests to fit with the changes mentioned above, added a test-case for the N32 ABI in cpload.s and did some reformatting to make the tests easier to read. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5465 llvm-svn: 221447	2014-11-06 10:02:45 +00:00
David Majnemer	03d2c51cf2	X86, MC: Tidy up some whitespace in GetRelocType No functionality change intended. llvm-svn: 221443	2014-11-06 08:10:37 +00:00
Quentin Colombet	dbe33e7aa4	[X86] Lower VSELECT into SHRUNKBLEND when we shrink the bits used into the condition to match a blend. This prevents optimizations that work on VSELECT to perform invalid transformations. Indeed, the optimized condition does not match the vector boolean content that is expected and bad things may happen. This patch yields the exact same code on the whole test-suite + specs (-O3 and -O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a bug reduced in vselect-avx.ll. <rdar://problem/18819506> llvm-svn: 221429	2014-11-06 02:25:03 +00:00
Simon Pilgrim	1fc483d991	[X86][SSE] Vector integer to float conversion memory folding Added missing memory folding for the (V)CVTDQ2PS instructions - we can safely fold these (but not the (V)CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works. Differential Revision: http://reviews.llvm.org/D5981 llvm-svn: 221407	2014-11-05 22:28:25 +00:00
Matt Arsenault	f2676a5afc	R600/SI: Fix omod display for VOP3b llvm-svn: 221387	2014-11-05 19:35:00 +00:00
Derek Schuff	a54222045e	[x86 fast-isel] Materialize allocas with the correct-sized lea for ILP32 Summary: X86FastISel::fastMaterializeAlloca was incorrectly conditioning its opcode selection on subtarget bitness rather than pointer size. Differential Revision: http://reviews.llvm.org/D6136 llvm-svn: 221386	2014-11-05 19:27:21 +00:00
Matt Arsenault	f3cd4512ac	R600/SI: Move all rsrc building functions to SIISelLowering llvm-svn: 221383	2014-11-05 19:01:19 +00:00
Matt Arsenault	485defe58c	R600/SI: Remove SI_ADDR64_RSRC llvm-svn: 221382	2014-11-05 19:01:17 +00:00
Justin Holewinski	3d140fcfd1	[NVPTX] Add NVPTXLowerStructArgs pass This works around the limitation that PTX does not allow .param space loads/stores with arbitrary pointers. If a function has a by-val struct ptr arg, say foo(%struct.x byval %d), then add the following instructions to the first basic block : %temp = alloca %struct.x, align 8 %tt1 = bitcast %struct.x %d to i8 * %tt2 = llvm.nvvm.cvt.gen.to.param %tt2 %tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) * %tv = load %struct.x addrspace(101) * %tempd store %struct.x %tv, %struct.x * %temp, align 8 The above code allocates some space in the stack and copies the incoming struct from param space to local space. Then replace all occurences of %d by %temp. Fixes PR21465. llvm-svn: 221377	2014-11-05 18:19:30 +00:00
Duncan P. N. Exon Smith	c5754a65e6	IR: MDNode => Value: NamedMDNode::getOperator() Change `NamedMDNode::getOperator()` from returning `MDNode ` to returning `Value `. To reduce boilerplate at some call sites, add a `getOperatorAsMDNode()` for named metadata that's expected to only return `MDNode` -- for now, that's everything, but debug node named metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change. This is part of PR21433. Note that there's a follow-up patch to clang for the API change. llvm-svn: 221375	2014-11-05 18:16:03 +00:00
Tilmann Scheller	30c5ca25a5	[ARM] Remove more dead code. Dead code identified by the Clang static analyzer. llvm-svn: 221372	2014-11-05 17:45:04 +00:00
Zoran Jovanovic	06c9d55123	ps][microMIPS] Implement CodeGen support for ANDI16 instruction llvm-svn: 221371	2014-11-05 17:43:00 +00:00
Colin LeMahieu	816ef086f6	[Hexagon] [NFC] Alphabetizing cmake files. llvm-svn: 221370	2014-11-05 17:38:48 +00:00
Zoran Jovanovic	9f99723d92	ps][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions llvm-svn: 221369	2014-11-05 17:38:31 +00:00
Tilmann Scheller	c339992338	[ARM] Remove another redundant assignment. Found by the Clang static analyzer. llvm-svn: 221368	2014-11-05 17:34:04 +00:00
Zoran Jovanovic	8853171b46	[mips][microMIPS] Implement ANDI16 instruction llvm-svn: 221367	2014-11-05 17:31:00 +00:00
Tilmann Scheller	219ad28076	[ARM] Remove redundant assignment. Found by the Clang static analyzer. llvm-svn: 221366	2014-11-05 17:28:19 +00:00
Tilmann Scheller	f2572c5097	[ARM] Remove dead code identified by the Clang static analyzer. llvm-svn: 221358	2014-11-05 17:10:43 +00:00
Zoran Jovanovic	9c654830f7	[mips][microMIPS] Mark symbols as microMIPS if necessary Differential Revision: http://reviews.llvm.org/D6039 llvm-svn: 221355	2014-11-05 16:35:20 +00:00
Zoran Jovanovic	a87308c84c	Reverted revisions 221351, 221352 and 221353. llvm-svn: 221354	2014-11-05 16:19:59 +00:00
Zoran Jovanovic	3038500f3b	[mips][microMIPS] Implement CodeGen support for ANDI16 instruction Differential Revision: http://reviews.llvm.org/D5797 llvm-svn: 221353	2014-11-05 15:54:05 +00:00
Zoran Jovanovic	f4f5f1e272	[mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions Differential Revision: http://reviews.llvm.org/D5933 llvm-svn: 221352	2014-11-05 15:46:53 +00:00
Zoran Jovanovic	e548bb0634	[mips][microMIPS] Implement ANDI16 instruction Differential Revision: http://reviews.llvm.org/D5163 llvm-svn: 221351	2014-11-05 15:39:41 +00:00
Tom Stellard	326d6ece94	R600/SI: Change all instruction assembly names to lowercase. This matches the format produced by the AMD proprietary driver. //==================================================================// // Shell script for converting .ll test cases: (Pass the .ll files you want to convert to this script as arguments). //==================================================================// ; This was necessary on my system so that A-Z in sed would match only ; upper case. I'm not sure why. export LC_ALL='C' TEST_FILES="$" MATCHES=`grep -v Patterns SIInstructions.td \| grep -o '"[A-Z0-9_]\+["e]' \| grep -o '[A-Z0-9_]\+' \| sort -r` for f in $TEST_FILES; do # Check that there are SI tests: grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f if [ $? -eq 0 ]; then for match in $MATCHES; do sed -i -e "s/$[ :]$match$/\L\1/" $f done # Try to get check lines with partial instruction names sed -i 's/$;[ ]SI[A-Z\\-]: $$[A-Z_0-9]\+$/\1\L\2/' $f fi done sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.32.ll ../../../test/CodeGen/R600/sext-in-reg.ll sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll sed -i 's/$; CHECK[-NOT]*: $$[A-Z_0-9]\+$/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll //==================================================================// // Shell script for converting .td files (run this last) //==================================================================// export LC_ALL='C' sed -i -e '/Patterns/!s/$"[A-Z0-9_]\+[ "e]$/\L\1/g' SIInstructions.td sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td llvm-svn: 221350	2014-11-05 14:50:53 +00:00
Andrea Di Biagio	ce46b97b48	[X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks. This patch improves the folding of vector AND nodes into blend operations for targets that feature SSE4.1. A vector AND node where one of the operands is a constant build_vector with elements that are either zero or all-ones can be converted into a blend. This allows for example to simplify the following code: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1> %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0> %3 = or <4 x i32> %1, %2 ret <4 x i32> %3 } Before this patch llc (-mcpu=corei7) generated: andps LCPI1_0(%rip), %xmm0, %xmm0 andps LCPI1_1(%rip), %xmm1, %xmm1 orps %xmm1, %xmm0, %xmm0 retq With this patch we generate a single 'vpblendw'. llvm-svn: 221343	2014-11-05 13:04:14 +00:00
Oliver Stannard	9e89d8cc5c	[ARM] Honor FeatureD16 in the assembler and disassembler Some ARM FPUs only have 16 double-precision registers, rather than the normal 32. LLVM represents this with the D16 target feature. This is currently used by CodeGen to avoid using high registers when they are not available, but the assembler and disassembler do not. I fix this in the assmebler and disassembler rather than the InstrInfo.td files, as the latter would require a large number of changes everywhere one of the floating-point instructions is referenced in the backend. This solution is similar to the one used for co-processor numbers and MSR masks. llvm-svn: 221341	2014-11-05 12:06:39 +00:00
Tim Northover	dc0d9e46a5	ARM: try to add extra CS-register whenever stack alignment >= 8. We currently try to push an even number of registers to preserve 8-byte alignment during a function's prologue, but only when the stack alignment is prcisely 8. Many of the reasons for doing it apply also when that alignment > 8 (the extra store is often free, and can save another stack adjustment, though less frequently for 16-byte stack alignment). llvm-svn: 221321	2014-11-05 00:27:20 +00:00
Tim Northover	228c943f31	ARM/Dwarf: correctly align stack before callee-saved VPRs We were making an attempt to do this by adding an extra callee-saved GPR (so that there was an even number in the list), but when that failed we went ahead and pushed anyway. This had a couple of potential issues: + The .cfi directives we emit misplaced dN because they were based on PrologEpilogInserter's calculation. + Unaligned stores can be less efficient. + Unaligned stores can actually fault (likely only an issue in niche cases, but possible). This adds a final explicit stack adjustment if all other options fail, so that the actual locations of the registers match up with where they should be. llvm-svn: 221320	2014-11-05 00:27:13 +00:00
Simon Pilgrim	c9a0779309	[X86][SSE] Enable commutation for SSE immediate blend instructions Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask. This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests). Differential Revision: http://reviews.llvm.org/D6015 llvm-svn: 221313	2014-11-04 23:25:08 +00:00
Juergen Ributzka	f9660f0712	[AArch64] Use the correct register class for ORR. While fixing up the register classes in the machine combiner in a previous commit I missed one. This fixes the last one and adds a test case. llvm-svn: 221308	2014-11-04 22:20:07 +00:00
Rafael Espindola	d85260827c	Revert "[mips] Add names and tests for the hardware registers" This reverts commit r221299. The tests LLVM :: MC/Disassembler/Mips/mips32.txt LLVM :: MC/Disassembler/Mips/mips32_le.txt were failing. llvm-svn: 221307	2014-11-04 22:15:05 +00:00
Vasileios Kalintiris	a16974a5c0	[mips] Move COP2 & COP3 load/store instructions from MipsInstrFPU.td to MipsInstrInfo.td. NFC. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5843 llvm-svn: 221300	2014-11-04 21:45:16 +00:00
Vasileios Kalintiris	df6e0d0371	[mips] Add names and tests for the hardware registers Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5763 llvm-svn: 221299	2014-11-04 21:30:44 +00:00
Andrea Di Biagio	f5b34e535d	[X86] Add 'FeatureSlowSHLD' to cpu 'bdver3'. Also explicit set FeatureAVX and FeatureSSE4A for all the bdver* cpus. This patch adds 'FeatureSlowSHLD' to 'bdver3'. According to the official AMD optimization guide for amdfam15: "Using alternative code in place of SHLD achieves lower overall latency and requires fewer execution resources. The 32-bit and 64-bit forms of ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath instructions, while SHLD is a VectorPath instruction." This patch also explicitly sets feature AVX and SSE4A for all the bdver* cpus. This part of the patch is a non-functional change and it is mainly done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A). llvm-svn: 221296	2014-11-04 21:18:09 +00:00
Matt Arsenault	a95f5a0ec1	R600/SI: Rename div_scale dest operands to match documentation llvm-svn: 221291	2014-11-04 20:29:20 +00:00
Benjamin Kramer	185dc0da1f	AArch64: Pattern match integer vector abs like we do on ARM. This kind of pattern is emitted by the loop vectorizer. llvm-svn: 221289	2014-11-04 20:10:06 +00:00
Toma Tabacu	cc2502d8f3	[mips] Improve support for the .set mips16/nomips16 assembler directives. Summary: Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted). These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled). Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5462 llvm-svn: 221277	2014-11-04 17:18:07 +00:00
NAKAMURA Takumi	8d8f396d86	R600/LLVMBuild.txt: Add TransformUtils. llvm-svn: 221228	2014-11-04 02:16:53 +00:00
Colin LeMahieu	5241881bbc	[Hexagon] Reverting 220584 to address ASAN errors. llvm-svn: 221210	2014-11-04 00:14:36 +00:00
Akira Hatanaka	bc950d52d7	Rename variables to conform to llvm coding standards. Differential Revision: http://reviews.llvm.org/D6062 llvm-svn: 221204	2014-11-03 23:24:10 +00:00
Akira Hatanaka	9ee2c26b49	[AArch64] Make function processLogicalImmediate more efficient. NFC. llvm-svn: 221199	2014-11-03 23:06:31 +00:00
Ahmed Bougacha	ec0f3d755f	[X86] Add debug print name for X86ISD::[US]MUL8. NFC-ish. The opcodes were added in r220516, but I forgot to add the print names. llvm-svn: 221185	2014-11-03 21:25:18 +00:00
Akira Hatanaka	b961534818	[ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to return register class tGPRRegClass if the target is thumb1. This commit fixes a crash that occurs during register allocation which was triggered when a virtual register defined by an inline-asm instruction had to be spilled. rdar://problem/18740489 llvm-svn: 221178	2014-11-03 20:37:04 +00:00
Ahmed Bougacha	12eb558bd9	[X86] 8bit divrem: Improve codegen for AH register extraction. For 8-bit divrems where the remainder is used, we used to generate: divb %sil shrw $8, %ax movzbl %al, %eax That was to avoid an H-reg access, which is problematic mainly because it isn't possible in REX-prefixed instructions. This patch optimizes that to: divb %sil movzbl %ah, %eax To do that, we explicitly extend AH, and extract the L-subreg in the resulting register. The extension is done using the NOREX variants of MOVZX. To support signed operations, MOVSX_NOREX is also added. Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is then lowered to a sequence containing a single zext (rather than 2). Differential Revision: http://reviews.llvm.org/D6064 llvm-svn: 221176	2014-11-03 20:26:35 +00:00
Tom Stellard	5cbb53c41e	Reapply: R600: Make sure to inline all internal functions Function calls aren't supported yet. This was reverted due to build breakages, which should be fixed now. llvm-svn: 221173	2014-11-03 19:49:05 +00:00
Duncan P. N. Exon Smith	3d5a02f677	IR: MDNode => Value: Instruction::getAllMetadataOtherThanDebugLoc() Change `Instruction::getAllMetadataOtherThanDebugLoc()` from a vector of `MDNode` to one of `Value`. Part of PR21433. llvm-svn: 221167	2014-11-03 18:13:57 +00:00
Charlie Turner	1d8cc909cc	Remove the cortex-a9-mp CPU. This CPU definition is redundant. The Cortex-A9 is defined as supporting multiprocessing extensions. Remove its definition and update appropriate tests. LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only difference between the two CPU definitions in ARM.td is that cortex-a9-mp contains the feature FeatureMP for multiprocessing extensions. This is redundant since the Cortex-A9 is defined as having multiprocessing extensions in the TRMs. armcc also defines the Cortex-A9 as having multiprocessing extensions by default. Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7 llvm-svn: 221166	2014-11-03 17:38:00 +00:00
Oliver Stannard	269a275cb4	[AArch64] Fix miscompile of comparison with 0xffffffffffffffff Some literals in the AArch64 backend had 15 'f's rather than 16, causing comparisons with a constant 0xffffffffffffffff to be miscompiled. llvm-svn: 221157	2014-11-03 15:28:40 +00:00
Sid Manning	326f8af463	Handle ctor/init_array initialization. Hexagon was not calling InitializeELF and could not select between ctors and init_array. Phabricator revision: http://reviews.llvm.org/D6061 llvm-svn: 221156	2014-11-03 14:56:05 +00:00
Daniel Sanders	0ad1719d15	[mips] Remove unused prototype and variable. NFC. llvm-svn: 221146	2014-11-03 10:14:57 +00:00
Matt Arsenault	4cd1d4ecb1	R600: Don't unnecessarily repeat the register class llvm-svn: 221119	2014-11-02 23:46:59 +00:00
Matt Arsenault	7d858d87cd	R600/SI: Use REG_SEQUENCE instead of INSERT_SUBREGs llvm-svn: 221118	2014-11-02 23:46:54 +00:00
Matt Arsenault	eb49216bba	Support REG_SEQUENCE in tablegen. The problem is mostly that variadic output instruction aren't handled, so it is rejected for having an inconsistent number of operands, and then the right number of operands isn't emitted. llvm-svn: 221117	2014-11-02 23:46:51 +00:00
Daniel Sanders	23e987766b	Re-commit r221056 and others with fix, "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC." sret arguments can never originate from an f128 argument so we detect sret arguments and push false into OriginalArgWasF128. llvm-svn: 221102	2014-11-02 16:09:29 +00:00
NAKAMURA Takumi	cd2996c3e3	Revert r221056 and others, "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC." r221056 "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC." r221058 "[mips] Fix unused variable warning introduced in r221056" r221059 "[mips] Move all ByVal handling into CCState and tablegen-erated code. NFC." r221061 "Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC." It cuased an undefined behavior in LLVM :: CodeGen/Mips/return-vector.ll. llvm-svn: 221081	2014-11-02 04:43:54 +00:00
Daniel Sanders	8104b75c9f	Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC. Reviewers: rnk Reviewed By: rnk Subscribers: rnk, llvm-commits Differential Revision: http://reviews.llvm.org/D5978 llvm-svn: 221061	2014-11-01 19:32:23 +00:00
Daniel Sanders	88e1c7393b	[mips] Move all ByVal handling into CCState and tablegen-erated code. NFC. Summary: CCState already contains a byval implementation that is very similar to the Mips custom code. This patch merges the custom code into the existing common code and tablegen-erated code. Reviewers: vmedic Reviewed By: vmedic Subscribers: rnk, llvm-commits Differential Revision: http://reviews.llvm.org/D5977 llvm-svn: 221059	2014-11-01 19:17:10 +00:00
Daniel Sanders	658dc47179	[mips] Fix unused variable warning introduced in r221056 llvm-svn: 221058	2014-11-01 18:53:01 +00:00
Daniel Sanders	f43e68793b	[mips] Remove ByValArgInfo::Address in favour of CCValAssign::getMemLocOffset(). NFC. Summary: ByValArgInfo is practically the same as CCState::ByValInfo now. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5976 llvm-svn: 221057	2014-11-01 18:44:56 +00:00
Daniel Sanders	eac09608d0	[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC. Summary: There are a couple more changes to make before analyzeFormalArguments can be merged into the standard AnalyzeFormalArguments. I've had to temporarily poke a couple holes in MipsCCState's encapsulation to save having to make all the required changes for this merge all at once. These will be removed shortly. We must merge our ByVal argument handling with the implementation in CCState. This will be done over the next three patches, then the fourth will merge analyzeFormalArguments with AnalyzeFormalArguments. Depends on D5967 Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5969 llvm-svn: 221056	2014-11-01 18:38:03 +00:00
Daniel Sanders	853c2435b6	[mips] Remove MipsCC::CCInfo. NFC. Summary: It's now passed in as an argument to functions that need it. Eventually this argument will be replaced by the 'this' pointer for a MipsCCState object. Depends on D5966 Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5967 llvm-svn: 221054	2014-11-01 18:13:52 +00:00
Daniel Sanders	068eea2d14	[mips] Removed MipsCC::fixedArgFn(). NFC Summary: There is one remaining trace of it in MipsCC::analyzeCallOperands() where Mips16 might override the calling convention. This will moved into tablegen-erated code later. Depends on D5965 Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5966 llvm-svn: 221053	2014-11-01 17:44:51 +00:00
Daniel Sanders	ca80f1a05a	[tablegen] Add CustomCallingConv and use it to tablegen-erate the outermost parts of the Mips O32 implementation Summary: CustomCallingConv is simply a CallingConv that tablegen should not generate the implementation for. It allows regular CallingConv's to delegate to these custom functions. This is (currently) necessary for Mips and we cannot use CCCustom without having to adapt to the different API that CCCustom uses. This brings us a bit closer to being able to remove MipsCC::analyzeCallOperands and MipsCC::analyzeFormalArguments in favour of the common implementation. No functional change to the targets. Depends on D3341 Reviewers: vmedic Reviewed By: vmedic Subscribers: vmedic, llvm-commits Differential Revision: http://reviews.llvm.org/D5965 llvm-svn: 221052	2014-11-01 17:38:22 +00:00
Rafael Espindola	246c4fb5d9	Remove redundant calls to isMaterializable. This removes calls to isMaterializable in the following cases: * It was redundant with a call to isDeclaration now that isDeclaration returns the correct answer for materializable functions. * It was followed by a call to Materialize. Just call Materialize and check EC. llvm-svn: 221050	2014-11-01 16:46:18 +00:00
Daniel Sanders	a017974b9a	Revert r221048 - Test commit It seems I can't commit unless $DBUS_SESSION_BUS_ADDRESS is set correctly and it is not set for ssh sessions. llvm-svn: 221049	2014-11-01 16:08:03 +00:00
Daniel Sanders	5903eb50b1	Test commit Added some whitespace to debug some authentication issues I'm having. llvm-svn: 221048	2014-11-01 16:00:40 +00:00
Adrian Prantl	a0852d2be3	Revert "Temporarily revert r220777 to sort out build bot breakage." This reverts commit r221028. Later commits depend on this and reverting just this one causes even more bots to fail. llvm-svn: 221041	2014-11-01 03:19:45 +00:00
NAKAMURA Takumi	2ab9801509	Revert r220779, "[AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine." Since r221028 (reverting r220777), this caused failures. llvm-svn: 221040	2014-11-01 01:36:14 +00:00
Adrian Prantl	cd4872399a	Temporarily revert r220777 to sort out build bot breakage. "[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros." llvm-svn: 221028	2014-11-01 00:26:59 +00:00
Duncan P. N. Exon Smith	3872d0084c	IR: MDNode => Value: Instruction::getMetadata() Change `Instruction::getMetadata()` to return `Value` as part of PR21433. Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`. llvm-svn: 221024	2014-11-01 00:10:31 +00:00
Reid Kleckner	6ce76925de	Revert "R600: Add missing file to CMakeLists.txt" This reverts commit r220998. It should've been reverted with the other change. llvm-svn: 221021	2014-10-31 23:39:10 +00:00
Reid Kleckner	9abe268adb	Revert "R600: Make sure to inline all internal functions" This reverts commit r220996. It introduced layering violations causing link errors in many configurations. llvm-svn: 221020	2014-10-31 23:35:26 +00:00
Reid Kleckner	da00cf5f73	Work around bugs in MSVC "14" CTP 3's conversion logic It appears to ignore or find ambiguous MachineInstrBuilder's conversion operators that allow conversion to MachineInstr* and MachineBasicBlock::bundle_iterator. As a workaround, add an explicit way to get the MachineInstr. llvm-svn: 221017	2014-10-31 23:19:46 +00:00
Tom Stellard	6693f9cf3d	R600: Add IPO to the list of required libraries llvm-svn: 221004	2014-10-31 21:52:08 +00:00
Tom Stellard	4ad4177399	R600: Add missing file to CMakeLists.txt llvm-svn: 220998	2014-10-31 20:56:36 +00:00
Tom Stellard	5b2927fe83	R600: Don't promote allocas when one of the users is a ptrtoint instruction We need to figure out how to track ptrtoint values all the way until result is converted back to a pointer in order to correctly rewrite the pointer type. llvm-svn: 220997	2014-10-31 20:52:04 +00:00
Tom Stellard	aa73831757	R600: Make sure to inline all internal functions Function calls aren't supported yet. llvm-svn: 220996	2014-10-31 20:52:02 +00:00
Bill Schmidt	1ca69fa64d	[PowerPC] Initial VSX intrinsic support, with min/max for vector double Now that we have initial support for VSX, we can begin adding intrinsics for programmer access to VSX instructions. This patch adds basic support for VSX intrinsics in general, and tests it by implementing intrinsics for minimum and maximum for the vector double data type. The LLVM portion of this is quite straightforward. There is a companion patch for Clang. llvm-svn: 220988	2014-10-31 19:19:07 +00:00
Chad Rosier	7bb413e3ba	[AArch64] Check Dest Register Liveness in CondOpt pass. Our internal test reveals such case should not be transformed: cmp x17, #3 b.lt .LBB10_15 ... subs x12, x12, #1 b.gt .LBB10_1 where x12 is a liveout, becomes: cmp x17, #2 b.le .LBB10_15 ... subs x12, x12, #2 b.ge .LBB10_1 Unable to provide test case as it's difficult to reproduce on community branch. http://reviews.llvm.org/D6048 Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>! llvm-svn: 220987	2014-10-31 19:02:38 +00:00
Quentin Colombet	c32615dfef	[CodeGenPrepare] Move extractelement close to store if they can be combined. This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible. Context Some targets use different register files for both vector and scalar operations. This means that transitioning from one domain to another may incur copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles. Motivating Example Let us consider an example: define void @foo(<2 x i32>* %addr1, i32* %dest) { %in1 = load <2 x i32>* %addr1, align 8 %extract = extractelement <2 x i32> %in1, i32 1 %out = or i32 %extract, 1 store i32 %out, i32* %dest, align 4 ret void } As it is, this IR generates the following assembly on armv7: vldr d16, [r0] @vector load vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles orr r0, r0, #1 @ scalar bitwise or str r0, [r1] @ scalar store bx lr Whereas we could generate much faster code: vldr d16, [r0] @ vector load vorr.i32 d16, #0x1 @ vector bitwise or vst1.32 {d16[1]}, [r1:32] @ vector extract + store bx lr Half of the computation made in the vector is useless, but this allows to get rid of the expensive cross-register-file copy. Proposed Solution To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends by a store and the target is able to combine an extract with a store. Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted as some point[1]. Moreover, this is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance). The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of details and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counter balanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles. For now, the optimization is just enabled for ARM prior than v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that. [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem. Differential Revision: http://reviews.llvm.org/D5921 <rdar://problem/14170854> llvm-svn: 220978	2014-10-31 17:52:53 +00:00
Chad Rosier	a675e550ca	[AArch64] CondOpt pass is missing FCMP instructions when searching backward for a CMP which defines the flags used by B.CC. http://reviews.llvm.org/D6047 Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>! llvm-svn: 220961	2014-10-31 15:17:36 +00:00
Ulrich Weigand	c8c2ea2854	[PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code Since block address values can be larger than 2GB in 64-bit code, they cannot be loaded simply using an @l / @ha pair, but instead must be loaded from the TOC, just like GlobalAddress, ConstantPool, and JumpTable values are. The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where temporary labels could not be used as TOC values, since code would attempt (and fail) to use GetOrCreateSymbol to create a symbol of the same name as the temporary label. llvm-svn: 220959	2014-10-31 10:33:14 +00:00
Robert Khasanov	af318f7073	[AVX512] Added VBROADCAST{SS/SD} encoding for VL subset. Refactored through AVX512_maskable llvm-svn: 220908	2014-10-30 14:21:47 +00:00
Robert Khasanov	595e598869	[AVX512] Implemented AVX512VL FP bnary packed instructions (VADDP, VSUBP, VMULP, VDIVP, VMAXP, VMINP) Refactored through AVX512_maskable Added encoding tests for them. llvm-svn: 220858	2014-10-29 15:43:02 +00:00
Robert Khasanov	1cf354c92f	[AVX512] Fix VSQRT packed instructions internal names. No functional change llvm-svn: 220808	2014-10-28 18:22:41 +00:00
Robert Khasanov	eb12639375	[AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset. Refactored through AVX512_maskable llvm-svn: 220806	2014-10-28 18:15:20 +00:00
Robert Khasanov	3e534c93b9	[AVX-512] Expanded rsqrt/rcp instructions to VL subset. Refactored multiclass through AVX512_maskable llvm-svn: 220783	2014-10-28 16:37:13 +00:00
Robert Khasanov	784c3f0d6e	[AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine. No functional change. llvm-svn: 220779	2014-10-28 16:17:14 +00:00
Robert Khasanov	4441c4d31b	[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros. This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case. Added tests for vselect of <X x i1> values. llvm-svn: 220777	2014-10-28 15:59:40 +00:00
Robert Khasanov	dd09a8f320	[AVX512] Bring back vector-shuffle lowering support through broadcasts Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code. Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll): define <16 x i32> @_inreg16xi32(i32 %a) { ; CHECK-LABEL: _inreg16xi32: ; CHECK: ## BB#0: -; CHECK-NEXT: vpbroadcastd %edi, %zmm0 +; CHECK-NEXT: vmovd %edi, %xmm0 +; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0 +; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0 ; CHECK-NEXT: retq %b = insertelement <16 x i32> undef, i32 %a, i32 0 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer ret <16 x i32> %c } Here, 256-bit broadcast was generated instead of 512-bit one. In this patch 1) I added vector-shuffle lowering through broadcasts 2) Removed asserts and branches likes because this is incorrect - assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI"); 3) Fixed lowering tests llvm-svn: 220774	2014-10-28 12:28:51 +00:00
Reid Kleckner	9ccce99e1d	X86: Implement the vectorcall calling convention This is a Microsoft calling convention that supports both x86 and x86_64 subtargets. It passes vector and floating point arguments in XMM0-XMM5, and passes them indirectly once they are consumed. Homogenous vector aggregates of up to four elements can be passed in sequential vector registers, but this part is not implemented in LLVM and will be handled in Clang. On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as integer register parameters and is callee cleanup. On x86_64, it delegates to the normal win64 calling convention. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D5943 llvm-svn: 220745	2014-10-28 01:29:26 +00:00
Tim Northover	00917897b2	AArch64: enable Cortex-A57 FP balancing on Cortex-A53. Benchmarks have shown that it's harmless to the performance there, and having a unified set of passes between the two cores where possible helps big.LITTLE deployment. Patch by Z. Zheng. llvm-svn: 220744	2014-10-28 01:24:32 +00:00
NAKAMURA Takumi	949fb6d276	AArch64InstrInfo.h: Fix a warning introduced in clang r220703. [-Winconsistent-missing-override] llvm-svn: 220739	2014-10-27 23:29:27 +00:00
Adam Nemet	cf7a4a2660	[AVX512] Add vpermil variable version This is implemented via a multiclass that derives from the vperm imm multiclass. Fixes <rdar://problem/18426089> llvm-svn: 220737	2014-10-27 23:08:40 +00:00
Adam Nemet	8d85b0cd3e	[AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo No functionality change. No change in X86.td.expanded except that we only set the CD8 attributes for the memory variants. (This shouldn't be used unless we have a memory operand.) llvm-svn: 220736	2014-10-27 23:08:37 +00:00
Adam Nemet	9aad13164e	[AVX512] Derive vpermil* from avx512_perm_imm This used to derive from avx512_pshuf_imm which is confusing. NFC. Compared X86.td.expanded. llvm-svn: 220735	2014-10-27 23:08:34 +00:00
Adam Nemet	c51cee85b7	[AVX512] Fix copy-and-paste bugs in vpermil 1) i512mem -> f512mem (this is the packed FP input being permuted) 2) element size is 64 bits in EVEX_CD8 for PD. (A good illustration why X86VectorVTInfo is useful) llvm-svn: 220734	2014-10-27 23:08:31 +00:00
Pete Cooper	7c801dc90b	Fix a stackmap bug introduced in r220710. For a call to not return in to the stackmap shadow, the shadow must end with the call. To do this, we must insert any required nops before the call, and not after it. llvm-svn: 220728	2014-10-27 22:38:45 +00:00
Juergen Ributzka	7ccebec668	[FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check. This is a minor change to use the immediate version when the operand is a null value. This should get rid of an unnecessary 'mov' instruction in debug builds and align the code more with the one generated by SelectionDAG. This fixes rdar://problem/18785125. llvm-svn: 220713	2014-10-27 19:58:36 +00:00
Juergen Ributzka	0190fea941	[FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'. Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and' instruction. This fixes rdar://problem/18784953. llvm-svn: 220712	2014-10-27 19:46:23 +00:00
Pete Cooper	3c0af35232	Stackmap shadows should consider call returns a branch target. To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets. A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return. llvm-svn: 220710	2014-10-27 19:40:35 +00:00
Juergen Ributzka	90f741a2ce	[FastISel][AArch64] Use 'cbz' also for null values (pointers). The pattern matching for a 'ConstantInt' value was too restrictive. Checking for a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction. This fixes rdar://problem/18784732. llvm-svn: 220709	2014-10-27 19:38:05 +00:00
Juergen Ributzka	eae91040d8	[FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block. This fixes a bug where the input register was not defined for the 'tbz/tbnz' instruction. This happened, because we folded the 'and' instruction from a different basic block. This fixes rdar://problem/18784013. llvm-svn: 220704	2014-10-27 19:16:48 +00:00
Juergen Ributzka	6de054a25a	[FastISel][AArch64] Fix load/store with frame indices. At higher optimization levels the LLVM IR may contain more complex patterns for loads/stores from/to frame indices. The 'computeAddress' function wasn't able to handle this and triggered an assertion. This fix extends the possible addressing modes for frame indices. This fixes rdar://problem/18783298. llvm-svn: 220700	2014-10-27 18:21:58 +00:00
Lang Hames	5fe30ca56f	[PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these sets as keys into a cache of interference matrice values in the Interference constraint adder. Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%. llvm-svn: 220688	2014-10-27 17:44:25 +00:00
NAKAMURA Takumi	729be14435	Prune CRLF. llvm-svn: 220678	2014-10-27 12:37:26 +00:00
Oliver Stannard	79efe41a0c	[ARM] Select VMAXNM and VMINNM regardless of operand order Currently, the ARM backend will select the VMAXNM and VMINNM for these C expressions: (a < b) ? a : b (a > b) ? a : b but not these expressions: (a > b) ? b : a (a < b) ? b : a This patch allows all of these expressions to be matched. llvm-svn: 220671	2014-10-27 09:23:02 +00:00
Yuri Gorshenin	3e22bb8c54	[asan-asm-instrumentation] Added comment describing how asm instrumentation works. Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5970 llvm-svn: 220670	2014-10-27 08:38:54 +00:00
Elena Demikhovsky	4b01b7306c	AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction llvm-svn: 220638	2014-10-26 09:52:24 +00:00
Simon Pilgrim	a63672665f	[X86][SSE] Vector integer/float conversion memory folding Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1). Minor patch agreed with qcolombet. llvm-svn: 220613	2014-10-25 08:11:20 +00:00
Jingyue Wu	ea51161a94	[NVPTX] aligned byte-buffers for vector return types Summary: Fixes PR21100 which is caused by inconsistency between the declared return type and the expected return type at the call site. The new behavior is consistent with nvcc and the NVPTXTargetLowering::getPrototype function. Test Plan: test/Codegen/NVPTX/vector-return.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5612 llvm-svn: 220607	2014-10-25 03:46:16 +00:00
Kevin Enderby	2813f496d9	Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol. In a Mach-O object file a relocatable expression of the form SymbolA - SymbolB + constant is allowed when both symbols are defined in a section. But when either symbol is undefined it is an error. The code was crashing when it had an undefined symbol in this case. And should have printed a error message using the location information in the relocation entry. rdar://18678402 llvm-svn: 220599	2014-10-24 22:39:40 +00:00
Simon Pilgrim	fd080af0c5	[X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element. An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle. Differential Revision: http://reviews.llvm.org/D5917 llvm-svn: 220592	2014-10-24 21:04:41 +00:00
Colin LeMahieu	838307b31f	[Hexagon] Resubmission of 220427 Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst. Adding encoding bits for add opcode. Adding llvm-mc tests. Removing unit tests. http://reviews.llvm.org/D5624 llvm-svn: 220584	2014-10-24 19:00:32 +00:00
Sanjay Patel	f924e11967	Allow AVX vrsqrtps generation. This is a follow-on to r220570 that allows a 256-bit (v8f32) version of vrsqrtps to be generated. llvm-svn: 220579	2014-10-24 17:59:18 +00:00
Sanjay Patel	957efc23bb	Use rsqrt (X86) to speed up reciprocal square root calcs This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570	2014-10-24 17:02:16 +00:00
Daniel Sanders	e2e25da4b6	[mips] Replace MipsABIEnum with a MipsABIInfo class. Summary: No functional change yet, it's just an object replacement for an enum. It will allow us to gather ABI information in a single place so that we can start testing for properties of the ABI's instead of the ABI itself. For example we will eventually be able to use: ABI.MinStackAlignmentInBytes() instead of: (isABI_N32() \|\| isABI_N64()) ? 16 : 8 which is clearer and more maintainable. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3341 llvm-svn: 220568	2014-10-24 16:15:27 +00:00
Daniel Sanders	f815c137e6	[mips] Fix >80-column line llvm-svn: 220564	2014-10-24 14:46:00 +00:00
Daniel Sanders	8d69aedbd5	[mips] Remove redundant code in RetCC_MipsN. NFC. Summary: i32 is always promoted to i64 so it no longer makes sense to assign i32 to registers. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5964 llvm-svn: 220561	2014-10-24 13:49:54 +00:00
Daniel Sanders	19f01658fe	[mips] For N32/N64, structs must be passed in the upper bits of a register. Summary: Most structs were fixed by r218451 but those of between >32-bits and <64-bits remained broken since they were not marked with [ASZ]ExtUpper. This patch fixes the remaining cases by using CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5963 llvm-svn: 220556	2014-10-24 13:09:19 +00:00
Oliver Stannard	f7a5afc3f2	[AArch64] Fix fast-isel of cbz of i1, i8, i16 This fixes a miscompilation in the AArch64 fast-isel which was triggered when a branch is based on an icmp with condition eq or ne, and type i1, i8 or i16. The cbz instruction compares the whole 32-bit register, so values with the bottom 1, 8 or 16 bits clear would cause the wrong branch to be taken. llvm-svn: 220553	2014-10-24 09:54:41 +00:00
Adam Nemet	832ec5e911	[AVX512] FMA support for the 231 variants This is asm/diasm-only support, similar to AVX. For ISeling the register variant, they are no different from 213 other than whether the multiplication or the addition operand is destructed. For ISeling the memory variant, i.e. to fold a load, they are no different than the 132 variant. The addition operand (op3) in both cases can come from memory. Again the ony difference is which operand is destructed. There could be a post-RA pass that would convert a 213 or 132 into a 231. Part of <rdar://problem/17082571> llvm-svn: 220540	2014-10-24 00:03:00 +00:00
Adam Nemet	26371ce131	[AVX512] Introduce fma3p_forms from AVX This multiclass generates the different forms: 213, 231, 132 in AVX. 132 in AVX512 is a separate class but I am planning to use this same multiclass to generate 231 relying on the nice the null_frag trick from AVX to disable codegen pattern for 231. No functionality change, no change in X86.td.expanded except for the different instruction definition names. llvm-svn: 220539	2014-10-24 00:02:55 +00:00
Ahmed Bougacha	5175bcf43a	[X86] Improve mul w/ overflow codegen, to MUL8+SETO. Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where 3 are really needed. This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to MUL8/IMUL8 + SETO. i8 is a special case because there is no two/three operand variants of (I)MUL8, so the first operand and return value need to go in AL/AX. Also, we can't write patterns for these instructions: TableGen refuses patterns where output operands don't match SDNode results. In this case, instructions where the output operand is an implicitly defined register. A related special case (and FIXME) exists for MUL8 (X86InstrArith.td): // FIXME: Used for 8-bit mul, ignore result upper 8 bits. // This probably ought to be moved to a def : Pat<> if the // syntax can be accepted. [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)] Ideally, these go away with UMUL8, but we still need to improve TableGen support of implicit operands in patterns. Before this change: movsbl %sil, %eax movsbl %dil, %ecx imull %eax, %ecx movb %cl, %al sarb $7, %al movzbl %al, %eax movzbl %ch, %esi cmpl %eax, %esi setne %al After: movb %dil, %al imulb %sil seto %al Also, remove a made-redundant testcase for PR19858, and enable more FastISel ALU-overflow tests for SelectionDAG too. Differential Revision: http://reviews.llvm.org/D5809 llvm-svn: 220516	2014-10-23 21:55:31 +00:00
Renato Golin	6fb9c2ea70	Do not emit intermediate register for zero FP immediate This updates check for double precision zero floating point constant to allow use of instruction with immediate value rather than temporary register. Currently "a == 0.0", where "a" is of "double" type generates: vmov.i32 d16, #0x0 vcmpe.f64 d0, d16 With this change it becomes: vcmpe.f64 d0, #0 Patch by Sergey Dmitrouk. llvm-svn: 220486	2014-10-23 15:31:50 +00:00
NAKAMURA Takumi	5b6f789d7a	Hexagon/Disassembler/LLVMBuild.txt: Update libdeps. llvm-svn: 220482	2014-10-23 11:32:16 +00:00
NAKAMURA Takumi	f459febb15	Hexagon/LLVMBuild.txt: Prune CRLF. llvm-svn: 220481	2014-10-23 11:32:03 +00:00
NAKAMURA Takumi	bd20251a4a	[CMake] Prune CRLF in CMakeLists.txt(s). llvm-svn: 220480	2014-10-23 11:31:50 +00:00
NAKAMURA Takumi	504bbf91cd	Revert r220427, "[Hexagon] Adding encoding bits for add opcode." It brought cyclic dependecy between HexagonAsmPrinter and HexagonDesc. llvm-svn: 220478	2014-10-23 11:31:22 +00:00
Zoran Jovanovic	42b8444372	[mips][microMIPS] Implement ADDIUR1SP instruction Differential Revision: http://reviews.llvm.org/D5153 llvm-svn: 220477	2014-10-23 11:13:59 +00:00
Zoran Jovanovic	bac3619b29	ps][microMIPS] Implement ADDIUR2 instruction Differential Revision: http://reviews.llvm.org/D5151 llvm-svn: 220476	2014-10-23 11:06:34 +00:00
Zoran Jovanovic	9bda2f1926	ps][microMIPS] Implement LI16 instruction Differential Revision: http://reviews.llvm.org/D5149 llvm-svn: 220475	2014-10-23 10:59:24 +00:00
Zoran Jovanovic	4a00fdc2e3	[mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions Differential Revision: http://reviews.llvm.org/D5774 llvm-svn: 220474	2014-10-23 10:42:01 +00:00
Oliver Stannard	39a85abddf	[Thumb2] Improve disassembly of memory hints Currently, the ARM disassembler will disassemble the Thumb2 memory hint instructions (PLD, PLDW and PLI), even for targets which do not have these instructions. This patch adds the required checks to the disassmebler. llvm-svn: 220472	2014-10-23 08:52:58 +00:00
Akira Hatanaka	2ee0e9e6ee	[ARM, stack protector] If supported, use armv7 instructions. This commit enables using movt/movw to load the stack guard address: movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8)) movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8)) ldr r0, [pc, r0] Previously a pc-relative load was emitted: ldr r0, LCPI0_0 ldr r0, [pc, r0] rdar://problem/18740489 llvm-svn: 220470	2014-10-23 04:17:05 +00:00
Colin LeMahieu	73a51a1a68	[Hexagon] Adding encoding bits for add opcode. Adding llvm-mc tests. Removing unit tests. http://reviews.llvm.org/D5624 llvm-svn: 220427	2014-10-22 20:58:35 +00:00
Chad Rosier	dcd2a3014c	[AArch64] Add support for the .inst directive. This has been implement using the MCTargetStreamer interface as is done in the ARM, Mips and PPC backends. Phabricator: http://reviews.llvm.org/D5891 PR20964 llvm-svn: 220422	2014-10-22 20:35:57 +00:00
Hans Wennborg	db08566588	Fix VS2012 build; C++11 type aliases are not supported. llvm-svn: 220399	2014-10-22 17:47:49 +00:00
Colin LeMahieu	b424cb1e57	Ammending 220393 - Removing unused decoding tables. llvm-svn: 220397	2014-10-22 17:23:01 +00:00
Colin LeMahieu	9950d5c59a	Ammending 220393 - Removing unused functions. llvm-svn: 220396	2014-10-22 17:03:19 +00:00
Bill Schmidt	9c54bbd791	[PATCH] Support select-cc for VSFRC when VSX is enabled A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to handle <2 x double> cases. This patch adds SELECT_VSFRC and SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the f64 type when VSX is enabled. The changes are analogous to those in the previous patch. I've added a new variant to vsx.ll to test the code generation. (I also cleaned up a little formatting in PPCInstrVSX.td from the previous patch.) llvm-svn: 220395	2014-10-22 16:58:20 +00:00
Colin LeMahieu	88ebb9e2da	[Hexagon] Adding basic disassembler. Marking all instructions as CodeGenOnly since encoding bits are not set yet. http://reviews.llvm.org/D5829?vs=on&id=15023&whitespace=ignore-all#toc llvm-svn: 220393	2014-10-22 16:49:14 +00:00
Bill Schmidt	61e652334f	[PowerPC] Support select-cc for VSX The tests test/CodeGen/Generic/select-cc.ll and test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled. The problem is that the lowering logic for the SELECT and SELECT_CC operations doesn't currently support the VSX registers. This patch fixes that. In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this for other register classes. Similar pseudos are added in PPCInstrVSX.td (they must be there, because the "vsrc" register class definition appears there) for the VSRC register class. The SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC. The rest of the patch just adds logic for SELECT_VSRC wherever similar logic appears for SELECT_VRRC. There are no new test cases because the existing tests above test this, along with a variant in test/CodeGen/PowerPC/vsx.ll. After discussion with Hal, a future patch will add similar _VSFRC variants to override f64 type handling (currently using F8RC). llvm-svn: 220385	2014-10-22 13:13:40 +00:00
Arnaud A. de Grandmaison	9b3330546b	[AArch64] Cleanup A57PBQPConstraints And add a long awaited testcase. llvm-svn: 220381	2014-10-22 12:40:20 +00:00
Jyoti Allur	3b68607eac	[Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode llvm-svn: 220379	2014-10-22 10:41:14 +00:00
Matt Arsenault	7c93690be0	Add minnum / maxnum codegen llvm-svn: 220342	2014-10-21 23:01:01 +00:00
Matt Arsenault	75c658e2cc	R600/SI: Add missing parameter to div_fmas intrinsic llvm-svn: 220338	2014-10-21 22:20:55 +00:00
Matt Arsenault	8c4fb7cae0	R600: Use default GlobalDirective The overridden one wasn't inserting a space, so you would end up with .globalfoo llvm-svn: 220329	2014-10-21 21:08:36 +00:00
Arnaud A. de Grandmaison	a61262f989	[PBQP] Teach PassConfig to tell if the default register allocator is used. This enables targets to adapt their pass pipeline to the register allocator in use. For example, with the AArch64 backend, using PBQP with the cortex-a57, the FPLoadBalancing pass is no longer necessary. llvm-svn: 220321	2014-10-21 20:47:22 +00:00
Rafael Espindola	f03ae4efa7	Drop support for an old version of ld64 (from darwin 9). llvm-svn: 220310	2014-10-21 18:31:09 +00:00
Matt Arsenault	e306a32325	R600/SI: Add pattern for bswap llvm-svn: 220304	2014-10-21 16:25:08 +00:00
NAKAMURA Takumi	9ff272f382	X86AsmInstrumentation.cpp: Dissolve initializer-ranged-for. MSC17 disliked it. llvm-svn: 220301	2014-10-21 16:22:52 +00:00
Colin LeMahieu	7055365e77	Test commit Fixing brief comment. llvm-svn: 220299	2014-10-21 16:03:10 +00:00
Bill Schmidt	5c6cb813b6	[PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in the FMA mutation pass. If we have a situation where a killed product register is the same register as the FMA target, such as: %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5, %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 then the substitution makes no sense. We end up getting a crash when we try to extend the interval associated with the killed product register, as there is already a live range for %vreg5 there. This patch just disables the mutation under those circumstances. Since recipest.ll generates different code with VMX enabled, I've modified that test to use -mattr=-vsx. I've borrowed the code from that test that exposed the bug and placed it in fma-mutate.ll, where it tests several mutation opportunities including the "bad" one. llvm-svn: 220290	2014-10-21 13:02:37 +00:00
Oliver Stannard	cdb8db8d3c	[ARM] NEON 32-bit scalar moves are also available in VFPv2 The 32-bit variants of the NEON scalar<->GPR move instructions are also available in VFPv2. The 8- and 16-bit variants do require NEON. Note that the checks in the test file are all -DAG because they are checking a mixture of stdout and stderr, and the ordering is not guaranteed. llvm-svn: 220288	2014-10-21 11:49:14 +00:00
Yuri Gorshenin	171eb8dbeb	[asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an index register. Summary: Fixed memory accesses with rbp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5819 llvm-svn: 220283	2014-10-21 10:22:27 +00:00
Oliver Stannard	38e6d45a46	[Thumb2] LDRS?[BH] cannot load to the PC The Thumb2 LDRS?[BH] instructions are not valid when the destination register is the PC (these encodings are used for preload hints). llvm-svn: 220278	2014-10-21 09:14:15 +00:00
Zoran Jovanovic	592239d498	[mips][microMIPS] Implement ADDU16 and SUBU16 instructions Differential Revision: http://reviews.llvm.org/D5118 llvm-svn: 220276	2014-10-21 08:44:58 +00:00
Zoran Jovanovic	81ceebc56e	[mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructions Differential Revision: http://reviews.llvm.org/D5117 llvm-svn: 220275	2014-10-21 08:32:40 +00:00
Zoran Jovanovic	b0852e5410	[mips][microMIPS] Implement microMIPS 16-bit instructions registers Differential Revision: http://reviews.llvm.org/D5116 llvm-svn: 220273	2014-10-21 08:23:11 +00:00
Rafael Espindola	c606bfe660	Fix a bit of confusion about .set and produce more readable assembly. Every target we support has support for assembly that looks like a = b - c .long a What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256	2014-10-21 01:17:30 +00:00
Quentin Colombet	06355199f1	[X86] Fix a bug in the lowering of the mask of VSELECT. X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT when it knows it can be lowered into BLEND. Indeed, only the high bits need to be set for those and it optimizes those accordingly. However, when the mask is a compile time constant, the lowering will be handled by the generic optimizer and those modifications will generate bad code in the generic optimizer. This patch fixes that by preventing the optimization if the VSELECT will be handled by the generic optimizer. <rdar://problem/18675020> llvm-svn: 220242	2014-10-20 23:13:30 +00:00
Simon Pilgrim	2f9548a3ef	[X86] Memory folding for commutative instructions (updated) This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt. Added additional regression test case provided by Joerg Sonnenberger. Differential Revision: http://reviews.llvm.org/D5818 llvm-svn: 220239	2014-10-20 22:14:22 +00:00
Tim Northover	23075ccee7	ARM: rework Thumb1 frame index rewriting The previous code had a few problems, motivating the choices here. 1. It could create instructions clobbering CPSR, but the incoming MachineInstr didn't reflect this. A potential source of corruption. This is why the patch has a new PseudoInst for before lowering. 2. Similarly, there was some code to handle the incoming instruction not being ARMCC::AL, but this would have caused massive problems if it was actually invoked when a complex offset needing more than one instruction was requested. 3. It wasn't designed to handle unaligned pointers (or offsets). These should probably be minimised anyway, but the code needs to deal with them properly regardless. 4. It had some rather dubious ad-hoc code to avoid calling emitThumbRegPlusImmediate, a function which should be designed to do precisely this job. We seem to cover the common cases correctly now, and hopefully can enhance emitThumbRegPlusImmediate to handle any extra optimisations we need to add in future. llvm-svn: 220236	2014-10-20 21:28:41 +00:00
Oliver Stannard	6672f37ed2	[Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M These instructions are related to the v7[AR] exception model, and are not defined on v7M. llvm-svn: 220204	2014-10-20 15:37:35 +00:00
Sid Manning	c374ac97c8	Remove unnecessary else. llvm-svn: 220200	2014-10-20 13:08:19 +00:00
Oliver Stannard	e8f63a54b4	[ARM] Do not select SMULW[BT] or SMLAW[BT] The current instruction selection patterns for SMULW[BT] and SMLAW[BT] are incorrect. These instructions multiply a 32-bit and a 16-bit value (both signed) and return the top 32 bits of the 48-bit result. This preserves the 16 bits of overflow, whereas the patterns they currently match truncate the result to 16 bits then sign extend. To select these instructions, we would need to match an ISD::SMUL_LOHI, a sign extend, two shifts and an or. There is no way to match SMUL_LOHI in an instruction pattern as it defines multiple values, so this would have to be done in C++. I have raised http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct selection of these instructions. This fixes http://llvm.org/bugs/show_bug.cgi?id=19396 llvm-svn: 220196	2014-10-20 11:30:35 +00:00
Oliver Stannard	fce039240a	[Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex This function can, for some offsets from the SP, split one instruction into two. Since it re-uses the original instruction as the first instruction of the result, we need ensure its result register is not marked as dead before we use it in the second instruction. llvm-svn: 220194	2014-10-20 11:00:18 +00:00
Bob Wilson	1e1f13862e	Use triple predicate functions instead of checking values directly. NFC. llvm-svn: 220155	2014-10-19 00:39:30 +00:00
Aaron Watry	8114437a8f	R600/SI: Add global atomicrmw xchg v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220110	2014-10-17 23:33:03 +00:00
Aaron Watry	d672ee2a47	R600/SI: Add global atomicrmw xor v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220109	2014-10-17 23:33:01 +00:00
Aaron Watry	8a911e6926	R600/SI: Add global atomicrmw or v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220108	2014-10-17 23:32:59 +00:00
Aaron Watry	58c9992f15	R600/SI: Add global atomicrmw min/umin v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220107	2014-10-17 23:32:57 +00:00
Aaron Watry	29f295d7a5	R600/SI: Add global atomicrmw max/umax v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220106	2014-10-17 23:32:56 +00:00
Aaron Watry	621278034c	R600/SI: Add global atomicrmw and v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220105	2014-10-17 23:32:54 +00:00
Aaron Watry	328f1bae8e	R600/SI: Add global atomicrmw sub v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220104	2014-10-17 23:32:52 +00:00
Bill Schmidt	ba637db298	[PowerPC] Change assert to better form llvm-svn: 220092	2014-10-17 21:19:59 +00:00
Matt Arsenault	a708358e93	R600/SI: Remove redundant setting of instruction bits These are all set on the instruction base classes. llvm-svn: 220091	2014-10-17 21:13:11 +00:00
Bill Schmidt	a087d74250	[PowerPC] Change liveness testing in VSX FMA mutation pass With VSX enabled, LLVM crashes when compiling test/CodeGen/PowerPC/fma.ll. I traced this to the liveness test that's revised in this patch. The interval test is designed to only work for virtual registers, but in this case the AddendSrcReg is physical. Since there is already a walk of the MIs between the AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg in that loop. At Hal Finkel's request, I converted the liveness test to an assert restricted to virtual registers. I've changed the fma.ll test to have VSX and non-VSX variants so we can test both kinds of multiply-adds. llvm-svn: 220090	2014-10-17 21:02:44 +00:00
Matt Arsenault	933c38df40	Fix typo llvm-svn: 220068	2014-10-17 18:02:31 +00:00
Matt Arsenault	e184482bf8	R600/SI: Also check for FPImm literal constants llvm-svn: 220067	2014-10-17 18:00:50 +00:00
Matt Arsenault	d282ada508	R600/SI: Allow commuting with source modifiers llvm-svn: 220066	2014-10-17 18:00:48 +00:00
Matt Arsenault	8943d24949	R600/SI: Simplify code with hasModifiersSet llvm-svn: 220065	2014-10-17 18:00:45 +00:00
Matt Arsenault	ace5b76739	R600/SI: Fix general commuting breaking src mods The generic code trying to use findCommutedOpIndices won't understand that it needs to swap the modifier operands also, so it should fail if they are set. llvm-svn: 220064	2014-10-17 18:00:43 +00:00
Matt Arsenault	ffc5d5bbf0	R600/SI: Cleanup code with ChangeToFPImmediate llvm-svn: 220063	2014-10-17 18:00:41 +00:00
Matt Arsenault	6d3cd544bb	R600/SI: Allow comuting fp immediates llvm-svn: 220062	2014-10-17 18:00:39 +00:00
Matt Arsenault	aa5ccfb566	R600/SI: Use early return instead of checking condition twice Any commutable instruction will have at least src1. llvm-svn: 220061	2014-10-17 18:00:37 +00:00
Matt Arsenault	328b1193b5	R600/SI: Use complex pattern for MUBUF load patterns. This eliminates a use of the SI_ADDR64_RSRC pseudo llvm-svn: 220057	2014-10-17 17:43:00 +00:00
Matt Arsenault	83a535ff6b	R600/SI: Remove SI_BUFFER_RSRC pseudo Just use REG_SEQUENCE directly, so there are fewer instructions to need to deal with later. llvm-svn: 220056	2014-10-17 17:42:56 +00:00
Andrea Di Biagio	c48cb86f05	[X86] Fix missed selection of non-temporal store of zero vector. When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This is fixed by this patch. This fixes PR19370. llvm-svn: 220054	2014-10-17 17:27:06 +00:00
James Molloy	f497d5511d	[AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering. We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same. While we're at it, avoid writing to std::vector::end()... Bug found with random testing and a lot of coffee. llvm-svn: 220051	2014-10-17 17:06:31 +00:00
Bill Schmidt	2d1128acb2	[PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support. As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4r32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm. A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior. This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied. llvm-svn: 220047	2014-10-17 15:13:38 +00:00
Jan Vesely	54468a5a58	Mips: Only set divrem i64 to custom on 64bit Reviewed-by: Daniel Sanders <daniel.sanders@imgtec.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220046	2014-10-17 14:45:28 +00:00
Vasileios Kalintiris	238692beb9	[mips] Add support for COP1's Branch-On-Cond-Likely instructions Summary: Depends on D5782 Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5802 llvm-svn: 220042	2014-10-17 14:08:28 +00:00
Vasileios Kalintiris	6d1e64896d	[mips] Add support for COP0's Branch-On-Cond-Likely instructions Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5782 llvm-svn: 220036	2014-10-17 12:38:35 +00:00
Akira Hatanaka	0d0c78180d	ARM: Fix a bug which was causing convergence failure in constant-island pass. The bug is in ARMConstantIslands::createNewWater where the upper bound of the new water split point is computed: // This could point off the end of the block if we've already got constant // pool entries following this block; only the last one is in the water list. // Back past any possible branches (allow for a conditional and a maximally // long unconditional). if (BaseInsertOffset + 8 >= UserBBI.postOffset()) { BaseInsertOffset = UserBBI.postOffset() - UPad - 8; DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset)); } The split point is supposed to be somewhere between the machine instruction that loads from the constant pool entry and the end of the basic block, before branch instructions. The code above is fine if the basic block is large enough and there are a sufficient number of instructions following the machine instruction. However, if the machine instruction is near the end of the basic block, BaseInsertOffset can point to the machine instruction or another instruction that precedes it, and this can lead to convergence failure. This commit fixes this bug by ensuring BaseInsertOffset is larger than the offset of the instruction following the constant-loading instruction. rdar://problem/18581150 llvm-svn: 220015	2014-10-17 01:31:47 +00:00
Matt Arsenault	bfaab76f6b	R600/SI: Simplify debug printing llvm-svn: 219999	2014-10-17 00:36:20 +00:00
Matt Arsenault	661a031af6	R600/SI: Remove another VALU pattern llvm-svn: 219988	2014-10-16 23:33:37 +00:00
Robin Morisset	e2de06bef6	Erase fence insertion from SelectionDAGBuilder.cpp (NFC) Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that montonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it - As a result it is plain unsound, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound ! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the litterature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erase this logic from SelectionDAGBuilder as it is dead-code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more agressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957	2014-10-16 20:34:57 +00:00
Matt Arsenault	70c82173f3	R600/SI: Remove unnecessary VALU patterns These haven't been necessary since allowing selecting SALU instructions in non-entry blocks was enabled. llvm-svn: 219956	2014-10-16 20:31:50 +00:00
Matt Arsenault	a3fe7c62d1	R600: Fix nonsensical implementation of computeKnownBits for BFE This was resulting in invalid simplifications of sdiv llvm-svn: 219953	2014-10-16 20:07:40 +00:00
Rafael Espindola	11aaaeebe0	Delete -std-compile-opts. These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951	2014-10-16 20:00:02 +00:00
Juergen Ributzka	03a0611061	[AArch64] Fix miscompile of sdiv-by-power-of-2. When the constant divisor was larger than 32bits, then the optimized code generated for the AArch64 backend would emit the wrong code, because the shift was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would loose the upper 32bits. This fixes rdar://problem/18678801. llvm-svn: 219934	2014-10-16 16:41:15 +00:00
Vasileios Kalintiris	167c372118	[mips] Account for endianess when expanding BuildPairF64/ExtractElementF64 nodes. Summary: In order to support big endian targets for the BuildPairF64 nodes we just need to swap the low/high pair registers. Additionally, for the ExtractElementF64 nodes we have to calculate the correct stack offset with respect to the node's register/operand that we want to extract. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5753 llvm-svn: 219931	2014-10-16 15:41:51 +00:00
Vasileios Kalintiris	711028f718	[mips] Marked the DI/EI instruction aliases as MIPS32r2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5751 llvm-svn: 219927	2014-10-16 15:23:52 +00:00
Vasileios Kalintiris	f445a56b61	Test commit access: remove extra new line at the end of file llvm-svn: 219925	2014-10-16 14:37:00 +00:00
Matt Arsenault	f1b34cf6b6	R600: Remove dead function llvm-svn: 219879	2014-10-16 00:08:09 +00:00
Adam Nemet	4285c1f8cc	[AVX512] Add DQ subvector inserts In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs). Since DQ has native support for these intructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ. Fixes <rdar://problem/18426089> llvm-svn: 219874	2014-10-15 23:42:17 +00:00
Adam Nemet	449b3f0931	[AVX512] Two new attributes in X86VectorVTInfo for subvector insert The new attributes are NumElts and the CD8TupleForm. This prepares the code to enable x8 and x2 inserts. NFC, no change in X86.td.expanded except for the new attributes. llvm-svn: 219871	2014-10-15 23:42:09 +00:00
Adam Nemet	b1c3ef4b60	[AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size It's the W bit that selects between 32 or 64 elt type and not the opcode. The opcode selects between the width of the insert (128 or 256). llvm-svn: 219870	2014-10-15 23:42:04 +00:00
Matt Arsenault	20893b3611	R600: Remove unnecessary part of computeKnownBitsForTargetNode Zero-width BFEs are combined away already, so there's no point in handling them. llvm-svn: 219868	2014-10-15 23:37:49 +00:00
Matt Arsenault	6de7af4242	Move variable down to use llvm-svn: 219867	2014-10-15 23:37:42 +00:00
Tom Stellard	c8d7920ad9	R600/SI: Fix bug where immediates were being used in DS addr operands The SelectDS1Addr1Offset complex pattern always tries to store constant lds pointers in the offset operand and store a zero value in the addr operand. Since the addr operand does not accept immediates, the zero value needs to first be copied to a register. This newly created zero value will not go through normal instruction selection, so we need to manually insert a V_MOV_B32_e32 in the complex pattern. This bug was hidden by the fact that if there was another zero value in the DAG that had not been selected yet, then the CSE done by the DAG would use the unselected node for the addr operand rather than the one that was just created. This would lead to the zero value being selected and the DAG automatically inserting a V_MOV_B32_e32 instruction. llvm-svn: 219848	2014-10-15 21:08:59 +00:00
Sid Manning	a002296427	Wrong attribute. LLVM_ATTRIBUTE_UNUSED not LLVM_ATTRIBUTE_USED This original fix for the build break was correct. LLVM_ATTRIBUTE_USED removes the warning message because it keeps the function in the object file. LLVM_ATTRIBUTE_UNUSED indicates that it may or may not be used depending on build settings. llvm-svn: 219846	2014-10-15 20:41:17 +00:00
Sid Manning	2ceaeb6baf	Wrong attribute. LLVM_ATTRIBUTE_USED not LLVM_ATTRIBUTE_UNUSED llvm-svn: 219837	2014-10-15 19:32:52 +00:00
Sid Manning	74cd020fca	Add LLVM_ATTRIBUTE_UNUSED to function currently just used in an assert Fixes break when -Wunused-function is used. llvm-svn: 219833	2014-10-15 19:24:14 +00:00
Juergen Ributzka	f82c987a5c	Reapply "[FastISel][AArch64] Add custom lowering for GEPs." This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficientily. The original commit had a bug in the add emit logic, which has been fixed. llvm-svn: 219831	2014-10-15 18:58:07 +00:00
Juergen Ributzka	6780f0f7a0	[FastISel][AArch64] Factor out add with immediate emission into a helper function. NFC. Simplify add with immediate emission by factoring it out into a helper function. llvm-svn: 219830	2014-10-15 18:58:02 +00:00
Sid Manning	12cd21aacd	Enable the instruction printer in HexagonMCTargetDesc This adds the MCInstPrinter to the LLVMHexagonDesc library and removes the dependency LLVMHexagonAsmPrinter had on LLVMHexagonDesc. This is a prerequisite needed by the disassembler. Phabricator Revision: http://reviews.llvm.org/D5734 llvm-svn: 219826	2014-10-15 18:27:40 +00:00
Matt Arsenault	1a74aff846	R600/SI: Also try to use 0 base for misaligned 8-byte DS loads. llvm-svn: 219823	2014-10-15 18:06:43 +00:00
Matt Arsenault	7b68fdf3c0	R600: Fix miscompiles when BFE has multiple uses SimplifyDemandedBits would break the other uses of the operand. llvm-svn: 219819	2014-10-15 17:58:34 +00:00
Rafael Espindola	7b61ddfa6e	Simplify handling of --noexecstack by using getNonexecutableStackSection. llvm-svn: 219799	2014-10-15 16:12:52 +00:00
Rafael Espindola	ad33dd2914	Move getNonexecutableStackSection up to the base ELF class. The .note.GNU-stack section is not SystemZ/X86 specific. llvm-svn: 219796	2014-10-15 15:44:16 +00:00
Matt Arsenault	f179420c57	R600: Use existing variable llvm-svn: 219778	2014-10-15 05:07:00 +00:00
Matt Arsenault	7acfddf17c	R600: Remove outdated comment llvm-svn: 219777	2014-10-15 05:06:57 +00:00
Juergen Ributzka	42379d4cf7	Revert "[FastISel][AArch64] Add custom lowering for GEPs." This breaks our internal build bots. Reverting it to get the bots green again. llvm-svn: 219776	2014-10-15 04:55:48 +00:00
Tim Northover	e9ff4c29b9	ARM: drop check for triple that's no longer used. Early attempts to support AAPCS bare metal MachO targets based the decision on the CPU being compiled for. This was not a particularly great idea and we've got a better option now, but this check remained. No functional change for any target we care about. llvm-svn: 219767	2014-10-15 01:05:01 +00:00
Eric Christopher	7396cc9978	Remove unused variable. llvm-svn: 219750	2014-10-15 00:09:07 +00:00
Gerolf Hoflehner	5d26d40fc5	[AArch64] Wrong CC access in CSINC-conditional branch sequence This is a follow up to commit r219742. It removes the CCInMI variable and accesses the CC in CSCINC directly. In the case of a conditional branch accessing the CC with CCInMI was wrong. llvm-svn: 219748	2014-10-14 23:55:00 +00:00
Gerolf Hoflehner	a4c96d02a2	[AAarch64] Optimize CSINC-branch sequence Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also the condition code may not be modified between the csinc and the original branch. Examples: 1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44 to b.<invCC> 2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC> rdar://problem/18506500 llvm-svn: 219742	2014-10-14 23:07:53 +00:00
Simon Pilgrim	a798e9ffdf	[X86][SSE] pslldq/psrldq shuffle mask decodes Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions. Differential Revision: http://reviews.llvm.org/D5598 llvm-svn: 219738	2014-10-14 22:31:34 +00:00
Tim Northover	cf6ce0c8f7	ARM: remove ARM/Thumb distinction for preferred alignment. Thumb1 has legitimate reasons for preferring 32-bit alignment of types i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be a multiple of 4. However, this is a trade-off betweem code size and RAM usage; the DataLayout string is not the best place to represent it even if desired. So this patch removes the extra Thumb requirements, hopefully making ARM and Thumb completely compatible in this respect. llvm-svn: 219734	2014-10-14 22:12:17 +00:00
Tim Northover	9a4c043d67	ARM: allow misaligned local variables in Thumb1 mode. There's no hard requirement on LLVM to align local variable to 32-bits, so the Thumb1 frame handling needs to be able to deal with variables that are only naturally aligned without falling over. llvm-svn: 219733	2014-10-14 22:12:14 +00:00
Juergen Ributzka	4dfd590eaa	[FastISel][AArch64] Add custom lowering for GEPs. This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail out even for simple cases, because the standard fastEmit functions don't cover MUL and ADD is lowered inefficientily. llvm-svn: 219726	2014-10-14 21:41:23 +00:00
Hans Wennborg	f6aafeee60	[x86 asm] allow fwait alias in both At&t and Intel modes (PR21208) Differential Revision: http://reviews.llvm.org/D5741 llvm-svn: 219725	2014-10-14 21:41:17 +00:00
Tim Northover	aa09ac6e83	ARM: set preferred aggregate alignment to 32 universally. Before, ARM and Thumb mode code had different preferred alignments, which could lead to some rather unexpected results. There's justification for reducing it from the default 64-bits (wasted space), but I don't think there is for going below 32-bits. There's no actual ABI change here, just to reassure people. llvm-svn: 219719	2014-10-14 20:57:26 +00:00
Juergen Ributzka	cd11a2806b	[FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved. Sign-/zero-extend folding depended on the load and the integer extend to be both selected by FastISel. This cannot always be garantueed and SelectionDAG might interfer. This commit adds additonal checks to load and integer extend lowering to catch this. Related to rdar://problem/18495928. llvm-svn: 219716	2014-10-14 20:36:02 +00:00
Jan Vesely	e5121f3c10	Reapply "R600: Add new intrinsic to read work dimensions" This effectively reverts revert 219707. After fixing the test to work with new function name format and renamed intrinsic. Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219710	2014-10-14 20:05:26 +00:00
Rafael Espindola	db3f0a24ec	Revert "R600: Add new intrinsic to read work dimensions" This reverts commit r219705. CodeGen/R600/work-item-intrinsics.ll was failing on linux. llvm-svn: 219707	2014-10-14 18:58:04 +00:00
Jan Vesely	86187d231a	R600: Add new intrinsic to read work dimensions v2: Add SI lowering Add test v3: Place work dimensions after the kernel arguments. v4: Calculate offset while lowering arguments v5: rebase v6: change prefix to AMDGPU Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219705	2014-10-14 18:52:07 +00:00
Jan Vesely	df19696374	R600: FMA is VecALU only instruction Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219704	2014-10-14 18:52:04 +00:00
Reed Kotler	d4ea29e6b6	Finish getting Mips fast-isel to match up with AArch64 fast-isel Summary: In order to facilitate use of common code, checking by reviewers of other fast-isel ports, and hopefully to eventually move most of Mips and other fast-isel ports into target independent code, I've tried to get the two implementations to line up. There is no functional code change. Just methods moved in the file to be in the same order as in AArch64. Test Plan: No functional change. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D5692 llvm-svn: 219703	2014-10-14 18:27:58 +00:00
Matt Arsenault	e775f5fe76	R600/SI: Use DS offsets for constant addresses Use 0 as the base address for a constant address, so if we have a constant address we can save moves and form read2/write2s. llvm-svn: 219698	2014-10-14 17:21:19 +00:00
Robert Khasanov	1a77f6664e	[AVX512] Extended avx512_binop_rm to DQ/VL subsets. Added encoding tests. llvm-svn: 219686	2014-10-14 15:13:56 +00:00
Robert Khasanov	545d1b7726	[AVX512] Extended avx512_binop_rm to BW/VL subsets. Added encoding tests. llvm-svn: 219685	2014-10-14 14:36:19 +00:00
Bradley Smith	698e08f4cf	[AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround llvm-svn: 219684	2014-10-14 14:02:41 +00:00
Eric Christopher	7c558cf4d6	Grab the subtarget info off of the MachineFunction rather than indirecting through the TargetMachine. llvm-svn: 219674	2014-10-14 08:44:19 +00:00
Eric Christopher	da84e33791	Use the triple to figure out if this is a darwin target, not the subtarget. llvm-svn: 219673	2014-10-14 08:25:26 +00:00
Hao Liu	3cb826ca10	[AArch64]Select wide immediate offset into [Base+XReg] addressing mode e.g Currently we'll generate following instructions if the immediate is too wide: MOV X0, WideImmediate ADD X1, BaseReg, X0 LDR X2, [X1, 0] Using [Base+XReg] addressing mode can save one ADD as following: MOV X0, WideImmediate LDR X2, [BaseReg, X0] Differential Revision: http://reviews.llvm.org/D5477 llvm-svn: 219665	2014-10-14 06:50:36 +00:00
Eric Christopher	4c67d5a1e3	Include map into the A15SDOptimizer rather than pick it up transitively from the DFAPacketizer via TargetInstrInfo.h. llvm-svn: 219652	2014-10-14 01:13:51 +00:00
Eric Christopher	2a321f74f0	Remove the TargetMachine from DFAPacketizer since it was only being used to grab subtarget specific things that we can grab from the MachineFunction anyhow. llvm-svn: 219650	2014-10-14 01:03:16 +00:00
Reed Kotler	a562b46db7	Make first of several changes to bring up to AArch64 fast-isel style Summary: Make Mips fast-isel track the form of AArch64 where practical. This makes it easier for people to review the code, to borrow similar code, and to see how to eventually move a lot of this target code for fast-isels into target independent code. These are just cosmetic changes. Should be no functional difference. Test Plan: make check test-suite for 4 flavors mips32 r1/r2 , -O0/-O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: aemerson, llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5595 llvm-svn: 219633	2014-10-13 21:46:41 +00:00
Filipe Cabecinhas	9d7bd78ffa	Fix a broadcast related regression on the vector shuffle lowering. Summary: Test by Robert Lougher! Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5745 llvm-svn: 219617	2014-10-13 16:16:16 +00:00
Matt Arsenault	3f3a2751e0	R600/SI: Minor cleanup of function llvm-svn: 219616	2014-10-13 15:47:59 +00:00
Yuri Gorshenin	ab1b88ab59	[asan-asm-instrumentation] Follow-up fixes to r219602: asserts are moved into function. llvm-svn: 219610	2014-10-13 11:44:06 +00:00
Renato Golin	16ea8ba3bc	Adds support for the Cortex-A17 to the ARM backend Patch by Matthew Wahab. llvm-svn: 219606	2014-10-13 10:22:19 +00:00
Bradley Smith	f2a801d8ac	[AArch64] Add workaround for Cortex-A53 erratum (835769) Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. This patch implements such work-around in the backend, enabled via the option -aarch64-fix-cortex-a53-835769. The work-around code generation is not enabled by default. llvm-svn: 219603	2014-10-13 10:12:35 +00:00
Yuri Gorshenin	46853b55fa	[asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register. Summary: [asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5599 llvm-svn: 219602	2014-10-13 09:37:47 +00:00
NAKAMURA Takumi	75a0240056	Revert r219584, "[X86] Memory folding for commutative instructions." It broke i686 selfhosting. llvm-svn: 219595	2014-10-13 04:17:34 +00:00
Simon Pilgrim	77ac26d279	[X86] Memory folding for commutative instructions. This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too. Differential Revision: http://reviews.llvm.org/D5701 llvm-svn: 219584	2014-10-12 10:52:55 +00:00
Simon Pilgrim	3c1e1e9498	Test commit access (email fix) Indentation tidyup. llvm-svn: 219577	2014-10-11 20:28:56 +00:00
Benjamin Kramer	3e67db92bc	MC: Bit pack MCSymbolData. On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member variables private and clean up uses to go through the existing accessors. NFC. llvm-svn: 219573	2014-10-11 15:07:21 +00:00
Simon Pilgrim	d89591e0a1	Test commit access Fix comment typo + spelling. llvm-svn: 219572	2014-10-11 14:23:36 +00:00
Reed Kotler	62de6b96b5	Add basic conditional branches in mips fast-isel Summary: Implement the most basic form of conditional branches in Mips fast-isel. Test Plan: br1.ll run 4 flavors of test-suite. mips32 r1/r2 and at -O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5583 llvm-svn: 219556	2014-10-11 00:55:18 +00:00
Matt Arsenault	61cc9083d0	R600/SI: Change how DS offsets are printed Match SC by using offset/offset0/offset1 and printing in decimal. llvm-svn: 219537	2014-10-10 22:16:07 +00:00
Matt Arsenault	fe0a2e677b	R600/SI: Match read2/write2 stride 64 versions llvm-svn: 219536	2014-10-10 22:12:32 +00:00
Matt Arsenault	410332860d	R600/SI: Add load / store machine optimizer pass. Currently this only functions to match simple cases where ds_read2_* / ds_write2_* instructions can be used. In the future it might match some of the other weird load patterns, such as direct to LDS loads. Currently enabled only with a subtarget feature to enable easier testing. llvm-svn: 219533	2014-10-10 22:01:59 +00:00
Chandler Carruth	38811ccb97	[mips] Actually mark that the default case is unreachable as this switch is over a subset of condition codes. This fixes the -Werror build which warns about use of uninitialized variables in the default case. llvm-svn: 219531	2014-10-10 21:07:03 +00:00
Reed Kotler	1f64ecab79	Implement floating point compare for mips fast-isel Summary: Expand SelectCmp to handle floating point compare Test Plan: fpcmpa.ll run 4 flavors of test-suite, mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5567 llvm-svn: 219530	2014-10-10 20:46:28 +00:00
Matt Arsenault	a39da09eb6	R600/SI: Disable copying of SCC llvm-svn: 219519	2014-10-10 17:44:47 +00:00
Reed Kotler	497311ab99	implement integer compare in mips fast-isel Summary: implement SelectCmp (integer compare ) in mips fast-isel Test Plan: icmpa.ll also ran 4 test-suite flavors mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler, mcrosier Differential Revision: http://reviews.llvm.org/D5566 llvm-svn: 219518	2014-10-10 17:39:51 +00:00
Bill Schmidt	dcce023549	[PowerPC] Reduce names from Power8Vector to P8Vector Per Hal Finkel's review, improving typability of some variable names. llvm-svn: 219514	2014-10-10 17:21:15 +00:00
Reed Kotler	12f9488e33	Implement floating point to integer conversion in mips fast-isel Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel Test Plan: fpintconv.ll ran 4 flavors of test-suite with no errors, misp32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler, mcrosier Differential Revision: http://reviews.llvm.org/D5562 llvm-svn: 219511	2014-10-10 17:00:46 +00:00
Benjamin Kramer	2c99e413ba	Reduce double set lookups. NFC. llvm-svn: 219505	2014-10-10 15:32:50 +00:00
Bill Schmidt	cfc4a54a48	[PowerPC] Add feature for Power8 vector extensions The current VSX feature for PowerPC specifies availability of the VSX instructions added with the 2.06 architecture version. With 2.07, the architecture adds new instructions to both the Category:Vector and Category:VSX instruction sets. Additionally, unaligned vector storage operations have improved performance. This patch adds a feature to provide access to the new instructions and performance capabilities of Power8. For compatibility with GCC, the feature is controlled via a new -mpower8-vector switch, and the feature causes the __POWER8_VECTOR__ builtin define to be generated by the preprocessor. There is a companion patch for cfe being committed at the same time. llvm-svn: 219501	2014-10-10 15:09:28 +00:00
Zoran Jovanovic	98bd58ca33	[mips][microMIPS] Implement ADDIUSP instruction Differential Revision: http://reviews.llvm.org/D5084 llvm-svn: 219500	2014-10-10 14:37:30 +00:00
Zoran Jovanovic	95e14e711d	[mips][microMIPS] Implement JR16 instruction Differential Revision: http://reviews.llvm.org/D5062 llvm-svn: 219498	2014-10-10 14:02:44 +00:00
Zoran Jovanovic	b26f889afa	[mips][microMIPS] Implement ADDIUS5 instruction Differential Revision: http://reviews.llvm.org/D5049 llvm-svn: 219495	2014-10-10 13:45:34 +00:00
Zoran Jovanovic	b39a174f11	ps][microMIPS] Implement JRC instruction Differential Revision: http://reviews.llvm.org/D5045 llvm-svn: 219494	2014-10-10 13:31:18 +00:00
Zoran Jovanovic	6097bad3f8	[mips][microMIPS] Implement JALRS16 instruction Differential Revision: http://reviews.llvm.org/D5027 llvm-svn: 219493	2014-10-10 13:22:28 +00:00
Chandler Carruth	82cc9641f7	Don't use an unqualified 'abs' function call with a builtin type. This is dangerous for numerous reasons. The primary risk here is with floating point or double types where if the wrong header files are included in a strange order this can implicitly convert to integers and then call the C abs function on the integers. There is a secondary risk that even impacts integers where if the namespace the code is written in ever defines an abs overload for types within that namespace the global abs will be hidden. The correct form is to call std::abs or write 'using std::abs' for builtin types (and only the latter is correct in any generic context). I've also added the requisite header to be a bit more explicit here. llvm-svn: 219484	2014-10-10 08:27:19 +00:00
Samuel Antao	1194b8fd40	Fix bug in GPR to FPR moves in PPC64LE. The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem. llvm-svn: 219441	2014-10-09 20:42:56 +00:00
Benjamin Kramer	2c3778dc51	Remove a compiler bug workaround from 2007. The affected versions of gcc are long gone. NFC. llvm-svn: 219433	2014-10-09 19:50:39 +00:00
Matt Arsenault	a52a41be4a	Fix typo llvm-svn: 219429	2014-10-09 19:15:15 +00:00
Tom Stellard	3457a8495a	R600/SI: Legalize CopyToReg during instruction selection The instruction emitter will crash if it encounters a CopyToReg node with a non-register operand like FrameIndex. llvm-svn: 219428	2014-10-09 19:06:00 +00:00
Lang Hames	6aed984e13	[PBQP] Add missing headers from r219421. llvm-svn: 219425	2014-10-09 18:36:59 +00:00
Lang Hames	8f31f448c5	[PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint). This patch removes the PBQPBuilder class and its subclasses and replaces them with a composable constraints class: PBQPRAConstraint. This allows constraints that are only required for optimisation (e.g. coalescing, soft pairing) to be mixed and matched. This patch also introduces support for target writers to supply custom constraints for their targets by overriding a TargetSubtargetInfo method: std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const; This patch should have no effect on allocations. llvm-svn: 219421	2014-10-09 18:20:51 +00:00
Tom Stellard	8dd392e135	R600/SI: Legalize INSERT_SUBREG instructions during PostISelFolding LLVM assumes INSERT_SUBREG will always have register operands, so we need to legalize non-register operands, like FrameIndexes, to avoid random assertion failures. llvm-svn: 219420	2014-10-09 18:09:15 +00:00
Bill Schmidt	cb34fd09cd	[PPC64] VSX indexed-form loads use wrong instruction format The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x incorrectly use the XForm_1 instruction format, rather than the XX1Form instruction format. This is likely a pasto when creating these instructions, which were based on lvx and so forth. This patch uses the correct format. The existing reformatting test (test/MC/PowerPC/vsx.s) missed this because the two formats differ only in that XX1Form has an extension to the target register field in bit 31. The tests for these instructions used a target register of 7, so the default of 0 in bit 31 for XForm_1 didn't expose a problem. For register numbers 32-63 this would be noticeable. I've changed the test to use higher register numbers to verify my change is effective. llvm-svn: 219416	2014-10-09 17:51:35 +00:00
Kevin Qin	72a799a68a	[AArch64] Enable partial & runtime unrolling on cortex-a57. llvm-svn: 219401	2014-10-09 10:13:27 +00:00
Robert Khasanov	d5b14f7994	[AVX512] Extended avx512_binop_rm for AVX512VL subsets. Added avx512_binop_rm_vl multiclass for VL subset Added encoding tests llvm-svn: 219390	2014-10-09 08:38:48 +00:00
Bob Wilson	9868d71ffe	Use triple's isiOS() and isOSDarwin() methods. These methods are already used in lots of places. This makes things more consistent. NFC. llvm-svn: 219386	2014-10-09 05:43:30 +00:00
Eric Christopher	143f02c47d	Remove unused argument to CreateTargetScheduleState and change the TargetMachine to a TargetSubtargetInfo since everything we wanted is off of that. llvm-svn: 219382	2014-10-09 01:59:35 +00:00
Adam Nemet	3480142ef0	[AVX512] Rename AVX512_masking* to AVX512_maskable* No functional change. This is the current AVX512_maskable multiclass hierarchy: maskable_custom / \ / \ maskable_common maskable_in_asm / \ / \ maskable maskable_3src llvm-svn: 219363	2014-10-08 23:25:39 +00:00
Adam Nemet	47b2d5f1e0	[AVX512] Intrinsics for vextract*x4 This adds the Pat<>'s for the intrinsics. These are necessary because we don't lower these intrinsics to SDNodes but match them directly. See the rational in the previous commit. llvm-svn: 219362	2014-10-08 23:25:37 +00:00
Adam Nemet	2b5cdbb3de	[AVX512] Add asm-only support for vextractx4 masking variants These derive from the new asm-only masking definitions. Unfortunately I wasn't able to find a ISel pattern that we could legally generate for the masking variants. The problem is that since the destination is v4 we would need VK4 register classes and v4i1 value types to express the masking. These are however not legal types/classes in AVX512f but only in VL, so things get complicated pretty quickly. We can revisit this question later if we have a more pressing need to express something like this. So the ISel patterns are empty for the masking instructions and the next patch will add Pat<>s instead to match the intrinsics calls with instructions. llvm-svn: 219361	2014-10-08 23:25:33 +00:00
Adam Nemet	0937723b49	[AVX512] Move DAG for all-zero node to X86VectorVTInfo No functional change. No change in X86.td.expanded except for the appearance of the new attributes. The new attributes will be used in the subsequent patch. llvm-svn: 219360	2014-10-08 23:25:31 +00:00
Adam Nemet	52bb6cfad6	[AVX512] Peel off an asm-only class from AVX512_masking_common. No functional change. This enables the generation of masking instructions that don't provide a ISel pattern. llvm-svn: 219358	2014-10-08 23:25:23 +00:00
Robin Morisset	6f3d04e4b6	[X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slow llvm-svn: 219357	2014-10-08 23:16:23 +00:00
Robin Morisset	f9e8721564	[X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load()) Summary: I had forgotten to check for NotSlowIncDec in the patterns that can generate inc/dec for the above pattern (added in D4796). This currently applies to Atom Silvermont, KNL and SKX. Test Plan: New checks on atomic_mi.ll Reviewers: jfb, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5677 llvm-svn: 219336	2014-10-08 19:38:18 +00:00
Robert Khasanov	b51bb22611	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VPCMP/VPCMPU{BWDQ} Added CMP_MASK_CC intrinsic type. Added tests for intrinsics. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 219316	2014-10-08 15:49:26 +00:00
Robert Khasanov	44241440e1	[AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking. Added new argrument for AVX512_masking: InstrItinClass and bit isCommutable. No functional change. llvm-svn: 219310	2014-10-08 14:37:45 +00:00
Renato Golin	0595a26c25	Emit unaligned access build attribute for ARM Patch by Charlie Turner. llvm-svn: 219301	2014-10-08 12:26:22 +00:00
Renato Golin	bab5ace6aa	Refactor isThumb1Only() && isMClass() into a predicate called isV6M() This must be enforced for all v6M cores, not just the cortex-m0, irregardless of the user-specified alignment. Patch by Charlie Turner. llvm-svn: 219300	2014-10-08 12:26:16 +00:00
Renato Golin	51dc3f4701	Simplify switch statement in ARM subtarget align access This switch can be reduced to a simpler if/else statement. Patch by Charlie Turner. llvm-svn: 219299	2014-10-08 12:26:13 +00:00
Eric Christopher	b17140de35	Cache TargetLowering on SelectionDAGISel and update previous calls to getTargetLowering() with the cached variable. llvm-svn: 219284	2014-10-08 07:32:17 +00:00
Chad Rosier	d9d0f86a79	[AArch64] Generate vector signed/unsigned mul and mla/mls long. Phabricator Revision: http://reviews.llvm.org/D5589 Patch by Balaram Makam <bmakam@codeaurora.org>!! llvm-svn: 219276	2014-10-08 02:31:24 +00:00
Robin Morisset	880580b88f	[X86] Fix a bug with fetch_add(INT32_MIN) Summary: Fix pr21099 The pseudocode of what we were doing (spread through two functions) was: if (operand.doesNotFitIn32Bits()) Opc.initializeWithFoo(); if (operand < 0) operand = -operand; if (operand.doesFitIn8Bits()) Opc.initializeWithBar(); else if (operand.doesFitIn32Bits()) Opc.initializeWithBlah(); doStuff(Opc); So for operand == INT32_MIN, Opc was never initialized because the operand changes from fitting in 32 bits to not fitting, causing the various bugs/error messages noted by pr21099. This patch adds an extra test at the beginning for this case, and an llvm_unreachable to have better error message if the operand ends up not fitting in 32-bits at the end. Test Plan: new test + make check Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5655 llvm-svn: 219257	2014-10-07 23:53:57 +00:00
Tom Stellard	845bb3c2fd	R600/SI: Refactor VOP3 instruction defs llvm-svn: 219256	2014-10-07 23:51:41 +00:00
Tom Stellard	0aec5877b6	R600/SI: Refactor VOPC instruction defs llvm-svn: 219255	2014-10-07 23:51:39 +00:00
Tom Stellard	bec5a249b3	R600/SI: Refactor VOP2 instruction defs llvm-svn: 219254	2014-10-07 23:51:38 +00:00
Tom Stellard	94d2e99ceb	R600/SI: Refactor VOP1 instruction defs llvm-svn: 219253	2014-10-07 23:51:34 +00:00
Matt Arsenault	1f0227a452	R600: Remove dead code llvm-svn: 219242	2014-10-07 21:29:56 +00:00
Tom Stellard	2b8baaa546	R600: Remove some redundant initializations from AMDGPUMCAsmInfo llvm-svn: 219238	2014-10-07 21:09:25 +00:00
Tom Stellard	022802ab37	R600: Use MCAsmInfoELF as AMDGPUMCAsmInfo base class The main reason for this is that the MCAsmInfo class, which we were previously using as the base class, sets PrivateGlobalPrefix to "L", which causes all global functions that start with L to be treated as local symbols. MCAsmInfoELF sets PrivateGlobalPrefix to ".L", which is what we want, and it is probably a good idea to use this as the base class anyway, since we are emitting ELF binaries. llvm-svn: 219237	2014-10-07 21:09:23 +00:00
Tom Stellard	20fa0be97f	R600/SI: Remove assertion in SIInstrInfo::areLoadsFromSameBasePtr() Added a FIXME coment instead, we need to handle the case where the two DS instructions being compared have different numbers of operands. llvm-svn: 219236	2014-10-07 21:09:20 +00:00
Yuri Gorshenin	e8c81fd25a	[asan-asm-instrumentation] CFI directives are generated for .S files. Summary: CFI directives are generated for .S files. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5520 llvm-svn: 219199	2014-10-07 11:03:09 +00:00
Daniel Sanders	f3fe49aac6	[mips] Return {f128} correctly for N32/N64. Summary: According to the ABI documentation, f128 and {f128} should both be returned in $f0 and $f2. However, this doesn't match GCC's behaviour which is to return f128 in $f0 and $f2, but {f128} in $f0 and $f1. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5578 llvm-svn: 219196	2014-10-07 09:29:59 +00:00
Craig Topper	0676b902ad	[X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit mode for certain instructions it shouldn't. Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if its a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if its present. Fixes PR21169. llvm-svn: 219194	2014-10-07 07:29:50 +00:00
Craig Topper	273515eb12	Formatting fixes. Most putting 'else' on the same line as the preceding curly brace. llvm-svn: 219193	2014-10-07 07:29:48 +00:00
Craig Topper	abfe07e9fc	Fix filename in header and use C++ version of the C header files. llvm-svn: 219192	2014-10-07 07:29:46 +00:00
Juergen Ributzka	ef3722d8e9	[FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends. The code already folds sign-/zero-extends, but only if they are arguments to mul and shift instructions. This extends the code to also fold them when they are direct inputs. llvm-svn: 219187	2014-10-07 03:40:06 +00:00
Juergen Ributzka	75b2f34069	[FastISel][AArch64] Teach the address computation to also fold sub instructions. Tiny enhancement to the address computation code to also fold sub instructions if the rhs is constant and can be folded into the offset. llvm-svn: 219186	2014-10-07 03:40:03 +00:00
Juergen Ributzka	42bf665f2b	[FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction." This commit fixes an issue with sign-/zero-extending loads that was discovered by Richard Barton. We use now the correct load instructions for sign-extending loads to 64bit. Also updated and added more unit tests. llvm-svn: 219185	2014-10-07 03:39:59 +00:00
NAKAMURA Takumi	c62436c60a	ARMInstPrinter.cpp: Suppress a warning for -Asserts. [-Wunused-variable] llvm-svn: 219172	2014-10-06 23:48:04 +00:00
Tim Northover	ea964f53c3	ARM: silence unused variable warning llvm-svn: 219128	2014-10-06 17:26:36 +00:00
Tim Northover	8997fedfc6	ARM: remove dead InstPrinting code This instruction form is handled by different AsmOperands now, so the code is completely dead (and wrong anyway). llvm-svn: 219127	2014-10-06 17:10:13 +00:00
Benjamin Kramer	4ba642a2f7	X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't convert them anymore. llvm-svn: 219112	2014-10-06 09:56:40 +00:00
Eric Christopher	3faf2f1e02	Add subtarget caches to aarch64, arm, ppc, and x86. These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. llvm-svn: 219106	2014-10-06 06:45:36 +00:00
Chandler Carruth	0927da4583	[x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd. This trades a (register-renamer-friendly) movaps for a floating point / integer domain cross. That is a very bad trade, even on architectures where domain crossing is relatively fast. On any chip where there is even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely to cause a spill to be introduced because the reason for the copy is to destructively shuffle in place. Thanks to Ben Kramer for fixing a bug in this code that my new shuffle lowering exposed and highlighting that perhaps it should just go away. =] llvm-svn: 219090	2014-10-05 22:57:31 +00:00
Benjamin Kramer	77b0e13aba	X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd. It's debatable whether this transform is useful at all, but for now make sure we don't generate invalid asm. llvm-svn: 219084	2014-10-05 16:14:29 +00:00
Elena Demikhovsky	44bf0637d5	AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q. This instruction allows to broadacst mask vector to data vector. llvm-svn: 219083	2014-10-05 14:11:08 +00:00
Chandler Carruth	acecdc0211	[x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering. This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bi to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case. I believe PR21137 is now the only remaining regression. llvm-svn: 219081	2014-10-05 12:07:34 +00:00
Chandler Carruth	9f4d9fa54e	[x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. llvm-svn: 219079	2014-10-05 11:41:36 +00:00
NAKAMURA Takumi	2a295fd337	HexagonMCCodeEmitter.cpp: Prune 2nd redundant \brief. [-Wdocumentation] llvm-svn: 219073	2014-10-05 04:54:54 +00:00
NAKAMURA Takumi	431c9d3f1f	HexagonDesc: Update LLVMBuild.txt. llvm-svn: 219071	2014-10-05 04:54:29 +00:00
Benjamin Kramer	4b92c6b8e5	[SystemZ] Make operator bool explicit. NFC. llvm-svn: 219069	2014-10-04 22:44:35 +00:00
Benjamin Kramer	2e52f02864	Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it. llvm-svn: 219068	2014-10-04 22:44:29 +00:00
Benjamin Kramer	c6cc58e703	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
Chandler Carruth	99627bfbff	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidently or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Jingyue Wu	4938e271c6	Add fake use to suppress defined-but-unused warnings llvm-svn: 219045	2014-10-04 03:50:10 +00:00
Chandler Carruth	200e87c0c5	[x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. It turns out this combine was always somewhat flawed -- there are cases where nested VZEXT nodes can't be combined: if their types have a mismatch that can be observed in the result. While none of these show up in currently, once I switch to the new vector shuffle lowering a few test cases actually form such nested VZEXT nodes. I've not come up with any IR pattern that I can sensible write to exercise this, but it will be covered by tests once I flip the switch. llvm-svn: 219044	2014-10-04 02:51:03 +00:00
Chandler Carruth	7e26a67ffa	[x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT nodes to the DAG combining of them. This will allow the combine to fire on both old vector shuffle lowering and the new vector shuffle lowering and generally seems like a cleaner design. I've trimmed down the code a bit and tried to make it and the surrounding combine fairly clean while moving it around. llvm-svn: 219042	2014-10-04 01:05:48 +00:00
Matt Arsenault	c996175b57	R600/SI: Custom lower f64 -> i64 conversions llvm-svn: 219038	2014-10-03 23:54:56 +00:00
Matt Arsenault	f7c95e3eda	R600: Custom lower [s\|u]int_to_fp for i64 -> f64 llvm-svn: 219037	2014-10-03 23:54:41 +00:00
Matt Arsenault	6cda887776	R600/SI: Fix ftrunc f64 conformance failures. Re-add the tests since they were deleted at some point llvm-svn: 219036	2014-10-03 23:54:27 +00:00
Chandler Carruth	f3e880697a	[x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't have a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll completely with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetative. Suggestions on making them better very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. llvm-svn: 219033	2014-10-03 22:43:17 +00:00
Chandler Carruth	0adda1e4d4	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022	2014-10-03 21:38:49 +00:00
Richard Smith	1ed4229f6f	PR21145: Teach LLVM about C++14 sized deallocation functions. C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014	2014-10-03 20:17:06 +00:00
Adam Nemet	ff63a2dc51	[ISel] Keep matching state consistent when folding during X86 address match In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it create a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009	2014-10-03 20:00:34 +00:00
Tom Stellard	fae1dc8a12	R600: Align functions to 256 bytes llvm-svn: 219002	2014-10-03 19:02:02 +00:00
Benjamin Kramer	e12a6bac32	Eliminate some deep std::vector copies. NFC. llvm-svn: 218999	2014-10-03 18:33:16 +00:00
Robin Morisset	9098fee690	[Power] Use lwsync for non-seq_cst fences Summary: hwsync is only required for seq_cst fences, acquire and release one can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995	2014-10-03 18:04:36 +00:00
Hans Wennborg	6a654333c5	MipsAsmParser.cpp: fix VS2012 build llvm-svn: 218991	2014-10-03 17:16:24 +00:00
Hans Wennborg	da47cf46de	HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012 llvm-svn: 218990	2014-10-03 17:02:28 +00:00
Daniel Sanders	ef638fea2d	[mips] Print warning when using register names not available in N32/64 Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning, when those names are used in N32/64, along with a fix-it with the correct register names. Patch by Vasileios Kalintiris Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5272 llvm-svn: 218989	2014-10-03 15:37:37 +00:00
Sid Manning	40d809399f	Fix build break on Hexagon Differential Revision: http://reviews.llvm.org/D5600 llvm-svn: 218987	2014-10-03 13:59:01 +00:00
Sid Manning	7da3f9acba	Adding skeleton for unit testing Hexagon Code Emission Adding and modifying CMakeLists.txt files to run unit tests under unittests/Target/* if the directory exists. Adding basic unit test to check that code emitter object can be retrieved. Differential Revision: http://reviews.llvm.org/D5523 Change by: Colin LeMahieu llvm-svn: 218986	2014-10-03 13:18:11 +00:00
Chandler Carruth	1964078936	[x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are dramatically faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends aren't as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985	2014-10-03 13:11:13 +00:00
Renato Golin	4e31ae1051	Revert 202433 - Provide a target override for the latest regalloc heuristic That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analisys indicated that the problem no longer exists, so I'm reverting this change. See PR18996. llvm-svn: 218981	2014-10-03 12:20:53 +00:00
Chandler Carruth	4bf341de3c	[x86] Refactor the element insertion logic in the new vector shuffle lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. llvm-svn: 218980	2014-10-03 12:01:55 +00:00
Chandler Carruth	971a560cb8	[x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977	2014-10-03 11:25:58 +00:00
Chandler Carruth	e91b316266	[x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974	2014-10-03 10:11:39 +00:00
Eric Christopher	f12e1ab313	constify TargetMachine parameter. llvm-svn: 218934	2014-10-03 00:42:41 +00:00
Eric Christopher	5312afe7e1	constify TargetMachine argument. llvm-svn: 218930	2014-10-03 00:17:59 +00:00
Eric Christopher	a94e592e49	We can grab the options struct from the TargetMachine, no need to pass it down in the constructor. llvm-svn: 218929	2014-10-03 00:10:03 +00:00
Adam Nemet	4dca3ce4b0	[AVX512] Pull pattern for subvector insert into the instruction definition No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928	2014-10-02 23:18:30 +00:00
Adam Nemet	4e2ef472d2	[AVX512] Refactor subvector inserts No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927	2014-10-02 23:18:28 +00:00
Adam Nemet	dc87aea176	[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926	2014-10-02 23:18:26 +00:00
Hal Finkel	fe3368cb57	[PowerPC] Modern Book-E cores support sync Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923	2014-10-02 22:34:22 +00:00
Robin Morisset	e1ca44bd4c	[Power] Improve the expansion of atomic loads/stores Summary: Atomic loads and store of up to the native size (32 bits, or 64 for PPC64) can be lowered to a simple load or store instruction (as the synchronization is already handled by AtomicExpand, and the atomicity is guaranteed thanks to the alignment requirements of atomic accesses). This is exactly what this patch does. Previously, these were implemented by complex load-linked/store-conditional loops.. an obvious performance problem. For example, this patch turns ``` define void @store_i8_unordered(i8* %mem) { store atomic i8 42, i8* %mem unordered, align 1 ret void } ``` from ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: rlwinm r2, r3, 3, 27, 28 li r4, 42 xori r5, r2, 24 rlwinm r2, r3, 0, 0, 29 li r3, 255 slw r4, r4, r5 slw r3, r3, r5 and r4, r4, r3 LBB4_1: ; =>This Inner Loop Header: Depth=1 lwarx r5, 0, r2 andc r5, r5, r3 or r5, r4, r5 stwcx. r5, 0, r2 bne cr0, LBB4_1 ; BB#2: blr ``` into ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: li r2, 42 stb r2, 0(r3) blr ``` which looks like a pretty clear win to me. Test Plan: fixed the tests + new test for indexed accesses + make check-all Reviewers: jfb, wschmidt, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5587 llvm-svn: 218922	2014-10-02 22:27:07 +00:00
Juergen Ributzka	99bd3cba8b	[Stackmaps] Make ithe frame-pointer required for stackmaps. Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP relative. This fixes PR21107. llvm-svn: 218920	2014-10-02 22:21:49 +00:00
Chandler Carruth	75e182b414	[x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911	2014-10-02 21:37:14 +00:00
Tilmann Scheller	383b4fff4c	[NVPTX] Remove dead code. Found by the Clang static analyzer. llvm-svn: 218874	2014-10-02 15:12:48 +00:00
Joerg Sonnenberger	f148a6d498	Support padding unaligned data in .text. llvm-svn: 218870	2014-10-02 13:41:42 +00:00
Chandler Carruth	8a16802d46	[x86] Improve and correct how the new vector shuffle lowering was matching and lowering 64-bit insertions. The first problem was that we weren't looking through bitcasts to discover that we could lower as insertions. Once fixed, we in turn weren't looking through bitcasts to discover that we could fold a load into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR node around the inserted element and instead were passing a scalar to a DAG node that expected a vector. It turns out there are some patterns that will "lower" this into the correct asm, but the rest of the X86 backend is very unhappy with such antics. This should fix a few more edge case regressions I've spotted going through the regression test suite to enable the new vector shuffle lowering. llvm-svn: 218839	2014-10-01 23:14:28 +00:00
Eric Christopher	f6ed33e7fa	constify the TargetMachine argument used in the subtarget and lowering constructors. llvm-svn: 218832	2014-10-01 21:36:28 +00:00
Sanjay Patel	9ebfbb969d	Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578 Negative FABS of either a scalar or vector should be handled the same way on x86 with SSE/AVX: a single OR instruction of the FP operand with a constant to light up the sign bit(s). http://llvm.org/bugs/show_bug.cgi?id=20578 Differential Revision: http://reviews.llvm.org/D5201 llvm-svn: 218822	2014-10-01 21:20:06 +00:00
Eric Christopher	eb6e3bbf47	Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817	2014-10-01 21:05:35 +00:00
Eric Christopher	36448af7f5	Rework the PPC TargetMachine so that the non-function specific overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805	2014-10-01 20:38:26 +00:00
Eric Christopher	12f4a78581	constify TargetMachine parameter for X86TargetLowering. llvm-svn: 218804	2014-10-01 20:38:22 +00:00
Sanjay Patel	0e4a83e89c	Don't repeat function/variable name in comment. NFC. llvm-svn: 218791	2014-10-01 19:39:32 +00:00
Tim Northover	5d72c5de02	ARM: allow copying of CPSR when all else fails. As with x86 and AArch64, certain situations can arise where we need to spill CPSR in the middle of a calculation. These should be avoided where possible (MRS/MSR is rather expensive), which ARM is actually better at than the other two since it tries to Glue defs to uses, but as a last ditch effort, copying is better than crashing. rdar://problem/18011155 llvm-svn: 218789	2014-10-01 19:21:03 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Reed Kotler	b9dc248e9e	Add fptrunc to mips fast-sel Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel Test Plan: fptrunc.ll checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: rfuhler Differential Revision: http://reviews.llvm.org/D5553 llvm-svn: 218785	2014-10-01 18:47:02 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Tom Stellard	79243d9664	R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table llvm-svn: 218776	2014-10-01 17:15:17 +00:00
Toma Tabacu	c4c202a9a7	[mips] Rename emit and parse functions for the .cpload assembler directive. NFC. Summary: It's better if we have a consistent name for .cpload-related functions. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5437 llvm-svn: 218768	2014-10-01 14:53:19 +00:00
Tom Stellard	3a35d8f4c2	R600/SI: Add a generic pseudo EXP instruction llvm-svn: 218767	2014-10-01 14:44:45 +00:00
Tom Stellard	0c238c2fbe	R600/SI: Add generic pseudo MTBUF instructions llvm-svn: 218766	2014-10-01 14:44:43 +00:00
Tom Stellard	c470c96e6b	R600/SI: Add generic pseudo SMRD instructions llvm-svn: 218765	2014-10-01 14:44:42 +00:00
Oliver Stannard	d4e0a4fd2c	[ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5 Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend. llvm-svn: 218763	2014-10-01 13:13:18 +00:00
Chandler Carruth	6c02c031b8	[x86] Fix a few more tiny patterns with the new vector shuffle lowering that keep cropping up in the regression test suite. This also addresses one of the issues raised on the mailing list with failing to form 'movsd' in as many cases as we realistically should. There will be corresponding patches forthcoming for v4f32 at least. This was a lot of fuss for a relatively small gain, but all the fuss was on my end trying different ways of holding the pieces of the x86 fragment patterns just right. Now that it works, the code is reasonably simple. In the new test cases I'm adding here, v2i64 sticks out as just plain horrible. I've not come up with any great ideas here other than that it would be nice to recognize when we're going to take a domain crossing hit and cross earlier to get the decent instructions. At least with AVX it is slightly less silly.... llvm-svn: 218756	2014-10-01 11:14:02 +00:00
Chandler Carruth	048486109b	[x86] Delete some extraneous logic from the new vector shuffle lowering. Nothing was relying on this and there are potentially some edge cases that it would not be correct under. Removing it seems better than trying to "fix" it as nothing was relying on it. llvm-svn: 218755	2014-10-01 11:13:57 +00:00
Tom Coxon	e493f177ee	[AArch64] Allow access to all system registers with MRS/MSR instructions. The A64 instruction set includes a generic register syntax for accessing implementation-defined system registers. The syntax for these registers is: S<op0>_<op1>_<CRn>_<CRm>_<op2> The encoding space permitted for implementation-defined system registers is: op0 op1 CRn CRm op2 11 xxx 1x11 xxxx xxx The full encoding space can now be accessed: op0 op1 CRn CRm op2 xx xxx xxxx xxxx xxx This is useful to anyone needing to write assembly code supporting new system registers before the assembler has learned the official names for them. llvm-svn: 218753	2014-10-01 10:13:59 +00:00
Asiri Rathnayake	530b3edab6	Add missing natual vector cast. Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST was introduced in r217159 and r217138. This patch adds a missing cast from v2f32 to v1i64 which is causing some compilation failures. Also added test cases to cover various modimm types and BUILD_VECTORs with i64 elements. llvm-svn: 218751	2014-10-01 09:59:45 +00:00
Oliver Stannard	37e4daab05	[ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM) The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the same target feature, and all double-precision operations are already disabled by the fp-only-sp target features. llvm-svn: 218747	2014-10-01 09:02:17 +00:00
Daniel Sanders	92db6b78f7	[mips] Fix disassembly of [ls][wd]c[23], cache, and pref Fixes PR21015, and PR20993. Patch by Jun Koi llvm-svn: 218745	2014-10-01 08:26:55 +00:00
Sasa Stankovic	7072a7968f	[mips] For indirect calls we don't need $gp to point to .got. Mips linker doesn't generate lazy binding stub for a function whose address is taken in the program. Differential Revision: http://reviews.llvm.org/D5067 llvm-svn: 218744	2014-10-01 08:22:21 +00:00
Nick Lewycky	5f75f4ddb9	Fix typo in comment from r218733 llvm-svn: 218739	2014-10-01 03:37:34 +00:00
Chandler Carruth	26cb9b8d2d	[x86] Teach the new vector shuffle lowering to be even more aggressive in exposing the scalar value to the broadcast DAG fragment so that we can catch even reloads and fold them into the broadcast. This is somewhat magical I'm afraid but seems to work. It is also what the old lowering did, and I've switched an old test to run both lowerings demonstrating that we get the same result. Unlike the old code, I'm not lowering f32 or f64 scalars through this path when we only have AVX1. The target patterns include pretty heinous code to re-cast those as shuffles when the scalar happens to not be spilled because AVX1 provides no broadcast mechanism from registers what-so-ever. This is terribly brittle. I'd much rather go through our generic lowering code to get this. If needed, we can add a peephole to get even more opportunities to broadcast-from-spill-slots that are exposed post-RA, but my suspicion is this just doesn't matter that much. llvm-svn: 218734	2014-10-01 03:19:43 +00:00
Chandler Carruth	846baf2ca1	[x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. llvm-svn: 218733	2014-10-01 02:25:54 +00:00
Adam Nemet	05d8c8e682	[AVX512] Remove space before \t in AsmStrings. llvm-svn: 218725	2014-10-01 00:41:32 +00:00
Chandler Carruth	b9d3fa1e65	[x86] Teach the new vector shuffle lowering about VBROADCAST and VPBROADCAST. This has the somewhat expected pervasive impact. I don't know why I forgot about this. Everything seems good with lots of significant improvements in the tests. llvm-svn: 218724	2014-10-01 00:41:21 +00:00
Sanjay Patel	8fde95cb2b	Split the estimate() interface into separate functions for each type. NFC. It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. Eg, x86 will prefer a different NR estimate implementation. ARM will want to use it's 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698	2014-09-30 20:28:48 +00:00
Juergen Ributzka	c110c0b99a	Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. Note: This version fixed an issue with the TBZ/TBNZ instructions that were generated in FastISel. The issue was that the 64bit version of TBZ (TBZX) automagically sets the upper bit of the immediate field that is used to specify the bit we want to test. To test for any of the lower 32bits we have to first extract the subregister and use the 32bit version of the TBZ instruction (TBZW). Original commit message: Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). llvm-svn: 218693	2014-09-30 19:59:35 +00:00
Matt Arsenault	9706978077	R600/SI: Fix printing of clamp and omod No tests for omod since nothing uses it yet, but this should get rid of the remaining annoying trailing zeros after some instructions. llvm-svn: 218692	2014-09-30 19:49:48 +00:00
Matt Arsenault	272c50a1fe	R600/SI: Update VOP3b to not include obsolete operands abs / neg are now part of the srcN_modifiers operands llvm-svn: 218691	2014-09-30 19:49:43 +00:00
Reed Kotler	3ebdcc9ea7	Add numeric extend, trunctate to mips fast-isel Summary: Add numeric extend, trunctate to mips fast-isel Reactivates D4827 Test Plan: fpext.ll loadstoreconv.ll Reviewers: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D5251 llvm-svn: 218681	2014-09-30 16:30:13 +00:00
Tom Coxon	2c13e71728	[AArch64] Remove unnecessary whitespace. (Test commit) llvm-svn: 218680	2014-09-30 16:23:16 +00:00
Robert Khasanov	28a7df0b5f	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 218670	2014-09-30 12:15:52 +00:00
Robert Khasanov	5aa4445bde	[AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1. Now cmp intrinsics lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) llvm-svn: 218669	2014-09-30 11:41:54 +00:00
Robert Khasanov	b25e562d14	[AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. Added new operand type for intrinsics (IIT_V64) llvm-svn: 218668	2014-09-30 11:32:22 +00:00
Robert Khasanov	a27c8e0fd9	[AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. Added CMP_MASK intrinsic type llvm-svn: 218667	2014-09-30 11:19:50 +00:00
Job Noorman	a9372a2755	Make sure aggregates are properly alligned on MSP430. llvm-svn: 218665	2014-09-30 11:15:44 +00:00
Chandler Carruth	aaf8e03d92	[x86] Revert r218588, r218589, and r218600. These patches were pursuing a flawed direction and causing miscompiles. Read on for details. Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to have to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well. In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics. However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea. There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the constant blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes. My plan is to: 1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE. 2) Remove any fancy lowering from VSELECT during legalization relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction. One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining. llvm-svn: 218658	2014-09-30 02:52:28 +00:00
Matt Arsenault	06a711dce5	Fix missing C++ mode comment llvm-svn: 218654	2014-09-30 01:05:27 +00:00
Juergen Ributzka	6ac12439d0	[FastISel][AArch64] Fold sign-/zero-extends into the load instruction. The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on. Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too. This fixes rdar://problem/18495928. llvm-svn: 218653	2014-09-30 00:49:58 +00:00

... 10 11 12 13 14 ...

31349 Commits