llvm-project

Commit Graph

Author	SHA1	Message	Date
Bradley Smith	48b93e1f21	[ARM] Add DSP build attribute and extension targeting llvm-svn: 257885	2016-01-15 10:28:25 +00:00
Bradley Smith	42f6e90a43	[ARM] Add new system registers to ARMv8-M Baseline/Mainline llvm-svn: 257884	2016-01-15 10:28:03 +00:00
Bradley Smith	618712df04	[ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline llvm-svn: 257883	2016-01-15 10:27:14 +00:00
Bradley Smith	433c22e35c	[ARM] Add ARMv8-A semaphore/atomic instructions to ARMv8-M Baseline/Mainline llvm-svn: 257882	2016-01-15 10:26:51 +00:00
Bradley Smith	a1189106d5	[ARM] Add B.W and CBZ instructions to ARMv8-M Baseline llvm-svn: 257881	2016-01-15 10:26:17 +00:00
Bradley Smith	519563e371	[ARM] Add SDIV/UDIV instructions to ARMv8-M Baseline llvm-svn: 257880	2016-01-15 10:25:35 +00:00
Bradley Smith	d9a99ce53d	[ARM] Add MOVW/MOVT instructions to ARMv8-M Baseline/Mainline llvm-svn: 257879	2016-01-15 10:25:14 +00:00
Bradley Smith	e26f799422	[ARM] Add ARMv8-M Baseline/Mainline LLVM targeting llvm-svn: 257878	2016-01-15 10:24:39 +00:00
Bradley Smith	4c21cba72b	[ARM] Split out ARMv8-A semaphores and atomics and ARMv7 clrex as separate features llvm-svn: 257877	2016-01-15 10:23:46 +00:00
Jonas Paulsson	5b29e096ac	[SystemZ] Fix bad instruction name SLGBR -> SLBGR Reviewed by Ulrich Weigand llvm-svn: 257874	2016-01-15 07:12:09 +00:00
Pete Cooper	835594e627	Delete MCRelocationInfo::createExprForRelocation. This method has no callers. Also remove X86ELFRelocationInfo.cpp and X86MachORelocationInfo.cpp which only existed to provide an implementation of that method. Ok'd by Rafael and Jim. llvm-svn: 257859	2016-01-15 02:24:12 +00:00
Weiming Zhao	038393bba0	Fix AArch64ConditionOptimizer Summary: This pass may modify the Cmp operands. However, the flag reg may be used by both the branch and CSEL. Modifying CMP will have side effect on CSEL. Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: http://reviews.llvm.org/D16147 llvm-svn: 257844	2016-01-15 00:06:58 +00:00
Krzysztof Parzyszek	0d11212f00	[Hexagon] Use S2_lsr_i_r instead of S2_extractu to obtain upper halfword llvm-svn: 257815	2016-01-14 21:59:22 +00:00
Krzysztof Parzyszek	5337a3e965	[Hexagon] Handle HVX registers in bit simplification llvm-svn: 257811	2016-01-14 21:45:43 +00:00
Rui Ueyama	da00f2fdf4	Update to use new name alignTo(). llvm-svn: 257804	2016-01-14 21:06:47 +00:00
Rafael Espindola	c897cdde70	Handle offsets larger than 32 bits. David Majnemer noticed that it was not obvious what the behavior would be if B.Offset - A.Offset could not fit in an int. llvm-svn: 257803	2016-01-14 21:03:06 +00:00
Rafael Espindola	56cb2734e3	Assert that a cmp function defines a total order. Thanks to David Blaikie for noticing it. llvm-svn: 257796	2016-01-14 20:28:25 +00:00
Krzysztof Parzyszek	237b96132d	[Hexagon] Expand pseudo instruction Insert4 llvm-svn: 257771	2016-01-14 15:37:16 +00:00
Krzysztof Parzyszek	b28ae10a16	[Hexagon] Handle branches with non-mbb operands llvm-svn: 257768	2016-01-14 15:05:27 +00:00
Benjamin Kramer	fc1f7d893e	[ARM] Use the efficient version of BitVector::set and a static_assert. No functional change intended. llvm-svn: 257766	2016-01-14 14:33:04 +00:00
Igor Breger	fc96331d88	AVX512: VMOVDQA32/64 (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16142 llvm-svn: 257749	2016-01-14 07:56:04 +00:00
Ahmed Bougacha	dfc77357a0	[AArch64] Don't assume extractelt constant index when matching shuffle. llvm-svn: 257735	2016-01-14 02:12:30 +00:00
JF Bastien	d1bd129d00	WebAssembly: mark a few new failures A recent change introduced this assertion failure in some corner cases. Repro: mkdir /s/wasm/torture-out ; time /s/wasm/waterfall/src/compile_torture_tests.py --c /s/llvm/out/bin/clang --cxx /s/llvm/out/bin/clang++ --testsuite /s/gcc/gcc/testsuite --fails /s/llvm/llvm/lib/Target/WebAssembly/known_gcc_test_failures.txt --out /s/wasm/torture-out Or look on the wasm integration bot: https://build.chromium.org/p/client.wasm.llvm/console llvm-svn: 257733	2016-01-14 01:49:22 +00:00
David Majnemer	3463e696fb	[X86] Don't alter HasOpaqueSPAdjustment after we've relied on it We rely on HasOpaqueSPAdjustment not changing after we've calculated things based on it. Things like whether or not we can use 'rep;movs' to copy bytes around, that sort of thing. If it changes, invariants in the backend will quietly break. This situation arose when we had a call to memcpy and a COPY of the FLAGS register where we would attempt to reference local variables using %esi, a register that was clobbered by the 'rep;movs'. This fixes PR26124. llvm-svn: 257730	2016-01-14 01:20:03 +00:00
JF Bastien	664fd461c2	WebAssembly: fix build break introduced by ELFObjectWriter churn llvm-svn: 257709	2016-01-13 23:36:00 +00:00
Rafael Espindola	8340f94df1	Convert a few assert failures into proper errors. Fixes PR25944. llvm-svn: 257697	2016-01-13 22:56:57 +00:00
Krzysztof Parzyszek	a61f7da6ba	[Hexagon] Fix the options controlling jump table generation llvm-svn: 257679	2016-01-13 21:43:13 +00:00
Changpeng Fang	c16be00313	AMDGPU/SI: Update ISA version for FIJI llvm-svn: 257666	2016-01-13 20:39:25 +00:00
Dan Gohman	a39ca60126	[WebAssembly] Add an assertion to catch unexpected MCFixupKindInfo flags. llvm-svn: 257657	2016-01-13 19:31:57 +00:00
Dan Gohman	938ff9f0aa	[WebAssembly] MCFixupKindInfo's TargetSize is in bits rather than bytes. llvm-svn: 257655	2016-01-13 19:29:37 +00:00
Hans Wennborg	81efb6b418	Fix struct/class mismatch for MachineSchedContext llvm-svn: 257648	2016-01-13 18:59:45 +00:00
Marek Olsak	46dadbfab2	AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 llvm-svn: 257625	2016-01-13 17:23:20 +00:00
Marek Olsak	3c0ebc71f1	AMDGPU/SI: Remove ending s_endpgm from non-void functions Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16035 llvm-svn: 257623	2016-01-13 17:23:12 +00:00
Marek Olsak	8e9cc63bfb	AMDGPU/SI: Add s_waitcnt at the end of non-void functions Summary: v2: Make ReturnsVoid private, so that I can another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 llvm-svn: 257622	2016-01-13 17:23:09 +00:00
Marek Olsak	8a0f335ad6	AMDGPU/SI: Add support for non-void functions Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored. v2: don't do this for compute shaders Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16033 llvm-svn: 257621	2016-01-13 17:23:04 +00:00
Derek Schuff	9c3bf3187a	[WebAssembly] Invalidate liveness in CFG stackifier WebAssemblyCFGStackify does not track liveness for EXPR_STACK, causing verifier failure if liveness has not already been invalidated. llvm-svn: 257620	2016-01-13 17:10:28 +00:00
Nicolai Haehnle	02c3291566	AMDGPU/SI: Add SI Machine Scheduler Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609	2016-01-13 16:10:10 +00:00
Michael Zuckerman	6b35f460ac	Fixing warning by adding the X86ISD::VROTRI case. Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257607	2016-01-13 15:48:42 +00:00
Krzysztof Parzyszek	a3c5d44437	[Hexagon] Do not insert non-phis before phis in bit simplification llvm-svn: 257606	2016-01-13 15:48:18 +00:00
Michael Zuckerman	0e31b22487	[AVX512] Adding PMOVSXBD/W/Q , PMOVZSDQ and PMOVZSWD/Q Intrinsics . Differential Revision: http://reviews.llvm.org/D16111 llvm-svn: 257604	2016-01-13 14:59:19 +00:00
Michael Zuckerman	43cea85db9	[AVX512] Adding PMOVZXBD/W/Q , PMOVZXDQ and PMOVZXWD/Q Intrinsics Differential Revision:http://reviews.llvm.org/D16071 llvm-svn: 257601	2016-01-13 14:25:21 +00:00
Ulrich Weigand	46ff7ec317	[PowerPC] Fix large code model with the ELFv2 ABI The global entry point prologue currently assumes that the TOC associated with a function is less than 2GB away from the function entry point. This is always true when using the medium or small code model, but may not be the case when using the large code model. This patch adds a new variant of the ELFv2 global entry point prologue that lifts the 2GB restriction when building with -mcmodel=large. This works by emitting a quadword containing the distance from the function entry point to its associated TOC immediately before the entry point, and then using a prologue like: ld r2,-8(r12) add r2,r2,r12 Since creation of the entry point prologue is now split across two separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog code), I've switched to using named labels instead of just temporaries to indicate the locations of the global and local entry points and the new TOC offset data word. These names are provided by new routines in PPCFunctionInfo modeled after the existing PPCFunctionInfo::getPICOffsetSymbol. Note that a corresponding change was committed to GCC here: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html Reviewers: hfinkel Differential Revision: http://reviews.llvm.org/D15500 llvm-svn: 257597	2016-01-13 13:12:23 +00:00
Michael Zuckerman	298a680c80	[AVX512] adding PRORQ , PRORD , PRORLVQ and PRORLVD Intrinsics Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257594	2016-01-13 12:39:33 +00:00
Marek Olsak	4e99b6ec01	AMDGPU/SI: Allow more shader inputs Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16032 llvm-svn: 257593	2016-01-13 11:46:48 +00:00
Marek Olsak	b6c8c3d165	AMDGPU/SI: Allow any number of PS inputs Summary: With the ability to concatenate shader binaries, the limit of 15 no longer applies. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16031 llvm-svn: 257592	2016-01-13 11:46:10 +00:00
Marek Olsak	fccabaf57e	AMDGPU/SI: Add new target attribute InitialPSInputAddr Summary: This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM. The register assigns VGPR locations to PS inputs, while the ENA register determines whether or not they are loaded. Mesa needs to set some inputs as not-movable, so that a pixel shader prolog binary appended at the beginning can assume where some inputs are. v2: Make PSInputAddr private, because there is never enough silly getters and setters for people to read. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16030 llvm-svn: 257591	2016-01-13 11:45:36 +00:00
Marek Olsak	926c56f50c	AMDGPU/SI: Fix a bug in SIFoldOperands Summary: ret.ll will contain a test for this Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16029 llvm-svn: 257590	2016-01-13 11:44:29 +00:00
Andrey Turetskiy	1ce2c9973f	LEA code size optimization pass (Part 2): Remove redundant LEA instructions. Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA (in the same basic block) which calculates address differing only by a displacement. Works only for -Oz. Differential Revision: http://reviews.llvm.org/D13295 llvm-svn: 257589	2016-01-13 11:30:44 +00:00
James Y Knight	7699494f08	[SPARC] Revamp AnalyzeBranch and add ReverseBranchCondition. AnalyzeBranch on X86 (and, previously, SPARC, which implementation was copied from X86) tries to modify the branches based on block layout (e.g. checking isLayoutSuccessor), when AllowModify is true. The rest of the architectures leave that up to the caller, which can call InsertBranch, RemoveBranch, and ReverseBranchCondition as appropriate. That appears to be the preferred way to do it nowadays. This commit makes SPARC like the rest: replaces AnalyzeBranch with an implementation cribbed from AArch64, and adds a ReverseBranchCondition implementation. Additionally, a test-case has been added (also cribbed from AArch64) demonstrating that redundant branch sequences no longer get emitted. E.g., it used to emit code like this: bne .LBB1_2 nop ba .LBB1_1 nop .LBB1_2: And now emits: cmp %i0, 42 be .LBB1_1 nop llvm-svn: 257572	2016-01-13 04:44:14 +00:00
Ana Pazos	359cab3bb3	Guard fabs to bfc convert with V6T2 flag Summary: BFC instructions are available in ARMv6T2 and above. Reviewers: t.p.northover Subscribers: aemerson Differential Revision: http://reviews.llvm.org/D16076 llvm-svn: 257546	2016-01-13 00:03:35 +00:00
Quentin Colombet	f8e3030794	[ARM] Mark VMOV with immediate: isAsCheapAsMove. VMOVs are not strictly speaking cheap, but they are as expensive as a vector copy (VORR), so we should prefer rematerialization over splitting when it applies. rdar://problem/23754176 llvm-svn: 257545	2016-01-13 00:02:40 +00:00
Derek Schuff	4377e2d713	[WebAssembly] Fix disassembler shared-libs build llvm-svn: 257536	2016-01-12 23:03:40 +00:00
Dan Gohman	0656f5f845	[WebAssembly] Register the MC register info. llvm-svn: 257525	2016-01-12 21:27:55 +00:00
Michael Zuckerman	2ddcbcf464	[AVX512] adding PROLQ and PROLD Intrinsics Differential Revision: http://reviews.llvm.org/D16048 llvm-svn: 257523	2016-01-12 21:19:17 +00:00
Kyle Butt	cec40806f1	Codegen: [PPC] Handle weighted comparisons when inserting selects. Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle the weighted predicates as well. This latent bug was triggered by r255398, because it added use of the branch-weighted predicates. While here, switch over an enum instead of an int to get the compiler to enforce totality in the future. llvm-svn: 257518	2016-01-12 21:00:43 +00:00
Dan Gohman	4635017176	[WebAssembly] Add an EM_WEBASSEMBLY value, and several bits of code that use it. A request has been made to the official registry, but an official value is not yet available. This patch uses a temporary value in order to support development. When an official value is received, the value of EM_WEBASSEMBLY will be updated. llvm-svn: 257517	2016-01-12 20:56:01 +00:00
Dan Gohman	3469ee120c	[WebAssembly] Introduce a WebAssemblyTargetStreamer class. Refactor .param, .result, .local, and .endfunc, as directives, using the proper MCTargetStreamer mechanism, rather than fake instructions. llvm-svn: 257511	2016-01-12 20:30:51 +00:00
Krzysztof Parzyszek	f62d44be28	Replace inherited constructor with an explicit one Some bots failed when the inherited constructor was used. llvm-svn: 257508	2016-01-12 19:27:59 +00:00
Dan Gohman	1d68e80f26	[WebAssembly] Make CFG stackification independent of basic-block labels. This patch changes the way labels are referenced. Instead of referencing the basic-block label name (eg. .LBB0_0), instructions now just have an immediate which indicates the depth in the control-flow stack to find a label to jump to. This makes them much closer to what we expect to have in the binary encoding, and avoids the problem of basic-block label names not being explicit in the binary encoding. Also, it terminates blocks and loops with end_block and end_loop instructions, rather than basic-block label names, for similar reasons. This will also fix problems where two constructs appear to have the same label, because we no longer explicitly use labels, so consumers that need labels will presumably create their own labels, and presumably they won't reuse labels when they do. This patch does make the code a little more awkward to read; as a partial mitigation, this patch also introduces comments showing where the labels are, and comments on each branch showing where it's branching to. llvm-svn: 257505	2016-01-12 19:14:46 +00:00
Krzysztof Parzyszek	1279881315	[Hexagon] Implement RDF-based post-RA optimizations - Handle simple cases of register copies (what current RDF CP allows). - Hexagon-specific dead code elimination: handles dead address updates in post-increment instructions. llvm-svn: 257504	2016-01-12 19:09:01 +00:00
Krzysztof Parzyszek	c09d630e50	RDF: Copy propagation This is a very limited implementation of DFG-based copy propagation. It only handles actual COPY instructions (does not handle other equivalents such as add-immediate with a 0 operand). The major limitation is that it does not update the DFG: that will be the change required to make it more robust (hopefully coming up soon). llvm-svn: 257490	2016-01-12 17:23:48 +00:00
Tom Stellard	f421837250	AMDGPU: Emit note directive for HSA even if there are no functions Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16010 llvm-svn: 257488	2016-01-12 17:18:17 +00:00
Krzysztof Parzyszek	6f4000e763	RDF: Dead code elimination Utility class to perform DFG-based dead code elimination. llvm-svn: 257485	2016-01-12 17:01:16 +00:00
Krzysztof Parzyszek	8dca45efa8	Fix compiler warnings from r257477 llvm-svn: 257483	2016-01-12 16:51:55 +00:00
Krzysztof Parzyszek	acdff46a9c	RDF: Implement register liveness analysis Compute block live-ins and operand kill flags from the DFG. llvm-svn: 257480	2016-01-12 15:56:33 +00:00
Daniel Sanders	5e1d5a789a	[mips] Correct operand order in DSP's mthi/mtlo Summary: The result register is the second operand as per the other mt* instructions. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D15993 llvm-svn: 257478	2016-01-12 15:15:14 +00:00
Krzysztof Parzyszek	b5b5a1d7ad	Register Data Flow: data flow graph Target independent, SSA-based data flow framework for representing data flow between physical registers. This commit implements the creation of the actual data flow graph. llvm-svn: 257477	2016-01-12 15:09:49 +00:00
Benjamin Kramer	ab8cc02ba5	[Hexagon] Make helper function static. NFC. llvm-svn: 257476	2016-01-12 14:58:49 +00:00
Keno Fischer	00021429d4	[ARM] Fix several state persistence bugs Summary: This fixes three bugs, in all of which state is not reset, or is incorrectly reset, between objects (i.e. when reusing the same pass manager to create multiple object files): 1) AttributeSection needs to be reset to nullptr, because otherwise the backend will try to emit into the old object file's attribute section, causing a segmentation fault. 2) MappingSymbolCounter needs to be reset, otherwise the second object file will start where the first one left off. 3) The MCStreamer base class resets the Streamer's e_flags settings. Since EF_ARM_EABI_VER5 is set on streamer creation, we need to set it again after the MCStreamer was reset. Also rename Reset (upper case) to EHReset to avoid confusion with reset (lower case). Reviewers: rengolin Differential Revision: http://reviews.llvm.org/D15950 llvm-svn: 257473	2016-01-12 13:38:15 +00:00
Andrey Turetskiy	fed110f646	Test commit access - tiny comment and code style fix. llvm-svn: 257472	2016-01-12 13:34:11 +00:00
Robert Lougher	6abd69a60b	The isel pattern that selects the memory-register form of VCVTPH2PS (64 to 128-bit) matches against the pattern fragment 'vzmovl_v2i64' (a zero-extended 64-bit load). However, a change in r248784 teaches the instruction combiner that only the lower 64 bits of the input to a 128-bit vcvtph2ps are used. This means the instruction combiner will ordinarily optimize away the upper 64-bit insertelement instruction in the zero-extension and so we no longer select the memory-register form. To fix this a new pattern has been added. Differential Revision: http://reviews.llvm.org/D16067 llvm-svn: 257470	2016-01-12 11:48:25 +00:00
Igor Breger	ea8e8e9f97	AVX512: VPMOVAPS/PD and VPMOVUPS/PD (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16042 llvm-svn: 257463	2016-01-12 10:02:32 +00:00
Dan Gohman	1a42728719	[WebAssembly] Implement a prototype instruction encoder and disassembler. This is using an extremely simple temporary made-up binary format, not the official binary format (which isn't defined yet). llvm-svn: 257440	2016-01-12 03:32:29 +00:00
Dan Gohman	afd7e3ada8	[WebAssembly] Register the MC subtarget info. llvm-svn: 257439	2016-01-12 03:30:06 +00:00
Dan Gohman	a11fb2373c	[WebAssembly] Define OperandTypes for decoding immediate values. llvm-svn: 257438	2016-01-12 03:09:16 +00:00
Dan Gohman	85159ca224	[WebAssembly] Use TSFlags instead of keeping a list of special-case opcodes. llvm-svn: 257433	2016-01-12 01:45:12 +00:00
Manman Ren	ed967f3752	CXX_FAST_TLS calling convention: performance improvement for x86-64. This is the same change on x86-64 as r255821 on AArch64. rdar://9001553 llvm-svn: 257428	2016-01-12 01:08:46 +00:00
Manman Ren	5e9e65e705	CXX_FAST_TLS calling convention: performance improvement for ARM. This is the same change on ARM as r255821 on AArch64. rdar://9001553 llvm-svn: 257424	2016-01-12 00:47:18 +00:00
Manman Ren	1602605bf8	CXX_FAST_TLS calling convention: Add support for ARM on Darwin. rdar://9001553 llvm-svn: 257417	2016-01-11 23:50:43 +00:00
Dan Gohman	26c6765bd6	[WebAssembly] Define WebAssembly-specific relocation codes. Currently WebAssembly has two kinds of relocations; data addresses and function addresses. This adds ELF relocations for them, as well as an MC symbol kind to indicate which type of relocation is needed. llvm-svn: 257416	2016-01-11 23:38:05 +00:00
Dan Gohman	f225a63849	[WebAssembly] Reorganize address offset folding. Always expect tglobaladdr and texternalsym to be wrapped in WebAssemblywrapper nodes. Also, split out a regPlusGA from regPlusImm so that it can special-case global addresses, as they can be folded in more cases. Unfortunately this doesn't enable any new optimizations yet due to SelectionDAG limitations. I'll be submitting changes to the SelectionDAG infrastructure, along with tests, in a separate patch. llvm-svn: 257394	2016-01-11 22:05:44 +00:00
Matt Arsenault	5e0bdb8b95	AMDGPU: Implement {{s\|u}}int_to_fp i64 -> f32 The old lowering for uint_to_fp failed opencl conformance. It might be OK for fast math mode, but I'm not sure. llvm-svn: 257393	2016-01-11 22:01:48 +00:00
Matt Arsenault	800fecf9de	AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA target It might be better to let this be a select failure instead. llvm-svn: 257386	2016-01-11 21:18:33 +00:00
Matt Arsenault	5319b0add5	AMDGPU: Fix ctlz combine for sub 32-bit types llvm-svn: 257353	2016-01-11 17:02:06 +00:00
Matt Arsenault	de5fbe9c60	AMDGPU: Pattern match ffbh pattern to instruction. The hardware instruction's output on 0 is -1 rather than 32. Eliminate a test and select to -1. This removes an extra instruction from the compatibility function with HSAIL's firstbit instruction. llvm-svn: 257352	2016-01-11 17:02:00 +00:00
Matt Arsenault	f058d67643	AMDGPU: Custom lower i64 ctlz llvm-svn: 257348	2016-01-11 16:50:29 +00:00
Matt Arsenault	a0e5cd55ad	Mips: Remove lowerSELECT_CC This is the same as the default expansion. llvm-svn: 257346	2016-01-11 16:44:48 +00:00
Matt Arsenault	5ca3c72c5a	LegalizeDAG: Expand ctlz with ctlz_zero_undef if legal llvm-svn: 257345	2016-01-11 16:37:46 +00:00
Matt Arsenault	02d45dfeda	AMDGPU: Remove dead target dag combine llvm-svn: 257344	2016-01-11 16:37:40 +00:00
Daniel Sanders	4d32300cfd	[mips] Never select JAL for calls to an absolute immediate address. Summary: It actually takes an offset into the current PC-region. This fixes the 'expr' command in lldb. Reviewers: vkalintiris, jaydeep, bhushan Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D16054 llvm-svn: 257339	2016-01-11 15:57:46 +00:00
Krzysztof Parzyszek	bc17b68a47	[Hexagon] Add check for nullptr in getFixupNoBits llvm-svn: 257338	2016-01-11 15:51:53 +00:00
Krzysztof Parzyszek	f49a8411f8	[Hexagon] Add implicit uses of GP to GP-relative loads and stores llvm-svn: 257337	2016-01-11 15:49:58 +00:00
Krzysztof Parzyszek	b024445444	[Hexagon] Mark D14 and GP as reserved registers llvm-svn: 257336	2016-01-11 15:47:41 +00:00
Alexey Bataev	28f0c5efec	[X86] Reduce complexity of the LEA optimization pass, by Andrey Turetsky. In the OptimizeLEA pass keep instructions' positions in the basic block saved and use them for calculation of the distance between two instructions instead of std::distance. This reduces complexity of the pass from O(n^3) to O(n^2) and thus the compile time. Differential Revision: http://reviews.llvm.org/D15692 llvm-svn: 257328	2016-01-11 11:52:29 +00:00
Craig Topper	9d2cab7742	[AVX-512] Remove another extra space from the Intel syntax asm strings. llvm-svn: 257304	2016-01-11 01:03:40 +00:00
Craig Topper	9feea57844	[AVX-512] Remove more superfluous spaces from asm strings. llvm-svn: 257301	2016-01-11 00:44:58 +00:00
Craig Topper	156622ad9d	[AVX-512] Remove unused Round and Itinerary from the maskable_cmp multiclasses. They weren't used and there were extra spaces in the asm string to prepare for the concatenations of the round string that wasn't ever used. llvm-svn: 257300	2016-01-11 00:44:56 +00:00
Craig Topper	bfe13ff6ca	[AVX-512] Make spacing between comma and {sae} operand consistent in asm strings. llvm-svn: 257299	2016-01-11 00:44:52 +00:00
Craig Topper	5be407ab27	[X86] Remove extra spaces from MPX instruction asm strings. llvm-svn: 257298	2016-01-11 00:44:46 +00:00
Elena Demikhovsky	542dfcf44c	Optimized instruction sequence for sitofp operation on X86-32 Optimized sitofp i64 %x to double. The current sequence movl %ecx, 8(%esp) movl %edx, 12(%esp) fildll 8(%esp) is replaced with: movd %ecx, %xmm0 movd %edx, %xmm1 punpckldq %xmm1, %xmm0 movq %xmm0, 8(%esp) Differential Revision: http://reviews.llvm.org/D15946 llvm-svn: 257285	2016-01-10 09:41:22 +00:00
Michael Zuckerman	885f61c534	[AVX512] add PRORVQ and PRORVD Intrinsic Differential Revision:http://reviews.llvm.org/D15955 llvm-svn: 257283	2016-01-10 09:16:41 +00:00
Simon Pilgrim	c7bebcbfd8	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur. This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars. llvm-svn: 257266	2016-01-09 20:59:39 +00:00
Simon Pilgrim	2e7a1849c9	[X86][AVX] Add support for i64 broadcast loads on 32-bit targets Added 32-bit AVX1/AVX2 broadcast tests. llvm-svn: 257264	2016-01-09 19:59:27 +00:00
Tobias Edler von Koch	ccd3bfc3c8	[Hexagon] Replace a static member variable in HexagonCVIResource (NFC) This creates one instance of TUL per HexagonShuffler, which avoids thread-safety issues with future changes. llvm-svn: 257215	2016-01-08 22:07:25 +00:00
Weiming Zhao	4b3b13d3bc	RBIT Instruction only available for ARMv6t2 and above. Summary: r255334 matches bit-reverse pattern in InstCombine and generates calls to Intrinsic::bitreverse. RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5. Patch by Z. Zheng <zhaoshiz@codeaurora.org> Reviewers: apazos, jmolloy, weimingz Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D15932 llvm-svn: 257188	2016-01-08 18:43:41 +00:00
Weiming Zhao	48c033e021	Disable shrink-wrap for Thumb1 Summary: In ARMConstantIslandPass, which runs after Shrink Wrap pass, long jumps will be fixed up as BL (tBfar) which depends on spilling LR in epilogue. However, shrink-wrap may remove the LR, which causes issues when the function returns. Reviewers: qcolombet, rengolin Subscribers: aemerson, rengolin Differential Revision: http://reviews.llvm.org/D15984 llvm-svn: 257187	2016-01-08 18:37:43 +00:00
Tom Stellard	4c4c72db48	AMDGPU/SI: Emit global variable sizes when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15952 llvm-svn: 257173	2016-01-08 14:50:28 +00:00
Tom Stellard	ad8f5e8111	AMDGPU: Emit functions sizes Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15951 llvm-svn: 257172	2016-01-08 14:50:23 +00:00
Nemanja Ivanovic	2314e83227	Prevent renaming of CR fields in AADB when a CR restore is present This patch corresponds to review: http://reviews.llvm.org/D15930 Moves to and from CR fields depend on shifts/masks that depend on the target/source CR field. Thus, post-ra anti-dep breaking must not later change that CR register assignment. llvm-svn: 257168	2016-01-08 13:09:54 +00:00
Dylan McKay	cc6733aa83	[AVR] Added AVRSelectionDAGInfo header file llvm-svn: 257152	2016-01-08 06:32:27 +00:00
Craig Topper	048e700828	[AVX-512] Remove superfluous spaces from some asm strings. llvm-svn: 257150	2016-01-08 06:09:20 +00:00
Craig Topper	04493fda81	[X86] Don't print the aliased version of CVTSD2SI64rm. This appears to be a mistake I made years ago. llvm-svn: 257149	2016-01-08 06:09:18 +00:00
Craig Topper	29510c0430	[X86] Use \t instead of space after mnemonics in a bunch InstAliases for consistency. llvm-svn: 257148	2016-01-08 06:09:13 +00:00
Kyle Butt	bfcff3856a	Add call sequence start and end for __tls_get_addr This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839. For a PIC TLS variable access in a function, prologue (mflr followed by std and stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but no one saves/restores it. Also added a test for save/restore clobbered registers during calling __tls_get_addr. Patch by Tim Shen llvm-svn: 257137	2016-01-08 02:06:19 +00:00
Dan Gohman	4ef99433aa	[WebAssembly] Minor code cleanups. NFC. llvm-svn: 257131	2016-01-08 01:18:00 +00:00
Dan Gohman	35e4a28947	[WebAssembly] Minor code cleanups. NFC. llvm-svn: 257128	2016-01-08 01:06:00 +00:00
Dan Gohman	8633eedb30	[WebAssembly] Remove an unused def : Pat. WebAssemblyISelLowering.cpp does not wrap jump table nodes inside of WebAssemblywrapper nodes, so this pattern is not currently used. llvm-svn: 257127	2016-01-08 00:50:33 +00:00
Dan Gohman	cceedf79b4	[WebAssembly] Remove unused arguments, unused functions. NFC. llvm-svn: 257125	2016-01-08 00:43:54 +00:00
Eric Christopher	b793230797	Add some testing for thumb1 and thumb2 inline asm immediate constraints and fix a couple of bugs on inspection. Also fixes PR26061. llvm-svn: 257122	2016-01-08 00:34:44 +00:00
JF Bastien	b9ec4c6cea	WebAssembly: use .skip instead of .zero directive .zero is confusing when used with two arguments. Documentation: This directive emits SIZE 0-valued bytes. SIZE must be an absolute expression. This directive is actually an alias for the '.skip' directive so in can take an optional second argument of the value to store in the bytes instead of zero. Using '.zero' in this way would be confusing however. Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353 Hexagon and Sparc do the same, and it's all the same to WebAssembly so let's pick the less confusing of the two. llvm-svn: 257111	2016-01-07 23:18:29 +00:00
JF Bastien	841085c561	WebAssembly: update expected failures, more assert got resolved. llvm-svn: 257098	2016-01-07 21:00:37 +00:00
JF Bastien	d9d2892668	WebAssembly: update expected failures, assert got resolved by r257084. llvm-svn: 257093	2016-01-07 20:07:21 +00:00
Derek Schuff	9bfea27c26	[WebAssembly] Support combining GEP and FrameIndex offsets in memory operand offset field Previously we only supported putting the FI into memory operand offset fields if there was nothing there already. Now combine them. Differential Revision: http://reviews.llvm.org/D15941 llvm-svn: 257084	2016-01-07 18:55:52 +00:00
Dan Gohman	a4730cf0b4	[WebAssembly] Use the default private label prefixes. The MC assembler doesn't like using the empty string as a private label prefix because then it treats all labels as private. This commit reverts back to the default prefix, which is .L, which is common in ELF targets and consistent with the LLVM name mangler. llvm-svn: 257083	2016-01-07 18:49:53 +00:00
Nicolai Haehnle	82fc962c20	AMDGPU/SI: Fold operands with sub-registers Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074	2016-01-07 17:10:29 +00:00
Nicolai Haehnle	3c05d6d3b5	AMDGPU/SI: xnack_mask is always reserved on VI Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073	2016-01-07 17:10:20 +00:00
Michael Zuckerman	3aca221b31	[AVX512] add PSLLW and PSLLV Intrinsic Differential Revision: http://reviews.llvm.org/D15889 llvm-svn: 257070	2016-01-07 16:02:51 +00:00
Nico Weber	4324b9b236	Revert r257055, it caused PR26064. llvm-svn: 257066	2016-01-07 15:01:46 +00:00
Michael Zuckerman	354152d590	[AVX512] add PSRAV Intrinsic Differential Revision: http://reviews.llvm.org/D15856 llvm-svn: 257063	2016-01-07 14:42:20 +00:00
Amjad Aboud	d7cfb48485	Added support for macro emission in dwarf (supporting DWARF version 4). Differential Revision: http://reviews.llvm.org/D15495 llvm-svn: 257060	2016-01-07 14:28:20 +00:00
Michael Zuckerman	a6df006b50	[AVX512] add PSHUFHW and PSHUFLW Intrinsic Differential Revision: http://reviews.llvm.org/D15925 llvm-svn: 257056	2016-01-07 12:35:43 +00:00
Simon Pilgrim	bcc11a059e	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur. Follow up to D15310 llvm-svn: 257055	2016-01-07 11:34:27 +00:00
Dylan McKay	5c96de3ad7	Added AVRTargetObjectFile class and AVR.h llvm-svn: 257049	2016-01-07 10:53:15 +00:00
Simon Pilgrim	83e44c66ae	[X86][SSE] Add INSERTPS as a target shuffle Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS. llvm-svn: 257046	2016-01-07 10:24:19 +00:00
Michael Zuckerman	4a1566827d	[AVX512] add PSHUFD Intrinsic Differential Revision: http://reviews.llvm.org/D15934 llvm-svn: 257044	2016-01-07 09:24:12 +00:00
Tim Northover	bd41cf880c	ARM: support TLS accesses on Darwin platforms Darwin TLS accesses most closely resemble ELF's general-dynamic situation, since they have to be able to handle all possible situations. The descriptors and so on are obviously slightly different though. llvm-svn: 257039	2016-01-07 09:03:03 +00:00
Jonas Paulsson	3939b690f6	[SystemZ] Add hasSideEffects flag on Serialize instruction. Serialize will perform a hardware serialization operation, and is acting as a memory barrier. Therefore it must have the hasSideEffects flag set so it will be treated as a global memory object. Reviewed by Ulrich Weigand llvm-svn: 257036	2016-01-07 07:20:55 +00:00
Craig Topper	68cffb17a0	[X86] Remove superfluous mayLoad flag. The pattern already implies it. llvm-svn: 257035	2016-01-07 06:42:10 +00:00
Craig Topper	79e0ef82e8	[X86] Add hasSideEffects=0 to VBROADCASTI128. llvm-svn: 257034	2016-01-07 06:37:55 +00:00
Craig Topper	04cc5d25c7	[X86] Add OpSize32 to MOVSX32_NOREX instructions to match their other versions. llvm-svn: 257033	2016-01-07 06:37:52 +00:00
Craig Topper	0b165557b2	[X86] Add hasSideEffects=0 and mayLoad=1 to MOVZX64* instructions. While there remove a superfluous _Q from the instruction names. llvm-svn: 257032	2016-01-07 05:57:39 +00:00
Craig Topper	fc678ba944	[X86] STOSQ without a rep prefix doesn't read or write RCX. llvm-svn: 257030	2016-01-07 05:18:49 +00:00
Haicheng Wu	08b9462540	[AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path Allow fadd/fmul to be reassociated in aarch64. llvm-svn: 257024	2016-01-07 04:01:02 +00:00
Dan Gohman	0c6f5ac50a	[WebAssembly] Add -m:e to the target triple. This enables ELF-style name mangling, which primarily means using ".L" for private symbols. llvm-svn: 257020	2016-01-07 03:19:23 +00:00
Simon Pilgrim	bc82dedd26	[X86] Determine if target shuffle can contain zero elements getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB but VPERM2X128 as well with this patch). Although some calling functions can make use of this (mainly for shuffle combining), others can not and their inclusion makes shuffle mask comparisons more difficult. This patch adds a flag to getTargetShuffleMask to indicate if the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if it occurs to make handling much easier. I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it. Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261. Differential Revision: http://reviews.llvm.org/D15378 llvm-svn: 256992	2016-01-06 23:24:40 +00:00
Nicolai Haehnle	a61e5a8d4e	AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader Summary: This is admittedly something that you could only run into by manually playing around with shader assembly because the SITypeWriter pass is skipped for compute. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15902 llvm-svn: 256980	2016-01-06 22:01:04 +00:00
Quentin Colombet	eb61e8e6b0	[X86] Correctly model TLS calls w.r.t. frame requirements. TLS calls need the stack frame to be properly set up and this implies that such calls need ADJUSTSTACK_xxx markers. Fixes PR25820. llvm-svn: 256959	2016-01-06 19:09:26 +00:00
Sanjay Patel	ab69e9f497	refactor divrem8 lowering; NFCI The code duplication contributed to PR25754: https://llvm.org/bugs/show_bug.cgi?id=25754 llvm-svn: 256957	2016-01-06 18:47:09 +00:00
Dan Gohman	8f59cf756f	[WebAssembly] Don't use range-based loop for a list that's being modified The first instruction in a block is what the rend() iterator points to, so if it moves, we need to re-evaluate rend() so that we continue to iterate through the rest of the instructions. llvm-svn: 256953	2016-01-06 18:29:35 +00:00
JF Bastien	1dede3f95f	WebAssembly: add missing expected failures exposed by r256890 llvm-svn: 256948	2016-01-06 17:08:56 +00:00
JF Bastien	e6ec487cf7	WebAssembly: add new expected failures exposed by r256890 llvm-svn: 256945	2016-01-06 16:15:51 +00:00
Krzysztof Parzyszek	2d0418e842	[Hexagon] Add system instructions for cache manipulation llvm-svn: 256936	2016-01-06 14:22:22 +00:00
Artyom Skrobov	51f2d11be9	PR25754: avoid generating UDIVREM8_ZEXT_HREG nodes with i64 result Reviewers: spatel, srking Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15331 llvm-svn: 256924	2016-01-06 09:41:10 +00:00
Simon Pilgrim	267163e713	[X86][SSE] There is no zmm addsubpd/addsubps instruction. Replace the assert in combineShuffleToAddSub with an early out. llvm-svn: 256922	2016-01-06 09:08:49 +00:00
Simon Pilgrim	eaabd64a11	[X86][SSE] An empty target shuffle mask is always a failure. As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure. llvm-svn: 256921	2016-01-06 08:59:32 +00:00
Craig Topper	1b94d9a3cc	[X86] Use PS instead of TB for instructions that have PD/XS/XD variations. Use OpSize32 on an instruction that has an OpSize16 variant. llvm-svn: 256918	2016-01-06 06:18:41 +00:00
Craig Topper	275600390f	[X86] Fix an incorrect usage of In32BitMode that should have been Not64BitMode. llvm-svn: 256917	2016-01-06 06:18:37 +00:00
Philip Reames	c86ed0055d	Extract helper function to merge MemoryOperand lists [NFC] In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned out we actually had two copies and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstBuilder. I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface. Differential Revision: http://reviews.llvm.org/D15757 llvm-svn: 256909	2016-01-06 04:39:03 +00:00
Junmo Park	3a40237c03	Delete trailing whitespace; NFC llvm-svn: 256908	2016-01-06 03:53:36 +00:00
Junmo Park	3ec882feed	Delete trailing whitespace; NFC llvm-svn: 256906	2016-01-06 03:41:30 +00:00
Nicolai Haehnle	6035504ab3	AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland Due to the SGPR init bug, every program claims to use the same number of SGPRs anyway, so there's no point in trying to shift those registers down from their initial spot of reservation. Add a test that uses VGPR spilling and blocks most SGPRs from being used for the scratch resource register. Previously, this would run into an assertion. Differential Revision: http://reviews.llvm.org/D15724 llvm-svn: 256870	2016-01-05 20:42:49 +00:00
David Majnemer	861a0ae349	[X86] Determine if we have an OpaqueSPAdjustment earlier We queried hasFP before we hit ExpandISelPseudos. ExpandISelPseudos manipulated state that hasFP relied on, potentially changing the result after it has been queried elsewhere. While I am not aware of any particular bug due to this state of affairs, it seems best to avoid it entirely by changing the state during DAG construction. llvm-svn: 256849	2016-01-05 17:46:36 +00:00
Michael Zuckerman	5cbae95916	[AVX512] add PSLLD and PSLLQ Intrinsic Differential Revision: http://reviews.llvm.org/D15885 llvm-svn: 256840	2016-01-05 15:17:39 +00:00
MinSeong Kim	a7385ebf78	[AArch64] Add support for Samsung Exynos-M1 Adds core tuning support for new Samsung Exynos-M1 core (ARMv8-A). Differential Revision: http://reviews.llvm.org/D15663 llvm-svn: 256828	2016-01-05 12:51:59 +00:00
Junmo Park	3b8c715b2f	Remove extra whitespace. NFC. llvm-svn: 256820	2016-01-05 09:36:47 +00:00
Simon Pilgrim	d47ac60f00	[X86][SSE] Merge PerformBLENDICombine into PerformShuffleCombine PBLEND/BLENDPD/BLENDPS are no different to the other target shuffles and this will make future improvements to the target shuffle combines more straightforward. llvm-svn: 256819	2016-01-05 09:12:17 +00:00
Craig Topper	e00bffbc13	[X86] Make MOV32ri64 a post-RA pseudo instead of a CodeGenOnly instruction. It was only needed for rematerialization. llvm-svn: 256818	2016-01-05 07:44:14 +00:00
Craig Topper	9583f51348	[X86] Add OpSize32 to OR32mrLocked instruction to match the normal OR32mr instruction. llvm-svn: 256817	2016-01-05 07:44:11 +00:00
Craig Topper	ad2ce36be0	[AVX512] Add hasSideEffects=0 to kunpck instructions since they lack a pattern in their instructions. llvm-svn: 256816	2016-01-05 07:44:08 +00:00
Matt Arsenault	905042774d	AMDGPU: Remove redundant let mayLoad = 1 This is already set on the SMRD format class. llvm-svn: 256813	2016-01-05 04:50:28 +00:00
Tom Stellard	5cd09ade38	AMDGPU/SI: Select non-uniform constant addrspace loads to flat instructions for HSA Summary: This fixes a regression caused by r256282. Reviewers: arsenm, cfang Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15736 llvm-svn: 256810	2016-01-05 03:40:16 +00:00
David Majnemer	869be0a4a6	Revert "[X86] Use push-pop for materializing small constants under 'minsize'" The red zone consists of 128 bytes beyond the stack pointer so that the allocation of objects in leaf functions doesn't require decrementing rsp. In r255656, we introduced an optimization that would cheaply materialize certain constants via push/pop. Push decrements the stack pointer and stores its operand at what is now the top of the stack. However, this means that using push/pop would encroach on the red zone. PR26023 gives an example where this corrupts an object in the red zone. llvm-svn: 256808	2016-01-05 02:32:06 +00:00
Tom Stellard	2c82ee60c3	AMDGPU/SI: Consolidate FLAT patterns Summary: We had two sets of identical FLAT patterns: one inside the HasFlatAddressSpace predicate and one inside the useFlatForGlobal predicate. This patch merges these sets into a single pattern under the isCIVI predicate. The reason we can remove the predicates is that when MUBUF instructions are legal, the instruction selector will prefer selecting those over FLAT instructions because MUBUF patterns have a higher complexity score. So, in this case having patterns for FLAT instructions will have no effect. This change also simplifies the process for forcing global address space loads to use FLAT instructions, since we now only have to disable the MUBUF patterns instead of having to disable the MUBUF patterns and enable the FLAT patterns. Reviewers: arsenm, cfang Subscribers: llvm-commits llvm-svn: 256807	2016-01-05 02:26:37 +00:00
Matthias Braun	7e762e4f9c	MachineInstrBundle: Fix reversed isSuperRegisterEq() call Unfortunately this fix had the effect of exposing the -verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for which I disabled it for now. Two testcases also have additional pushq/popq where the corrected code cannot prove that %rax is dead any longer. Looking at the examples, this could potentially be fixed by improving computeRegisterLiveness() to check the live-in lists of the successor blocks when reaching the end of a block. This fixes http://llvm.org/PR25951. llvm-svn: 256799	2016-01-05 00:45:35 +00:00
Nicolai Haehnle	5b50497617	AMDGPU: add +xnack feature Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794	2016-01-04 23:35:53 +00:00
Simon Pilgrim	e6955f3211	[X86][SSE] Ensure BLENDPD/BLENDPS/PBLEND inputs are both of the correct input type llvm-svn: 256782	2016-01-04 21:41:11 +00:00
Tom Stellard	3da5672755	AMDGPU/SI: Move VI SMEM pattern back into VIInstructions.td Summary: This was accidentally moved to CIInstructions.td in r256282 Reviewers: cfang, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15763 llvm-svn: 256775	2016-01-04 20:23:10 +00:00
Geoff Berry	9e934b0cc2	[AArch64] Optimize some simple TBZ/TBNZ cases. Summary: Add some AArch64 dag combines to optimize some simple TBZ/TBNZ cases: (tbz (and x, m), b) -> (tbz x, b) (tbz (shl x, c), b) -> (tbz x, b-c) (tbz (shr x, c), b) -> (tbz x, b+c) (tbz (xor x, -1), b) -> (tbnz x, b) Reviewers: jmolloy, mcrosier, t.p.northover Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D15702 llvm-svn: 256765	2016-01-04 18:55:47 +00:00
Nicolai Haehnle	e705aadd67	AMDGPU: Avoid assertions after SGPR spilling failed Summary: The comment explains it: emitError does not necessarily exit the compilation process, and then using NoRegister leads to assertions later on. This generates incorrect code, of course, but the user should know to not use the result when an error has been emitted. It would be nice to have a test-case for this inside the LLVM repository, but llc exits on error. shader-db tests trigger the underlying issue at least on Tonga. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15826 llvm-svn: 256757	2016-01-04 15:50:01 +00:00
Michael Zuckerman	cf0b6db9ef	[AVX512] add PSRAD and PSRAQ Intrinsic Differential Revision: http://reviews.llvm.org/D15851 llvm-svn: 256754	2016-01-04 13:45:45 +00:00
Michael Zuckerman	000fca44a8	[AVX512] add PSRAW Intrinsic Differential Revision: http://reviews.llvm.org/D15850 llvm-svn: 256751	2016-01-04 12:50:36 +00:00
Michael Zuckerman	068bc2f219	[AVX512] add PSRLV Intrinsic Differential Revision: http://reviews.llvm.org/D15838 llvm-svn: 256747	2016-01-04 11:39:06 +00:00
David Majnemer	ca1c9f074f	[X86] Make hasFP constant time We need a frame pointer if there is a push/pop sequence after the prologue in order to unwind the stack. Scanning the instructions to figure out if this happened made hasFP not constant-time which is a violation of expectations. Let's compute this up-front and reuse that computation when we need it. llvm-svn: 256730	2016-01-04 04:49:41 +00:00
Craig Topper	e30b8ca149	Use std::is_sorted and std::none_of instead of manual loops. NFC llvm-svn: 256719	2016-01-03 19:43:40 +00:00
Dimitry Andric	227b928abc	Fix several accidental DOS line endings in source files Summary: There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings. There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those. Reviewers: joerg, aaron.ballman Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D15848 llvm-svn: 256707	2016-01-03 17:22:03 +00:00
David Majnemer	011980cd50	[X86] Add intrinsics for reading and writing to the flags register LLVM's targets need to know if stack pointer adjustments occur after the prologue. This is needed to correctly determine if the red-zone is appropriate to use or if a frame pointer is required. Normally, LLVM can figure this out very precisely by reasoning about the contents of the MachineFunction. There is an interesting corner case: inline assembly. The vast majority of inline assembly which will perform a push or pop is done so to pair up with pushf or popf as appropriate. Unfortunately, this inline assembly doesn't mark the stack pointer as clobbered because, well, it isn't. The stack pointer is decremented and then immediately incremented. Because of this, LLVM was changed in r256456 to conservatively assume that inline assembly contains a sequence of stack operations. This is unfortunate because the vast majority of inline assembly will not end up manipulating the stack pointer in any way at all. Instead, let's provide a more principled solution: an intrinsic. FWIW, other compilers (MSVC and GCC among them) also provide this functionality as an intrinsic. llvm-svn: 256685	2016-01-01 06:50:01 +00:00
Craig Topper	74658dfaad	[X86] Remove a return after llvm_unreachable. llvm-svn: 256681	2015-12-31 22:40:48 +00:00
Craig Topper	69653af748	[X86] Move shuffle decoding for constant pool into the X86CodeGen library to remove a layering violation in the Util library. llvm-svn: 256680	2015-12-31 22:40:45 +00:00
Michael Zuckerman	0dc468880d	[AVX512] add PSRLQ and PSRLD Intrinsic Differential Revision: http://reviews.llvm.org/D15770 llvm-svn: 256673	2015-12-31 15:22:04 +00:00
Michael Kuperstein	d36e24a166	[X86] Avoid folding scalar loads into unary sse intrinsics Not folding these cases tends to avoid partial register updates: sqrtss (%eax), %xmm0 Has a partial update of %xmm0, while movss (%eax), %xmm0 sqrtss %xmm0, %xmm0 Has a clobber of the high lanes immediately before the partial update, avoiding a potential stall. Given this, we only want to fold when optimizing for size. This is consistent with the patterns we already have for some of the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl() Differential Revision: http://reviews.llvm.org/D15741 llvm-svn: 256671	2015-12-31 09:45:16 +00:00
Asaf Badouh	af6569afd2	[X86][PKU] Add {RD,WR}PKRU intrinsics Differential Revision: http://reviews.llvm.org/D15808 llvm-svn: 256670	2015-12-31 08:31:13 +00:00
Craig Topper	fd2c6a3be0	[TableGen] Modify the AsmMatcherEmitter to only apply the table growth from r252440 to the Hexagon target. This restores the previous behavior of not including the mnemonic in the classes table for every target that starts instruction lines with the mnemonic. Not only did the table size increase by 1 entry, but the class enum increased in size which caused every class in the array to increase in size. It also grew the size of the function that parses tokens into classes by a substantial amount. This adds a new HasMnemonicFirst flag to all AsmParsers. It's set to 1 by default and Hexagon target overrides it to 0. For the X86 target alone this recovers 324KB of size on the llvm-mc executable. I believe the current state is still a bad design choice for the Hexagon target as it causes most of the parsing to do a linear search through the entire match table comparing operands against every instruction until it finds one that works. At least for the other targets we do a binary search based on mnemonic over which to do the linear scan. llvm-svn: 256669	2015-12-31 08:18:23 +00:00
Sanjay Patel	4104f78640	use range-based for-loops; NFCI llvm-svn: 256573	2015-12-29 19:14:23 +00:00
Michael Zuckerman	80821ee77c	[AVX512] add PSRLW Intrinsic Differential Revision: http://reviews.llvm.org/D15751 llvm-svn: 256558	2015-12-29 13:04:35 +00:00
Craig Topper	d270501a6e	[TableGen] Remove MnemonicContainsDot from AsmParser. It isn't used. NFC llvm-svn: 256542	2015-12-29 07:03:30 +00:00
Craig Topper	3294966ed7	[X86] Remove declaration of ATTAsmParser. It's equivalent to the DefaultAsmParser. NFC llvm-svn: 256541	2015-12-29 07:03:27 +00:00
Artyom Skrobov	2aca0c622a	[Thumb] Fix assembler error 'cannot honor width suffix pop {lr}' Summary: * avoid generating POP {LR} in Thumb1 epilogues * combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass * combine POP {LR} + B + BX LR -> POP {PC} on v5T+ Test cases by Ana Pazos Differential Revision: http://reviews.llvm.org/D15707 llvm-svn: 256523	2015-12-28 21:40:45 +00:00
Sanjay Patel	b3c53e512f	[x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475) This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 http://reviews.llvm.org/rL256510 llvm-svn: 256522	2015-12-28 21:16:55 +00:00
Elena Demikhovsky	5494698828	Implemented cost model for masked gather and scatter operations The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 llvm-svn: 256519	2015-12-28 20:10:59 +00:00
Sanjay Patel	9da2b647c7	[x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475) This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 llvm-svn: 256510	2015-12-28 19:20:19 +00:00
Sanjay Patel	cc4c71b4fb	tidy up; NFC llvm-svn: 256506	2015-12-28 18:18:22 +00:00
Roman Divacky	73fc84761f	Support clrex instruction on ARMv6k. Patch by Andrew Turner. llvm-svn: 256505	2015-12-28 17:47:23 +00:00
Michael Kuperstein	2ea81baf3a	[X86] Better support for the MCU psABI (LLVM part) This adds support for the MCU psABI in a way different from r251223 and r251224, basically reverting most of these two patches. The problem with the approach taken in r251223/4 is that it only handled libcalls that originated from the backend. However, the mid-end also inserts quite a few libcalls and assumes these use the platform's default calling convention. The previous patch tried to insert inregs when necessary both in the FE and, somewhat hackily, in the CG. Instead, we now define a new default calling convention for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does. Differential Revision: http://reviews.llvm.org/D15054 llvm-svn: 256494	2015-12-28 14:39:21 +00:00
Alexander Kornienko	175a7cbf3f	Refactor: Simplify boolean conditional return statements in lib/Target/PowerPC Summary: Use clang-tidy to simplify boolean conditional return statements Reviewers: uweigand, rafael, wschmidt Subscribers: craig.topper, llvm-commits Patch by Richard Thomson! Differential Revision: http://reviews.llvm.org/D9984 llvm-svn: 256493	2015-12-28 13:38:42 +00:00
Asaf Badouh	fba562004b	[X86][AVX512] Lower broadcast sub vector to vector intrinsics Lower broadcast<type>x<vector> to shuffles. There are two cases: 1. src is 128 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 0. 2. src is 256 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 01000100b (0x44); that way we will broadcast the 256-bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] then it will mask it with the passthru value (in case it's a mask op). Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256490	2015-12-28 08:26:26 +00:00
Asaf Badouh	5546f51011	[X86][AVX512] add fp scalar broadcast intrinsics Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256489	2015-12-28 08:09:25 +00:00
Craig Topper	401675ce5b	[AVX512] Remove VEX_LIG from vmovd/vmovq instructions. From what I can tell from the Intel docs these instructions require the L-bit to be 0. llvm-svn: 256486	2015-12-28 06:32:47 +00:00
Craig Topper	af88afb214	[AVX512] Fix some places that used FR64 instead of FR64X. llvm-svn: 256484	2015-12-28 06:11:45 +00:00
Craig Topper	c648c9b92d	[AVX512] Bring vmovq instructions names into alignment with the AVX and SSE names. Add a missing encoding to disassembler and assembler. I believe this also fixes a case where a 64-bit memory form that is documented as being unsupported in 32-bit mode was able to be selected there. llvm-svn: 256483	2015-12-28 06:11:42 +00:00
Craig Topper	b4c56624eb	[X86] Move address for store target from outs to ins on a couple instructions. llvm-svn: 256482	2015-12-28 06:11:39 +00:00
Craig Topper	cd4621a8ab	[X86] Add proper Uses/Defs/mayLoad flags for AAA/AAD/AAM/AAS/DAA/DAS/XLAT instructions. llvm-svn: 256481	2015-12-28 06:11:37 +00:00
Craig Topper	f3ed5c115c	[AVX512] Remove separate instruction and patterns for lowering ctlz_zero_undef. Change the operation for CTLZ_ZERO_UNDEF to Expand so SelectionDAG will convert them to CTLZ before lowering. llvm-svn: 256477	2015-12-27 21:33:50 +00:00
Craig Topper	c48fa89e44	[AVX512] Remove alternate data type versions of VALIGND, VALIGNQ, VMOVSHDUP and VMOVSLDUP. They don't have any tests and I don't think they can be selected. If they are truly needed they should be implemented with patterns against the normal instructions and not separate instructions. llvm-svn: 256475	2015-12-27 19:45:21 +00:00
Igor Breger	756c289dd8	AVX512: Change VPMOVB2M DAG lowering , use CVT2MASK node instead TRUNCATE. Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB. Implement VPMOVB/W/D/Q2M intrinsic. Differential Revision: http://reviews.llvm.org/D15675 llvm-svn: 256470	2015-12-27 13:56:16 +00:00
Asaf Badouh	b0d91fa42a	[X86][AVX512] change broadcast to use maskable pattern Differential Revision: http://reviews.llvm.org/D15786 llvm-svn: 256469	2015-12-27 12:14:34 +00:00
Craig Topper	f8423c05ee	[AVX-512] Remove alternate integer forms for VPERMILPS and VPERMILPD. There are no tests for them and I don't see any way to select them anyway. If they are really needed they should be implemented as patterns and not full-fledged instructions. llvm-svn: 256462	2015-12-27 06:55:08 +00:00
David Majnemer	334676355a	[X86, Win64] Use a frame pointer if pushf is emitted A frame pointer must be used if stack pointer is modified after the prologue. LLVM will emit pushf/popf if we need to save/restore the FLAGS register, requiring us to have a frame pointer for the function. There is a small twist: this sequence might exist in user code via inline-assembly. For now, conservatively assume that such functions require a frame pointer. For real world justification, please see clang's implementation of __readeflags. This fixes PR25945. llvm-svn: 256456	2015-12-27 06:07:26 +00:00
Sanjay Patel	bcff3f7d92	[x86] lower calls to llvm.maxnum.v4f32 using maxps This is a follow-on to: http://reviews.llvm.org/rL255700 llvm-svn: 256454	2015-12-26 21:44:55 +00:00
Craig Topper	5ce29aa307	[X86] Fix an unused variable warning in release builds. llvm-svn: 256453	2015-12-26 20:13:33 +00:00
Craig Topper	7e3ba15529	[X86] Add support for printing shuffle comments for AVX512 PSHUFB instructions. llvm-svn: 256452	2015-12-26 19:48:43 +00:00
Craig Topper	fa5f35e6ad	[X86] Fold some variable declarations and initializations into if statements. NFC llvm-svn: 256451	2015-12-26 19:48:37 +00:00
Craig Topper	d400019447	[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct. llvm-svn: 256433	2015-12-26 04:50:07 +00:00
Craig Topper	53bd5cac86	[X86] Fix copy and paste typo from pasting from another Makefile to restore code. llvm-svn: 256431	2015-12-25 23:27:57 +00:00
Craig Topper	96c985169b	[X86] Put back the include path to the main X86 sources in the AsmParser library to fix the bots. llvm-svn: 256430	2015-12-25 22:22:16 +00:00
Craig Topper	95e5596228	[X86] Remove X86CodeGen dependency from the AsmParser library. llvm-svn: 256429	2015-12-25 22:10:11 +00:00
Craig Topper	c0453e87dc	[X86] Move getX86SubSuperRegisterOrZero to X86MCTargetDesc.cpp so it can be used by AsmParser library without depending on X86CodeGen library. llvm-svn: 256428	2015-12-25 22:10:08 +00:00
Craig Topper	daf2e3ff7a	Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC llvm-svn: 256427	2015-12-25 22:10:01 +00:00
Craig Topper	c7277d9485	[X86] Move AVX512 STATIC_ROUNDING enum to X86BaseInfo.h to fix a layering violation in AsmParser. llvm-svn: 256426	2015-12-25 22:09:49 +00:00
Craig Topper	91dab7baee	[X86] Replace MVT::SimpleValueType in the AsmParser library and getX86SubSuperRegister with just an unsigned representing size. This is a step towards fixing a layering violation so the X86 AsmParser won't depend on CodeGen types. llvm-svn: 256425	2015-12-25 22:09:45 +00:00
Craig Topper	2c7d7c2584	[X86] Don't pass the default value to the High argument of getX86SubSuperRegister. Most places don't care about this argument. NFC llvm-svn: 256424	2015-12-25 19:44:16 +00:00
Craig Topper	d59bc5188d	[X86] getX86SubSuperRegisterOrZero shouldn't call getX86SubSuperRegister recursively. It should call itself instead. Otherwise it might fire an assertion when it was designed not to. llvm-svn: 256422	2015-12-25 17:07:32 +00:00
Craig Topper	3453a43da9	[X86] Add missing X86II::MRM_C4, MRM_C5, etc. encodings to getMemoryOperandNo. These aren't used by any instructions, but could be someday. NFC llvm-svn: 256421	2015-12-25 17:07:30 +00:00
Craig Topper	f804af209d	[X86] Use assert instead of if and llvm_unreachable. NFC llvm-svn: 256420	2015-12-25 17:07:27 +00:00
Craig Topper	3fb423ef8b	[X86] Minor indentation fixes. NFC llvm-svn: 256419	2015-12-25 17:07:24 +00:00
Dan Gohman	8887d1faed	[WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify. Move RegStackify after coalescing and teach it to use LiveIntervals instead of depending on SSA form. This avoids a problem where a register in a COPY instruction is stackified and then subsequently coalesced with a register that is not stackified. This also puts it after the scheduler, which allows us to simplify the EXPR_STACK constraint, as we no longer have instructions being reordered after stackification and before coloring. llvm-svn: 256402	2015-12-25 00:31:02 +00:00
Marina Yatsina	8dfd5cbb73	[X86][ms-inline asm] Add support for memory operands that include structs Add ability to reference struct symbols in memory operands. Test case will be added on the clang side (review http://reviews.llvm.org/D15749) Differential Revision: http://reviews.llvm.org/D15748 llvm-svn: 256381	2015-12-24 12:09:51 +00:00
Asaf Badouh	9a5a83a518	[X86][PKU] Add {RD,WR}PKRU encoding Differential Revision: http://reviews.llvm.org/D15711 llvm-svn: 256366	2015-12-24 08:25:00 +00:00
Elena Demikhovsky	9e225a2f52	AVX-512: Kreg set 0/1 optimization The patterns that set a mask register to 0/1 (KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn) are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn. This is an optimization for AVX-512 targets. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365	2015-12-24 08:12:22 +00:00
Igor Breger	268f6f53c5	AVX512: VPMOVM2B/W/D/Q intrinsic implementation. Differential Revision: http://reviews.llvm.org//D15747 llvm-svn: 256364	2015-12-24 07:11:53 +00:00
Matt Arsenault	4339b3ff35	AMDGPU: Fix getRegisterBitWidth for vectors llvm-svn: 256362	2015-12-24 05:14:55 +00:00
Tom Stellard	5ebdfbe562	AMDGPU/SI: Fix encoding of flat instructions on VI Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15735 llvm-svn: 256360	2015-12-24 03:18:18 +00:00
Tom Stellard	668f793049	AMDGPU/SI: Remove non-existent flat instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15734 llvm-svn: 256357	2015-12-24 02:41:55 +00:00
Simon Pilgrim	17377bdd45	[X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151. As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops. Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well. Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions). Differential Revision: http://reviews.llvm.org/D15477 llvm-svn: 256332	2015-12-23 13:10:07 +00:00
Igor Breger	7b46b4e798	AVX512BW: Enable packed word shift for 512bit vector. Enable lowering of scalar immediate shifts for v64i8. Fix predicate for AVX1/2 shifts. Differential Revision: http://reviews.llvm.org/D15713 llvm-svn: 256324	2015-12-23 08:06:50 +00:00
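x86 has no byte-granular vector shift instruction, so a v64i8 shift by an immediate has to be built from wider lanes. One common approach, sketched here in Python as an illustration rather than as LLVM's exact lowering, is to shift 16-bit lanes and then mask away the bits that crossed over from the neighbouring byte. The function names are hypothetical.

```python
def shl_bytes_reference(v, n):
    """Per-byte logical left shift: the semantics of a vNi8 shl by n."""
    return bytes((b << n) & 0xFF for b in v)


def shl_bytes_via_words(v, n):
    """Emulate the byte shift with 16-bit lane shifts plus a byte mask.

    Assumes len(v) is even and 0 <= n < 8.
    """
    out = bytearray()
    for i in range(0, len(v), 2):
        w = int.from_bytes(v[i:i + 2], "little")
        w = (w << n) & 0xFFFF  # shift the whole 16-bit lane
        out += w.to_bytes(2, "little")
    # Bits shifted out of each low byte land in the low n bits of the
    # high byte; clearing the low n bits of every byte removes them
    # (the low byte's low n bits are already zero after the shift).
    mask = (0xFF << n) & 0xFF
    return bytes(b & mask for b in out)


data = bytes((i * 37 + 11) & 0xFF for i in range(64))  # one 512-bit vector
assert all(shl_bytes_via_words(data, n) == shl_bytes_reference(data, n)
           for n in range(8))
```

A logical right shift works the same way with the mask `0xFF >> n`.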
Dan Gohman	08d58bcf6a	[WebAssembly] Add a TODO comment for a possible future optimization. llvm-svn: 256306	2015-12-23 00:22:04 +00:00
Dan Gohman	a2b2cdc813	[WebAssembly] Trim unneeded #includes. NFC. llvm-svn: 256301	2015-12-22 23:45:21 +00:00
Dan Gohman	cc38ba1954	[WebAssembly] Minor code simplification. NFC. llvm-svn: 256300	2015-12-22 23:39:16 +00:00
Changpeng Fang	b41574a961	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason, executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-committed after fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282	2015-12-22 20:55:23 +00:00
Rafael Espindola	4b0d24c00a	Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA" This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275	2015-12-22 19:46:44 +00:00
Changpeng Fang	9b8a9be058	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason, executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273	2015-12-22 19:32:28 +00:00
Cong Hou	e93b8e1539	[BPI] Replace weights by probabilities in BPI. This patch removes all weight-related interfaces from BPI and replaces them with probability versions. With this patch, we won't use edge weights anymore in either IR or MC passes. Edge probability is a better representation in terms of CFG update and validation. Differential revision: http://reviews.llvm.org/D15519 llvm-svn: 256263	2015-12-22 18:56:14 +00:00
Jun Bum Lim	6755c3bc5f	[AArch64] Promote loads from stores This is a recommit of r256004 which was reverted in r256160. The issue was the incorrect promotion of half and byte loads into mov instructions. This fix will replace half and byte type loads only with bit field extracts. Original commit message: This change promotes load instructions which directly read from stores by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example: STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256249	2015-12-22 16:36:16 +00:00
Asaf Badouh	13ffa4bf7c	[X86][AVX512] Add rcp14 and rsqrt14 intrinsics Differential Revision: http://reviews.llvm.org/D15414 llvm-svn: 256237	2015-12-22 11:40:04 +00:00
Dylan McKay	751a449e2f	[AVR] Added configuration file and machine function information class This commit adds the 'AVRMachineFunctionInfo' class, which simply stores basic properties about generated machine functions. llvm-svn: 256213	2015-12-21 23:13:15 +00:00
Eric Christopher	213a5daab7	Fix line endings after r256155. NFC. llvm-svn: 256211	2015-12-21 23:04:27 +00:00
David Majnemer	03e2cc3007	[MC, COFF] Support link /incremental conditionally Today, we always take into account the possibility that object files produced by MC may be consumed by an incremental linker. This results in us initializing fields that vary with time (TimeDateStamp), which harms hermetic builds (e.g. verifying a self-host went well), and produces sub-optimal code because we cannot assume anything about the relative position of functions within a section (call sites can get redirected through incremental linker thunks). Let's provide an MCTargetOption which controls this behavior so that we can disable this functionality if we know a priori that the build will not rely on /incremental. llvm-svn: 256203	2015-12-21 22:09:27 +00:00
Cong Hou	8df93ce455	[X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in the lowering phase because after type legalization, the original truncation is turned into a BUILD_VECTOR whose elements are each extracted from a vector and then truncated, and at that point it is difficult to perform this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 llvm-svn: 256194	2015-12-21 20:42:43 +00:00
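PACKUSWB saturates signed 16-bit lanes to unsigned 8-bit lanes; it does not truncate. To use it for a truncation, the inputs first have to be brought into the unsigned 8-bit range. A minimal Python model, assuming PACKUSWB semantics on lists of signed 16-bit values; the masking step is one way to make the saturation a no-op and is shown as an illustration, not necessarily the exact node sequence the patch emits.

```python
def packuswb(a, b):
    """Model of PACKUSWB: signed 16-bit lanes -> unsigned 8-bit with saturation."""
    return [min(max(x, 0), 0xFF) for x in a + b]


def trunc_i16_to_i8(a, b):
    """Plain truncation: keep the low 8 bits of every lane."""
    return [x & 0xFF for x in a + b]


def trunc_via_packus(a, b):
    # Clearing bits 8-15 first puts every lane in [0, 255], so the
    # saturation in PACKUSWB never triggers and the result equals truncation.
    return packuswb([x & 0xFF for x in a], [x & 0xFF for x in b])
```

Without the mask, `packuswb([-1, 300], [])` gives `[0, 255]`, while truncation gives `[255, 44]`.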
Adrian Prantl	5d9acc2443	Teach ARMLoadStoreOptimizer to ignore DBG_VALUE instructions when merging instructions. As noted in PR24563. rdar://problem/23963293 llvm-svn: 256183	2015-12-21 19:25:03 +00:00
Tom Stellard	2b65ed306d	AMDGPU/SI: Fix encoding for FLAT_SCRATCH registers on VI Summary: These registers have different encodings on CI and VI, so we add pseudo FLAT_SCRATCH registers to be used before MC, and subtarget-specific registers to be used by the MC layer. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15661 llvm-svn: 256178	2015-12-21 18:44:27 +00:00
Tom Stellard	9da8620cdb	AMDGPU/SI: Change assembly name for flat scratch registers to flat_scratch This matches what the assembler accepts. llvm-svn: 256177	2015-12-21 18:44:21 +00:00
Matthew Simpson	11c4de6054	[AArch64] Add additional extract-extend patterns for smov This patch adds to the target description two additional patterns for matching extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and v8i16-to-i64 cases. The existing patterns miss these cases because the extracted elements must first be legalized to i32, resulting in any_extend nodes. This was originally implemented as a DAG combine (r255895), but was reverted due to failing out-of-tree tests. llvm-svn: 256176	2015-12-21 18:31:25 +00:00
Chad Rosier	353d71914a	Remove extra whitespace. NFC. llvm-svn: 256173	2015-12-21 18:08:05 +00:00
Dan Gohman	d544e0c100	[WebAssembly] Convert a regular for loop to a range-based for loop. llvm-svn: 256169	2015-12-21 17:22:02 +00:00
Dan Gohman	d9b4cdb68d	[WebAssembly] Clean up comments and fix a missing #include dependency. llvm-svn: 256168	2015-12-21 17:19:31 +00:00
Dan Gohman	979b766fef	[WebAssembly] Remove an unneeded empty destructor. llvm-svn: 256167	2015-12-21 17:12:40 +00:00
Dan Gohman	d587aa5917	[WebAssembly] Enclose the operand variables for load and store instructions in braces. This allows the AsmMatcherEmitter to properly tokenize the AsmStrings for load and store instructions. This is a step towards asm parsing. llvm-svn: 256166	2015-12-21 16:58:49 +00:00
Dan Gohman	a783f10c16	[WebAssembly] Mark the ARGUMENT pseudo-instructions as CodeGenOnly. llvm-svn: 256165	2015-12-21 16:53:29 +00:00
Dan Gohman	dd20c70b61	[WebAssembly] Add some comments and make some minor source cleanups. llvm-svn: 256164	2015-12-21 16:50:41 +00:00
Jun Bum Lim	4bb171c8da	Revert "[AArch64] Promote loads from stores" This reverts commit r256004 due to a failure in cortex-a53. llvm-svn: 256160	2015-12-21 15:36:49 +00:00
Chad Rosier	d016574df8	[AArch64] Enable PostRAScheduler for AArch64 generic build. Disable post-ra scheduler for perturbed tests to appease the bots and to preserve the history of the tests. http://reviews.llvm.org/D15652 llvm-svn: 256158	2015-12-21 14:43:45 +00:00
Igor Breger	44b60a3687	AVX512BW: Enable AND/OR/XOR vector byte/word packed operations by promoting to qword, which is natively supported. llvm-svn: 256157	2015-12-21 14:40:36 +00:00
Amjad Aboud	60b5e1b6c0	Implement support for IA interrupt and exception handlers: http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html Differential Revision: http://reviews.llvm.org/D15567 llvm-svn: 256155	2015-12-21 14:07:14 +00:00
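Promoting byte/word AND/OR/XOR to qword lanes (the AVX512BW change two rows up) is safe because bitwise operations have no carries between bit positions, so the lane width cannot change the result. A small Python check of that property; the helper names are hypothetical and the 64-byte inputs stand in for one 512-bit vector.

```python
import operator


def bitwise_per_byte(op, a, b):
    """Reference: apply a bitwise op byte by byte."""
    return bytes(op(x, y) for x, y in zip(a, b))


def bitwise_per_qword(op, a, b):
    """Apply the same op to 64-bit chunks (len(a) == len(b), a multiple of 8)."""
    out = bytearray()
    for i in range(0, len(a), 8):
        qa = int.from_bytes(a[i:i + 8], "little")
        qb = int.from_bytes(b[i:i + 8], "little")
        out += op(qa, qb).to_bytes(8, "little")
    return bytes(out)
```

The same argument does not hold for add or shift, where bits move between positions.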
Zlatko Buljan	5da2f6cd03	[mips][microMIPS] Implement DERET and DI instructions and check size operand for EXT and DEXT* instructions Differential Revision: http://reviews.llvm.org/D15570 llvm-svn: 256152	2015-12-21 13:08:58 +00:00
NAKAMURA Takumi	9ec6a826dd	[Cygwin] Enable TLS as emutls. It resolves clang selfhosting with std::once() for Cygwin. FIXME: It may be EmulatedTLS-generic also for X86-Android. FIXME: Pass EmulatedTLS to LLVM CodeGen from Clang with -femulated-tls. llvm-svn: 256134	2015-12-21 02:37:23 +00:00
Dylan McKay	f061e9b7b2	[AVR] Added AVRCallingConv.td llvm-svn: 256130	2015-12-20 23:17:44 +00:00
Craig Topper	ca66fc5473	[X86] Use range-based for loop. NFC llvm-svn: 256127	2015-12-20 18:41:57 +00:00
Craig Topper	074e845260	[X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift. This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code. Unfortunately, since getImmCost can't see the icmp predicate, we can't tell if we're only catching these specific cases. llvm-svn: 256126	2015-12-20 18:41:54 +00:00
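The equivalence behind this change can be checked with a short Python model that treats %a as a 64-bit unsigned integer: both compares are true exactly when some bit at position 32 or above is set. The function names are illustrative only.

```python
UINT32_MAX = 0xFFFFFFFF


def ugt_u32max(a):
    """icmp ugt a, 4294967295 on a 64-bit unsigned a."""
    return a > UINT32_MAX


def uge_2pow32(a):
    """icmp uge a, 4294967296 on a 64-bit unsigned a."""
    return a >= UINT32_MAX + 1


def via_shift(a):
    # A right shift by 32 exposes the high half, so no 64-bit immediate
    # has to be materialized for the comparison.
    return (a >> 32) != 0
```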
Dylan McKay	029346f438	Add AVR.td and AVRRegisterInfo.td Summary: This adds the core AVR TableGen file, along with the register descriptions. Lines in AVR.td which require other TableGen files which haven't been committed yet are commented out. This is a fairly trivial patch, and should only require a quick review. I kept the line width smaller than 80 columns, but there are a few exceptions because I'm not sure how to split a string over several lines. Reviewers: stoklund Subscribers: dylanmckay, agnat Differential Revision: http://reviews.llvm.org/D14684 llvm-svn: 256120	2015-12-20 12:16:20 +00:00
Weiming Zhao	613c6862fa	Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions for Thumb2 Summary: r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb. r250697: http://reviews.llvm.org/rL250697 Reviewers: rmaprath Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D15653 llvm-svn: 256115	2015-12-20 06:41:44 +00:00
Tom Stellard	ffc1a5aef7	AMDGPU/SI: Fix implementation of isSourceOfDivergence() for graphics shaders Summary: The analysis of shader inputs was completely wrong. We were passing the wrong index to AttributeSet::hasAttribute() and the logic for which inputs were in SGPRs was wrong too. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15608 llvm-svn: 256082	2015-12-19 02:54:15 +00:00
Matt Arsenault	2aed6ca1d3	AMDGPU: Switch barrier intrinsics to using convergent. The noduplicate attribute prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075	2015-12-19 01:46:41 +00:00
Nicolai Haehnle	6bcf8b2890	AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysReg Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15629 llvm-svn: 256073	2015-12-19 01:36:26 +00:00
Nicolai Haehnle	dd58705af6	AMDGPU: fix overlapping copies in copyPhysReg Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072	2015-12-19 01:16:06 +00:00
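The ordering rule for overlapping copies is the same one memmove uses: copy lowest-first when the destination starts at or below the source, otherwise highest-first. A Python sketch, modelling the register file as a list of 32-bit registers; the helper name and indices are hypothetical.

```python
def copy_reg_tuple(regs, dst, src, n):
    """Copy n consecutive registers regs[src:src+n] to regs[dst:dst+n].

    Copying lowest-first is safe when dst <= src. When dst > src, a
    destination register could overwrite a source register that has not
    been read yet, so copy highest-first instead.
    """
    order = range(n) if dst <= src else range(n - 1, -1, -1)
    for i in order:
        regs[dst + i] = regs[src + i]
```

For example, copying a 4-register tuple from index 0 to index 2 lowest-first would duplicate the first two source values; copying highest-first preserves all four.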
Krzysztof Parzyszek	21dc8bdd9e	[Hexagon] Add PIC support llvm-svn: 256025	2015-12-18 20:19:30 +00:00
Changpeng Fang	c9963936e7	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256022	2015-12-18 20:04:28 +00:00
Changpeng Fang	ef735b74c1	Revert "AMDGPU/SI: Test commit" This reverts commit a493cb636e0152ad28210934a47c6c44b1437193. llvm-svn: 256021	2015-12-18 20:04:26 +00:00
Changpeng Fang	7fdf674c2e	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256020	2015-12-18 19:57:41 +00:00
Jun Bum Lim	3509d64c24	[AArch64] Promote loads from stores This change promotes load instructions which directly read from stores by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example : STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256004	2015-12-18 18:08:30 +00:00
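In the example, the unsigned-offset immediates are scaled by the access size: STRWui %X0, 1 writes bytes 4..7 from X0 and LDRHHui %X0, 3 reads bytes 6..7. On a little-endian target those are bits 16..31 of W1, hence UBFMWri %W1, 16, 31. A Python sketch of that offset arithmetic, assuming little-endian layout; the helper name is hypothetical.

```python
def forward_from_store(stored, store_off, store_size, load_off, load_size):
    """Return the value a narrower load would read from a wider store.

    Offsets and sizes are in bytes; a little-endian layout is assumed.
    The result is a bitfield extract of the stored register, i.e. what
    a UBFM with immr = lsb and imms = lsb + width - 1 computes.
    """
    if not (store_off <= load_off and load_off + load_size <= store_off + store_size):
        raise ValueError("load is not fully covered by the store")
    lsb = (load_off - store_off) * 8
    width = load_size * 8
    return (stored >> lsb) & ((1 << width) - 1)


# STRWui %W1, %X0, 1  -> 4-byte store at byte offset 1 * 4 = 4
# LDRHHui %X0, 3      -> 2-byte load at byte offset 3 * 2 = 6
w1 = 0xCAFEBABE
assert forward_from_store(w1, 4, 4, 6, 2) == 0xCAFE  # bits 16..31 of W1
```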
Zlatko Buljan	252cca555f	[mips][microMIPS][DSP] Implement PACKRL.PH, PICK.PH, PICK.QB, SHILO, SHILOV and WRDSP instructions Differential Revision: http://reviews.llvm.org/D14429 llvm-svn: 255991	2015-12-18 08:59:37 +00:00
Eric Christopher	8c2adf6b49	Remove unused class variables. llvm-svn: 255939	2015-12-17 23:43:40 +00:00
Hans Wennborg	a6a2e512cf	[X86] Use push-pop for materializing small constants under 'minsize' Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936	2015-12-17 23:18:39 +00:00
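The byte counts above can be reproduced by hand-encoding both sequences. This Python sketch assumes the x86-64 encodings push imm8 (6A ib), pop r64 (58+r, with a 41 REX.B prefix for r8-r15), mov r32, imm32 (B8+r id) and the sign-extending mov r64, imm32 (REX.W C7 /0 id). It only handles immediates that fit in a signed byte, since that is all push imm8 can encode; the function names are hypothetical.

```python
def push_pop(imm, reg):
    """push imm8; pop reg  (reg is 0..15, imm must fit in a signed byte)."""
    if not -128 <= imm <= 127:
        raise ValueError("push imm8 needs a signed 8-bit immediate")
    seq = bytes([0x6A, imm & 0xFF])  # push imm8, sign-extended to 64 bits
    if reg >= 8:
        seq += b"\x41"  # REX.B selects r8..r15
    return seq + bytes([0x58 + (reg & 7)])  # pop r64


def mov_r32_imm32(imm, reg):
    """mov r32, imm32 (zero-extends into the 64-bit register)."""
    prefix = b"\x41" if reg >= 8 else b""
    return prefix + bytes([0xB8 + (reg & 7)]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


def mov_r64_simm32(imm, reg):
    """mov r64, imm32 sign-extended: REX.W C7 /0 id."""
    rex = 0x48 | (1 if reg >= 8 else 0)
    return bytes([rex, 0xC7, 0xC0 | (reg & 7)]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")
```

This gives 3 or 4 bytes for push/pop versus 5, 6 or 7 bytes for the mov forms.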
Matthew Simpson	13dddb0799	Revert "[AArch64] Add DAG combine for extract extend pattern" This reverts commit r255895. The patch breaks internal tests. Reverting until a fix is ready. llvm-svn: 255928	2015-12-17 21:29:47 +00:00
Dan Gohman	670a60ed52	[WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF. llvm-svn: 255925	2015-12-17 20:50:45 +00:00
Tom Stellard	caaa3aa07c	AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908	2015-12-17 17:05:09 +00:00
Nicolai Haehnle	87323da6eb	AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906	2015-12-17 16:46:42 +00:00
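The off-by-one comes from mixing two units: a wait-state count and an S_NOP immediate. A hypothetical Python helper for converting wait states into S_NOP immediates, assuming that S_NOP with immediate k provides k + 1 wait states and that one S_NOP covers at most 8 wait states; the per-instruction cap is an assumption and may differ between subtargets.

```python
def s_nop_immediates(wait_states, max_per_nop=8):
    """Split a wait-state count into S_NOP immediates.

    Each S_NOP with immediate k is assumed to provide k + 1 wait states,
    up to max_per_nop per instruction.
    """
    imms = []
    while wait_states > 0:
        n = min(wait_states, max_per_nop)
        imms.append(n - 1)  # encode "n wait states" as immediate n - 1
        wait_states -= n
    return imms
```

Passing an immediate (4) where a wait-state count (5) is expected yields one wait state too few, which matches the bug described above.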
Rafael Espindola	f44db24e1f	Avoid explicit relocation sorting most of the time. These days relocations are created and stored in a deterministic way. The order they are created is also suitable for the .o file, so we don't need an explicit sort. The last remaining exception is MIPS. llvm-svn: 255902	2015-12-17 16:22:06 +00:00
Rafael Espindola	9e1cae510f	Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build" This reverts commit r255896. It broke the tests. llvm-svn: 255899	2015-12-17 15:12:26 +00:00
Rafael Espindola	d0e16522c7	Always sort by offset first. NFC. Every target changing sortRelocs was first calling the parent implementation. Just run that first. llvm-svn: 255898	2015-12-17 15:08:24 +00:00
Diego Novillo	8561841875	Fix unused variable warning in release builds. NFC. llvm-svn: 255897	2015-12-17 14:58:34 +00:00
MinSeong Kim	d05e9fd194	[AArch64] Enable PostRAScheduler for AArch64 generic build This patch enables PostRAScheduler specifically for the AArch64 generic build, which is beneficial from a performance perspective. Speedups of 2% to 7% were observed for some benchmarks on A57 and A53, and benchmarks from the LLVM test-suite did not regress. Differential Revision: http://reviews.llvm.org/D15557 llvm-svn: 255896	2015-12-17 14:51:22 +00:00
Matthew Simpson	4355e404d5	[AArch64] Add DAG combine for extract extend pattern This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) -> (extract_vector_elt v, i). The combine enables us to better match some SMOV patterns. Differential Revision: http://reviews.llvm.org/D15515 llvm-svn: 255895	2015-12-17 14:30:55 +00:00
Rafael Espindola	850ba46dd6	Simplify. NFC. llvm-svn: 255894	2015-12-17 14:19:52 +00:00
Alexey Bataev	7b72b658cc	[X86] Add option for enabling LEA optimization pass, by Andrey Turetsky Add option to enable/disable LEA optimization pass. By default the pass is disabled. Differential Revision: http://reviews.llvm.org/D15573 llvm-svn: 255881	2015-12-17 07:34:39 +00:00
Dan Gohman	5bf22fc84a	[WebAssembly] Convert WebAssemblyTargetObjectFile to TargetLoweringObjectFileELF llvm-svn: 255877	2015-12-17 04:55:44 +00:00
Matthias Braun	454192917b	AArch64: Simplify emitEpilogue() and related code; NFC This is in preparation to an upcoming patch. llvm-svn: 255872	2015-12-17 03:18:47 +00:00
Dan Gohman	05ac43fec3	[WebAssembly] Experimental ELF writer support This creates the initial infrastructure for writing ELF output files. It doesn't yet have any implementation for encoding instructions. Differential Revision: http://reviews.llvm.org/D15555 llvm-svn: 255869	2015-12-17 01:39:00 +00:00
JF Bastien	eefff9ccc5	WebAssembly: update expected torture test failures We now have 240 expected failures. llvm-svn: 255858	2015-12-17 00:12:06 +00:00
Dan Gohman	4172953813	[WebAssembly] Fix legalization of shift operators on large integer types. llvm-svn: 255847	2015-12-16 23:25:51 +00:00
Derek Schuff	8bb5f2927a	[WebAssembly] Implement eliminateCallFramePseudo Summary: Implement eliminateCallFramePseudo to handle ADJCALLSTACKUP/DOWN pseudo-instructions. Add a test calling a vararg function which causes non-0 adjustments. This revealed an issue with RegisterCoalescer wherein it eliminates a COPY from SP32 to a vreg but fails to update the live ranges of EXPR_STACK, causing a machineinstr verifier failure (so this test is commented out). Also add a dynamic alloca test, which causes a callseq_end dag node with a 0 (instead of undef) second argument to be generated. We currently fail to select that, so adjust the ADJCALLSTACKUP tablegen code to handle it. Differential Revision: http://reviews.llvm.org/D15587 llvm-svn: 255844	2015-12-16 23:21:30 +00:00
Ahmed Bougacha	66834ec6e1	[AArch64] Simplify some TRI/TII getters. NFC. We don't need static_casts when we use the right Subtarget. llvm-svn: 255836	2015-12-16 22:54:06 +00:00
Ahmed Bougacha	cecb6b0865	[CodeGen] Make MachineInstrBuilder::copyImplicitOps const. NFC. This matches the other MIB methods, none of which modify the builder. Without this, we can't chain copyImplicitOps. Also reformat the few users, in PPCEarlyReturn. llvm-svn: 255828	2015-12-16 22:15:30 +00:00
Manman Ren	cbe4f9417d	CXX_FAST_TLS calling convention: performance improvement for AArch64. The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by the register allocator. The register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. The target independent portion was committed as r255353. rdar://problem/23557469 Differential Revision: http://reviews.llvm.org/D15341 llvm-svn: 255821	2015-12-16 21:04:19 +00:00
Derek Schuff	993d35b4aa	Remove now-unused include llvm-svn: 255817	2015-12-16 20:43:10 +00:00
Derek Schuff	83717cc297	Iterate over phys regs instead llvm-svn: 255816	2015-12-16 20:43:08 +00:00
Derek Schuff	45cd5a79b2	[WebAssembly] Print an extra local decl when the user stack pointer is used Differential Revision: http://reviews.llvm.org/D15546 llvm-svn: 255815	2015-12-16 20:43:06 +00:00
Krzysztof Parzyszek	4f9164d9b3	[Hexagon] Misc fixes to r255807 llvm-svn: 255811	2015-12-16 20:07:04 +00:00
Krzysztof Parzyszek	56bbf54b43	[Hexagon] Update the Hexagon packetizer llvm-svn: 255807	2015-12-16 19:36:12 +00:00
Reid Kleckner	187d33ee74	Revert "[ARM] Add ARMv8.2-A FP16 scalar instructions" This reverts commit r255762. llvm-svn: 255806	2015-12-16 19:21:03 +00:00
Dan Gohman	b3aa1ecab0	[WebAssembly] Fix the CFG Stackifier to handle unoptimized branches If a branch both branches to and falls through to the same block, treat it as an explicit branch. llvm-svn: 255803	2015-12-16 19:06:41 +00:00
Matt Arsenault	e05ff15186	AMDGPU: Override getCFInstrCost The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796	2015-12-16 18:37:19 +00:00
Dan Gohman	e2831b4e27	[WebAssembly] Use the new offset syntax for memory operands in inline asm. llvm-svn: 255788	2015-12-16 18:14:49 +00:00
Ulrich Weigand	88a7a2eac7	[SystemZ] Sort relocs to avoid code corruption by linker optimization The SystemZ linkers provide an optimization to transform a general- or local-dynamic TLS sequence into an initial-exec sequence if possible. To do that, the compiler generates a function call to __tls_get_offset, which is a brasl instruction annotated with two relocations: - a R_390_PLT32DBL to install __tls_get_offset as branch target - a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that the TLS optimization should be performed if possible If the optimization is performed, the brasl is replaced by an ld load instruction. However, both relocs are processed independently by the linker. Therefore it is crucial that the R_390_PLT32DBL is processed first (installing the branch target for the brasl) and the R_390_TLS_GDCALL is processed second (replacing the whole brasl with an ld). If the relocs are swapped, the linker will first replace the brasl with an ld, and then install the __tls_get_offset branch target offset. Since ld has a different layout than brasl, this may even result in a completely different (or invalid) instruction; in any case, the resulting code is corrupted. Unfortunately, the way the MC common code sorts relocations causes these two to always end up the wrong way around, resulting in wrong code generation by the linker and crashes. This patch overrides the sortRelocs routine to detect this particular pair of relocs and enforce the required order. llvm-svn: 255787	2015-12-16 18:12:40 +00:00
Ulrich Weigand	47f3649374	[SystemZ] Fix assertion failure in adjustSubwordCmp When comparing a zero-extended value against a constant small enough to be in range of the inner type, it doesn't matter whether a signed or unsigned compare operation (for the outer type) is being used. This is why the code in adjustSubwordCmp had this assertion: assert(C.ICmpType == SystemZICMP::Any && "Signedness shouldn't matter here."); assuming the caller had already detected that fact. However, it turns out that there are cases, in particular with always-true or always-false conditions that have not been eliminated when compiling at -O0, where this is not true. Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any here, we can simply set it to SystemZICMP::Any, which is safe. llvm-svn: 255786	2015-12-16 18:04:06 +00:00
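The reasoning can be checked exhaustively for one case, an 8-bit value zero-extended to 32 bits and compared with a constant that also fits in 8 bits: both operands are then non-negative in the 32-bit type, so signed and unsigned compares agree. The Python model below is illustrative and the helper names are hypothetical.

```python
def as_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def lt_unsigned_and_signed(inner, c, outer_bits=32):
    """Compare zext(inner) < c in an outer_bits type, unsigned and signed."""
    z = inner & 0xFF  # zero-extend an 8-bit value; its value does not change
    c &= (1 << outer_bits) - 1
    return z < c, as_signed(z, outer_bits) < as_signed(c, outer_bits)
```

With a constant outside the inner range, such as 0xFFFFFFFF (which is -1 when signed), the two results can differ.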
Tobias Edler von Koch	b51460cf86	[Hexagon] Make memcpy lowering thread-safe This removes an unpleasant hack involving a global variable for special lowering of certain memcpy calls. These are now lowered as intended in EmitTargetCodeForMemcpy in the same way that other targets do it. llvm-svn: 255785	2015-12-16 17:29:37 +00:00
Dan Gohman	30a42bf585	[WebAssembly] Support more kinds of inline asm operands llvm-svn: 255782	2015-12-16 17:15:17 +00:00
Oliver Stannard	2de8c16913	[ARM] Add ARMv8.2-A FP16 vector instructions ARMv8.2-A adds 16-bit floating point versions of all existing SIMD floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. Note that VFP without SIMD is not a valid combination for any version of ARMv8-A, but I have ensured that these instructions all depend on both FeatureNEON and FeatureFullFP16 for consistency. Differential Revision: http://reviews.llvm.org/D15039 llvm-svn: 255764	2015-12-16 12:37:39 +00:00
Oliver Stannard	48568cbe18	[ARM] Add ARMv8.2-A FP16 scalar instructions ARMv8.2-A adds 16-bit floating point versions of all existing VFP floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. The assembly for these instructions uses S registers (AArch32 does not have H registers), but the instructions have ".f16" type specifiers rather than ".f32" or ".f64". The top 16 bits of each source register are ignored, and the top 16 bits of the destination register are set to zero. These instructions are mostly the same as the 32- and 64-bit versions, but they use coprocessor 9 rather than 10 and 11. Two new instructions, VMOVX and VINS, have been added to allow packing and extracting two 16-bit floats stored in the top and bottom halves of an S register. New fixup kinds have been added for the PC-relative load and store instructions, but no ELF relocations have been added as they have a range of 512 bytes. Differential Revision: http://reviews.llvm.org/D15038 llvm-svn: 255762	2015-12-16 11:35:44 +00:00
Michael Kuperstein	e75e6e2a23	[X86] Improve shift combining This folds (ashr (shl a, [56,48,32,24,16]), SarConst) into (shl, (sext (a), [56,48,32,24,16] - SarConst)) or into (lshr, (sext (a), SarConst - [56,48,32,24,16])) depending on the sign of (SarConst - [56,48,32,24,16]). On X86, sexts are MOVs. The MOVs have the same code size as the SHIFTs above (only a SHIFT by 1 is smaller). However, the MOVs have two advantages over SHIFTs on x86: 1. MOVs can write to a register that differs from the source. 2. MOVs accept memory operands. This fixes PR24373. Patch by: evgeny.v.stupachenko@intel.com Differential Revision: http://reviews.llvm.org/D13161 llvm-svn: 255761	2015-12-16 11:22:37 +00:00
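The shift-combining fold rests on a bit-level identity. Shifting left by K and then arithmetic-shifting right by SarConst is the same as sign-extending the low (width - K) bits and shifting by the difference. The Python sketch below checks that identity exhaustively for a few inputs. The pairs (64, 56), (64, 48), (64, 32), (32, 24) and (32, 16) are an assumption: they read the message's list as i64 and i32 cases that map onto 8-, 16- and 32-bit MOVSX forms. All function names are hypothetical. The sketch uses an arithmetic right shift in the second case. With a logical shift the identity fails for negative low bits, for example a low byte of 0x80 with SarConst 60.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def shl(x: int, n: int, width: int) -> int:
    return (x << n) & ((1 << width) - 1)

def ashr(x: int, n: int, width: int) -> int:
    # Python's >> on a negative int is an arithmetic (floor) shift.
    return (to_signed(x, width) >> n) & ((1 << width) - 1)

def original(a: int, shl_amt: int, sar_const: int, width: int) -> int:
    """(ashr (shl a, shl_amt), sar_const) on a `width`-bit value."""
    return ashr(shl(a, shl_amt, width), sar_const, width)

def combined(a: int, shl_amt: int, sar_const: int, width: int) -> int:
    """Sign-extend the low (width - shl_amt) bits (a MOVSX), then shift by the
    difference; the right shift is arithmetic so the copied sign bits survive."""
    ext = to_signed(a, width - shl_amt) & ((1 << width) - 1)
    diff = sar_const - shl_amt
    return shl(ext, -diff, width) if diff < 0 else ashr(ext, diff, width)

# Compare both forms for a few inputs and every in-range SarConst.
cases = [(64, 56), (64, 48), (64, 32), (32, 24), (32, 16)]
values = [0, 1, 0x7F, 0x80, 0xFF, 0x8000, 0x12345678_9ABCDEF0, (1 << 64) - 1]
for width, shl_amt in cases:
    for sar_const in range(width):
        for v in values:
            a = v & ((1 << width) - 1)
            assert original(a, shl_amt, sar_const, width) == combined(a, shl_amt, sar_const, width)
print("identity holds")
```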
Reid Kleckner	7850c9f5ca	[WinEH] Make llvm.x86.seh.recoverfp work on x64 It adjusts from RSP-after-prologue to RBP, which is what SEH filters need to do before they can use llvm.localrecover. Fixes SEH filter captures, which were broken in r250088. Issue reported by Alex Crichton. llvm-svn: 255707	2015-12-15 23:40:58 +00:00
Hans Wennborg	7036e503d7	Fix "Not having LAHF/SAHF" assert. It wants to assert that the subtarget is 64-bit, not the register. llvm-svn: 255703	2015-12-15 23:21:46 +00:00
Tom Stellard	7750f4ed9e	AMDGPU/SI: Set the code object work group segment size when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15493 llvm-svn: 255702	2015-12-15 23:15:25 +00:00
Sanjay Patel	271efcdf20	[x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475) This patch improves on the suggested codegen from PR24475: https://llvm.org/bugs/show_bug.cgi?id=24475 but only for the fmaxf() case to start, so we can sort out any bugs before extending to fmin, f64, and vectors. The fmax / maxnum definitions provide us flexibility for signed zeros, so the only thing we have to worry about in this replacement sequence is NaN handling. Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes a problem: SelectionDAGBuilder::visitSelect() transforms compare/select instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps that should be checking for NaN inputs or global unsafe-math before transforming? As it stands, that bypasses a big set of optimizations that the x86 backend already has in PerformSELECTCombine(). Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so we have completely unnecessary operations happening on undef elements of the vector. Differential Revision: http://reviews.llvm.org/D15294 llvm-svn: 255700	2015-12-15 23:11:43 +00:00
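The fmaxf commit says that NaN handling is the only concern when replacing the call. The Python model below shows why MAXSS alone is not enough and how one extra NaN check fixes it. It is a sketch of one possible compare-and-select sequence, not necessarily the exact instructions the patch emits. `maxss` and `fmaxf_via_maxss` are hypothetical names. MAXSS is modeled as `dest > src ? dest : src`, so it returns its second (source) operand whenever the comparison is false, which includes any NaN input.

```python
import math

def maxss(a: float, b: float) -> float:
    """Model of the x86 MAXSS result: a if a > b, else b.
    Every comparison involving NaN is false, so a NaN in either input yields b."""
    return a if a > b else b

def fmaxf_via_maxss(x: float, y: float) -> float:
    """fmaxf/maxnum semantics: if exactly one input is NaN, return the other."""
    m = maxss(y, x)                    # right when y is NaN (gives x); wrong only when x is NaN
    return y if math.isnan(x) else m   # the NaN test on x is the extra compare + select

nan = float("nan")
print(fmaxf_via_maxss(1.0, 2.0))   # 2.0
print(fmaxf_via_maxss(nan, 3.0))   # 3.0
print(fmaxf_via_maxss(3.0, nan))   # 3.0
print(maxss(3.0, nan))             # nan: MAXSS alone does not give fmaxf semantics
```

Signed zeros are left to whatever MAXSS returns, which the commit notes is permitted by the fmax / maxnum definitions.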
James Y Knight	99fcb721b2	[Sparc] Tweak r255668: Use llvm_unreachable. llvm-svn: 255698	2015-12-15 23:07:16 +00:00
Tom Stellard	a495307e5e	AMDGPU/SI: Set the code object's private segment size when targeting HSA. Summary: I'm not sure how things worked before without this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15492 llvm-svn: 255692	2015-12-15 22:55:30 +00:00
Tom Stellard	29dd05e92f	AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15426 llvm-svn: 255689	2015-12-15 22:39:36 +00:00
Dan Gohman	4b9d7916ee	[WebAssembly] Implement instruction selection for constant offsets in addresses. Add instruction patterns for matching load and store instructions with constant offsets in addresses. The code is fairly redundant due to the need to replicate everything between imm, tglobaladdr, and texternalsym, but this appears to be common tablegen practice. The main alternative appears to be to introduce matching functions with C++ code, but sticking with purely generated matchers seems better for now. Also note that this doesn't yet support offsets from getelementptr, which will be the most common case; that will depend on a change in target-independent code in order to set the NoUnsignedWrap flag, which I'll submit separately. Until then, the testcase uses ptrtoint+add+inttoptr with a nuw on the add. Also implement isLegalAddressingMode with an approximation of this. Differential Revision: http://reviews.llvm.org/D15538 llvm-svn: 255681	2015-12-15 22:01:29 +00:00
Reid Kleckner	d7045faa10	[WinEH] Remove unused intrinsic llvm.x86.seh.restoreframe We can clean this up now that we have the X86 CATCHRET instruction to restore the FP, SP, and BP. llvm-svn: 255677	2015-12-15 21:41:34 +00:00
Tom Stellard	a6f24c6565	AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions Summary: We were previously selecting all constant loads to SMRD instructions and legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopies pass. This new solution is simpler and also generates much better code, because the instruction selector is able to take advantage of all the MUBUF addressing modes that the legalization pass wasn't able to use. We also no longer need to generate v_add_* instructions when we have a uniform pointer and a non-uniform offset, as this is now folded into the MUBUF instruction during instruction selection. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15425 llvm-svn: 255672	2015-12-15 20:55:55 +00:00
Justin Bogner	843fb204b7	LPM: Stop threading `Pass *` through all of the loop utility APIs. NFC A large number of loop utility functions take a `Pass *` and reach into it to find out which analyses to preserve. There are a number of problems with this: - The APIs have access to pretty well any Pass state they want, so it's hard to tell what they may or may not do. - Other APIs have copied these and pass around a `Pass *` even though they don't even use it. Some of these just hand a nullptr to the API since the callers don't even have a pass available. - Passes in the new pass manager don't work like the current ones, so the APIs can't be used as is there. Instead, we should explicitly thread the analysis results that we actually care about through these APIs. This is both simpler and more reusable. llvm-svn: 255669	2015-12-15 19:40:57 +00:00
James Y Knight	33beb24318	[Sparc] Fix handling of double incoming arguments on sparc little-endian. On SparcV8, doubles get passed in two 32-bit integer registers. The call code was already handling endianness correctly, but the incoming argument code was not -- it got the two halves in opposite order. Also remove some dead code in LowerFormalArguments_32 to handle less-than-32bit values, which can't actually happen. Finally, add some test cases for the 32-bit calling convention, cribbed from the 64abi.ll test, and run for both big and little-endian. llvm-svn: 255668	2015-12-15 19:23:12 +00:00
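The Sparc fix concerns a double that arrives as two 32-bit integer halves. The Python sketch below shows how reading those halves in the wrong order corrupts the value. It assumes the double's 32-bit words are assigned to the argument registers in memory order. That is an assumption of this sketch, not a quotation of the SPARC ABI. `split_double` and `join_double` are hypothetical helpers.

```python
import struct

def split_double(value: float, little_endian: bool) -> tuple:
    """Return the double's two 32-bit words in memory order.
    Assumption: the first word goes in the first argument register."""
    order = "<" if little_endian else ">"
    return struct.unpack(order + "II", struct.pack(order + "d", value))

def join_double(first: int, second: int, little_endian: bool) -> float:
    """Rebuild the double from two words given in memory order."""
    order = "<" if little_endian else ">"
    return struct.unpack(order + "d", struct.pack(order + "II", first, second))[0]

be = split_double(1.5, little_endian=False)
le = split_double(1.5, little_endian=True)
print([hex(w) for w in be])   # ['0x3ff80000', '0x0']  high word first
print([hex(w) for w in le])   # ['0x0', '0x3ff80000']  low word first
print(join_double(*le, little_endian=True))    # 1.5
print(join_double(*le, little_endian=False))   # not 1.5: halves taken in the wrong order
```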
Michael Kuperstein	53946bf8c6	[X86] MOVPC32r should only emit CFI adjustments when needed We only want to emit CFI adjustments when actually using DWARF. This fixes PR25828. Differential Revision: http://reviews.llvm.org/D15522 llvm-svn: 255664	2015-12-15 18:50:32 +00:00
Tom Stellard	dbe374b2c5	AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence() Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15476 llvm-svn: 255661	2015-12-15 18:04:38 +00:00
Tom Stellard	8f307217c3	AMDGPU/SI: Fix bitcast between v2f32 and f64 The radeonsi fp64 support can hit these now that some redundant bitcasts are folded. Patch by: Michel Dänzer Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 255657	2015-12-15 17:11:17 +00:00
Hans Wennborg	08d5905bac	[X86] Smaller code for materializing 32-bit 1 and -1 constants "movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes. This commit makes LLVM use the latter when optimizing for size. Differential Revision: http://reviews.llvm.org/D14971 llvm-svn: 255656	2015-12-15 17:10:28 +00:00
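The byte counts in the constant-materialization commit can be checked against the instruction encodings. The Python table below is hand-written from the x86 opcode map for 32-bit mode: `B8+rd id` for `movl $imm32`, `31 /r` for `xorl`, and the one-byte `48+rd`/`40+rd` forms of `decl`/`incl`. `ENCODINGS` and `size` are hypothetical names for this illustration. In 64-bit mode, bytes 0x40 to 0x4F are REX prefixes, so `decl %eax` and `incl %eax` take the two-byte `FF C8`/`FF C0` forms instead. The xor/dec pair is then 4 bytes, which is still smaller than 5.

```python
# Hand-written encodings (32-bit mode) for the sequences compared in the commit.
ENCODINGS = {
    "movl $-1, %eax": bytes([0xB8, 0xFF, 0xFF, 0xFF, 0xFF]),  # B8+rd id
    "xorl %eax, %eax": bytes([0x31, 0xC0]),                  # 31 /r, ModRM C0
    "decl %eax": bytes([0x48]),                              # 48+rd (32-bit mode only)
    "incl %eax": bytes([0x40]),                              # 40+rd (32-bit mode only)
}

def size(*insns: str) -> int:
    """Total encoded length in bytes of the given instruction sequence."""
    return sum(len(ENCODINGS[i]) for i in insns)

print(size("movl $-1, %eax"))                 # 5
print(size("xorl %eax, %eax", "decl %eax"))   # 3, materializes -1
print(size("xorl %eax, %eax", "incl %eax"))   # 3, materializes 1
```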
JF Bastien	dac806c783	WebAssembly: update expected torture test failures We now have 252 expected failures. llvm-svn: 255654	2015-12-15 17:07:07 +00:00
Krzysztof Parzyszek	372bd80834	[Hexagon] Preprocess mapped instructions before lowering to MC llvm-svn: 255653	2015-12-15 17:05:45 +00:00
Tom Stellard	43f52df0b5	AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsics Summary: These are meant to be used instead of the llvm.SI.tid intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15475 llvm-svn: 255652	2015-12-15 17:02:52 +00:00
Tom Stellard	ad7d03daa6	AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsics Summary: These are meant to be used instead of the llvm.SI.fs.interp intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15474 llvm-svn: 255651	2015-12-15 17:02:49 +00:00
Tom Stellard	ac00eb5470	AMDGPU/SI: Add getShaderType() function to Utils/ Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15424 llvm-svn: 255650	2015-12-15 16:26:16 +00:00
Nemanja Ivanovic	8922476bcb	Bitcasts between FP and INT values using direct moves This patch corresponds to review: http://reviews.llvm.org/D15286 This patch was meant to land in revision 255246, but I accidentally uploaded the patch that corresponds to http://reviews.llvm.org/D15372 in that revision. Therefore, this patch is the actual Bitcasts using direct moves patch, whereas http://reviews.llvm.org/rL255246 actually corresponds to http://reviews.llvm.org/D15372. llvm-svn: 255649	2015-12-15 14:50:34 +00:00

36032 Commits