llvm-project

Commit Graph

Author	SHA1	Message	Date
Nicolai Haehnle	e2dda4f750	AMDGPU: Guard VOPC instructions against incorrect commute Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825	2016-04-19 21:58:22 +00:00
Nicolai Haehnle	7483937bf0	AMDGPU/SI: SGPR accounting in getSIProgramInfo must ignore exec_lo/hi Summary: A shader stored the live mask (initial exec mask) in an SGPR which was then spilled during register allocation. The allocator quite reasonably optimized turned the spill into v_writelane_b32 %vgpr, exec_lo, N v_writelane_b32 %vgpr, exec_hi, N+1 at the beginning of the shader, confusing the SGPR accounting. No test case, because si-sgpr-spill.ll together with an upcoming patch for WQM handling exhibits the problem. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19199 llvm-svn: 266824	2016-04-19 21:58:17 +00:00
Krzysztof Parzyszek	3af70c126d	[Hexagon] Fix operand swapping in HexagonPeephole Also, disable zero- and size-extend optimizations for now. llvm-svn: 266821	2016-04-19 21:36:24 +00:00
Marcin Koscielnicki	3fdc257d6a	[AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic. Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the thread pointer. I have a pending patch that does the same for SystemZ (D19054), and there are many more targets that could benefit from one. This patch merges the ARM and AArch64 intrinsics into a single target independent one that will also be used by subsequent targets. Differential Revision: http://reviews.llvm.org/D19098 llvm-svn: 266818	2016-04-19 20:51:05 +00:00
Krzysztof Parzyszek	5ffee8d829	[Hexagon] Fix printing the address operand of S2_storerinewabs llvm-svn: 266811	2016-04-19 20:20:33 +00:00
Tim Shen	a1d8bc5597	[PPC, SSP] Support PowerPC Linux stack protection. llvm-svn: 266809	2016-04-19 20:14:52 +00:00
Tim Shen	e885d5e4d3	[SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD With this change, ideally IR pass can always generate llvm.stackguard call to get the stack guard; but for now there are still IR form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case that it has been spilled (and corrupted). See ssp-guard-spill.ll. This also cause the change of stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload stack guard; otherwise reuse the value. This can be done by teaching register allocator to know how to rematerialize LOAD_STACK_GUARD and force a rematerialization (which seems hard), or check for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to go out of the scope of the current patch. llvm-svn: 266806	2016-04-19 19:40:37 +00:00
Jacques Pienaar	50d4e98905	[lanai] Add lowering for SETCCE i32. * Add lowering for SETCCE i32. * Add test to check lowering of i64 compares uses SETCCE expansion (outside of EQ and NE). * Fix select.ll test and immediate form selection for RI operations. llvm-svn: 266802	2016-04-19 19:15:25 +00:00
Sanjoy Das	2effffd456	[X86] Simplify StackMapShadowTracker; NFC - Elide trivial contructor and desctructor - Move implementation out of an unnecessary explicit llvm namespace scope llvm-svn: 266794	2016-04-19 18:48:16 +00:00
Sanjoy Das	6ecfae61dc	[X86MCInstLower] Clean up EmitNops; NFC Instead of having a conditional assert inside EmitNops, refactor so that the caller can have the assert instead. llvm-svn: 266793	2016-04-19 18:48:13 +00:00
Krzysztof Parzyszek	7b59ae28aa	[Hexagon] Implement branch relaxation Patch by Sirish Pande. llvm-svn: 266792	2016-04-19 18:30:18 +00:00
David L Kreitzer	d5cb34118d	Preliminary changes for fixing PR27241. Generalized/restructured some things in preparation for enabling the outgoing parameter store-to-push optimization for 64-bit targets. Differential Revision: http://reviews.llvm.org/D19222 llvm-svn: 266774	2016-04-19 17:43:44 +00:00
Simon Pilgrim	32b1c9fe7f	[X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not. Differential Revision: http://reviews.llvm.org/D19228 llvm-svn: 266728	2016-04-19 12:26:40 +00:00
Sanjoy Das	fe71ec771a	Disable the PatchableFunction pass for NVPTX & Wasm PatchableFunction requires AllVRegsAllocated that these targets don't provide. llvm-svn: 266720	2016-04-19 06:24:58 +00:00
Sanjoy Das	c0441c29df	Introduce a "patchable-function" function attribute Summary: The `"patchable-function"` attribute can be used by an LLVM client to influence LLVM's code generation in ways that makes the generated code easily patchable at runtime (for instance, to redirect control). Right now only one patchability scheme is supported, `"prologue-short-redirect"`, but this can be expanded in the future. Reviewers: joker.eph, rnk, echristo, dberris Subscribers: joker.eph, echristo, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19046 llvm-svn: 266715	2016-04-19 05:24:47 +00:00
Jacques Pienaar	250c4bec9e	[lanai] Set boolean contentss to ZeroOrOneBooleanContent. llvm-svn: 266701	2016-04-19 00:26:42 +00:00
Tim Northover	b629c77692	ARM: use a pseudo-instruction for cmpxchg at -O0. The fast register-allocator cannot cope with inter-block dependencies without spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions where any value produced within a block is dead by the end, but not for cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded after regalloc. Fortunately this is at -O0 so we don't have to care about performance. This simplifies the various axes of expansion considerably: we assume a strong seq_cst operation and ensure ordering via the always-present DMB instructions rather than v8 acquire/release instructions. Should fix the 32-bit part of PR25526. llvm-svn: 266679	2016-04-18 21:48:55 +00:00
JF Bastien	246e796bcd	Lanai: fix debug build There's currently no raw_ostream &operator<<(SimpleValueType); provided by LLVM. It could be added by refactoring utils/TableGen/CodeGenTarget.cpp:getEnumName, but that's much more work than fixing the build. llvm-svn: 266627	2016-04-18 16:33:41 +00:00
Konstantin Zhuravlyov	8c273ad719	[AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt Also, - Skip pass if machine module does not have debug info - Minor comment changes - Added test Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626	2016-04-18 16:28:23 +00:00
Artem Tamazov	e2762423c2	[AMDGPU][llvm-mc] s_setreg* - Fix order of operands Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first. Differential Revision: http://reviews.llvm.org/D19161 llvm-svn: 266617	2016-04-18 14:54:26 +00:00
Aaron Ballman	2eeefe8ed8	Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC. llvm-svn: 266616	2016-04-18 14:47:19 +00:00
Daniel Sanders	d8c07766f3	[mips][ias] Prevent double-filling of delay slots by generating '.set noreorder' regions. Summary: When clang is given -save-temps or -via-file-asm, any inline assembly in the source is parsed twice. Once by the compiler, and again by the assembler. We must take care to ensure that this doesn't lead to double-filling delay slots. Reviewers: sdardis, vkalintiris Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D19166 llvm-svn: 266608	2016-04-18 12:35:36 +00:00
Eric Liu	0179230ae8	Include SmallVector.h header in lib/Target/WebAssembly/InstPrinter/WebAssemblyInstPrinter.h llvm-svn: 266606	2016-04-18 12:21:59 +00:00
Renato Golin	4b18a510a2	[ARM] AArch32 v8 NEON is still not IEEE-754 compliant llvm-svn: 266603	2016-04-18 12:06:47 +00:00
Daniel Sanders	c6924fa5d6	[mips][ias] Stream macro expansions to output instead of buffering them. NFC. Summary: This will allows us to eliminate some magic numbers from the offset operand of branch instructions in favour of symbols and makes it possible to avoid double-filling delay slots when clang is given -save-temps. parseDirectiveCpRestore() is calling isIntegratedAssemblerRequired() for the moment since correctly pushing the generation of these instructions into the ELF target streamer is tricky enough to warrant a separate patch. Reviewers: sdardis, vkalintiris Subscribers: dsanders, llvm-commits, sdardis Differential Revision: http://reviews.llvm.org/D19164 llvm-svn: 266602	2016-04-18 12:06:15 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' \| xargs grep -L 'IndexedMap[<]' \| xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Craig Topper	221e1c2b1f	[X86] Be explicit about calls to setOperationAction for AVX2 and AVX512 rather than just looping over all vector types and conditinally matching them. NFC llvm-svn: 266577	2016-04-17 22:49:46 +00:00
Craig Topper	6ff46266d1	Declare MVT::SimpleValueType as an int8_t sized enum. This removes 400 bytes from TargetLoweringBase and probably other places. This required changing several places to print VT enums as strings instead of raw ints since the proper method to use to print became ambiguous. This is probably an improvement anyway. This also appears to save ~8K from an x86 self host build of llc. llvm-svn: 266562	2016-04-17 17:37:33 +00:00
Simon Pilgrim	dd153476fd	[X86] Added TODO comment for target shuffle mask decoding of bitcasted masks llvm-svn: 266559	2016-04-17 11:34:18 +00:00
Asaf Badouh	aec79651c1	[X86] Remove unneeded variables no functional change. ExtraLoad and WrapperKind are been used only if (OpFlags == X86II::MO_GOTPCREL). Differential Revision: http://reviews.llvm.org/D18942 llvm-svn: 266557	2016-04-17 08:28:40 +00:00
Craig Topper	75869d5701	[AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled. llvm-svn: 266554	2016-04-17 07:25:39 +00:00
Craig Topper	1663e7a472	[X86] Use ternary operator to reduce code slightly. NFC llvm-svn: 266534	2016-04-16 19:09:32 +00:00
Simon Pilgrim	fd4b9b02a3	[X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support Added additional test that peeks through bitcast to v16i8 mask llvm-svn: 266533	2016-04-16 17:52:07 +00:00
Matt Arsenault	c10783c42d	AMDGPU: Enable LocalStackSlotAllocation pass This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508	2016-04-16 02:13:37 +00:00
Matt Arsenault	b6be202779	AMDGPU: Use s_addk_i32 / s_mulk_i32 llvm-svn: 266506	2016-04-16 01:46:49 +00:00
Vasileios Kalintiris	5a971a48c3	[mips] More range-based for loops. NFC. There are still a couple more inside the MIPS target. I opted for a single commit in order to avoid spamming the list. llvm-svn: 266472	2016-04-15 20:43:17 +00:00
Vasileios Kalintiris	36311395ae	[mips] Use range-based for loops and simplify slightly the code. NFC. llvm-svn: 266471	2016-04-15 20:18:48 +00:00
Ulrich Weigand	6e648ea533	[SystemZ] Call tryAddingSymbolicOperand in the disassembler Use the tryAddingSymbolicOperand callback to attempt to present immediate values in symbolic form when disassembling. This is currently only used for PC-relative immediates (which are most likely to be symbolic in the SystemZ ISA). Add new DecodeMethod types to allow distinguishing between branch and non-branch instructions. llvm-svn: 266469	2016-04-15 19:55:58 +00:00
Tim Northover	903f81ba18	ARM: don't try to hoist constant RHS out of a division. Divisions by a constant can be converted into multiplies which are usually cheaper, but this isn't possible if the constant gets separated (particularly in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is cheap. I considered making the check generic, but neither AArch64 (strangely) nor x86 showed any benefit on the tests I had. llvm-svn: 266464	2016-04-15 18:17:18 +00:00
Chad Rosier	1fbe9bcab4	[AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth(). This improves AA in the MI schduler when reason about paired instructions. Phabricator Revision: http://reviews.llvm.org/D17098 PR26358 llvm-svn: 266462	2016-04-15 18:09:10 +00:00
Geoff Berry	c376406669	[AArch64] Add MMOs to callee-save load/store instructions. Summary: Without MMOs, the callee-save load/store instructions were treated as volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer. Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17661 llvm-svn: 266439	2016-04-15 15:16:19 +00:00
Nirav Dave	1f51c334ca	Fix typing on generated LXV2DX/STXV2DX instructions [PPC] Previously when casting generic loads to LXV2DX/ST instructions we would leave the original load return type in place allowing for an assertion failure when we merge two equivalent LXV2DX nodes with different types. This fixes PR27350. Reviewers: nemanjai Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19133 llvm-svn: 266438	2016-04-15 15:01:38 +00:00
Jun Bum Lim	4c5bd58ebe	[MachineScheduler]Add support for store clustering Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437	2016-04-15 14:58:38 +00:00
Nicolai Haehnle	750082d1fe	AMDGPU/SI: Fix regression with no-return atomics Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433	2016-04-15 14:42:36 +00:00
Craig Topper	18e69f4f63	Use MVT instead of EVT to remove a bunch of unnecessary calls to getSimpleVT. llvm-svn: 266414	2016-04-15 06:20:21 +00:00
Craig Topper	ea46b592ab	Add a setOperationPromotedToType convenience method that sets an operation to promoted and set the type in one call. Use it so save code in X86. llvm-svn: 266413	2016-04-15 06:20:18 +00:00
Craig Topper	13e9dc66e4	[X86] AND, OR, and XOR of vectors are always legal no need to set them legal explicitly. llvm-svn: 266412	2016-04-15 06:20:14 +00:00
Craig Topper	5e20fd3e7c	[X86] Combine an if and else block that had the same set of calls to setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC llvm-svn: 266410	2016-04-15 04:57:09 +00:00
Justin Lebar	cd5fbea67e	[NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5. Summary: Calls on NVPTX are unusually expensive (for one thing, lots of state needs to be saved to memory, which is slow), so make the inlininer much more aggressive. Reviewers: chandlerc Subscribers: jholewinski, llvm-commits, tra Differential Revision: http://reviews.llvm.org/D18561 llvm-svn: 266406	2016-04-15 01:38:50 +00:00
Matt Arsenault	9c499c3a74	AMDGPU: Remove custom load/store scalarization llvm-svn: 266385	2016-04-14 23:31:26 +00:00
Matt Arsenault	fd8ab09c0e	AMDGPU: Include LDS size in printed comment llvm-svn: 266382	2016-04-14 22:11:51 +00:00
Mehdi Amini	03b42e41bf	Remove every uses of getGlobalContext() in LLVM (but the C API) At the same time, fixes InstructionsTest::CastInst unittest: yes you can leave the IR in an invalid state and exit when you don't destroy the context (like the global one), no longer now. This is the first part of http://reviews.llvm.org/D19094 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266379	2016-04-14 21:59:01 +00:00
Matt Arsenault	3d1c1deb04	AMDGPU: Run SIFoldOperands after PeepholeOptimizer PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378	2016-04-14 21:58:24 +00:00
Matt Arsenault	4ac341c8b3	AMDGPU: Directly emit m0 initialization with s_mov_b32 Currently what comes out of instruction selection is a register initialized to -1, and then copied to m0. MachineCSE doesn't consider copies, but we want these to be CSEed. This isn't much of a problem currently, because SIFoldOperands is run immediately after. This avoids regressions when SIFoldOperands is run later from leaving all copies to m0. llvm-svn: 266377	2016-04-14 21:58:15 +00:00
Matt Arsenault	7900334dd5	AMDGPU: Fold bitcasts of scalar constants to vectors This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376	2016-04-14 21:58:07 +00:00
Renato Golin	5cb666add7	[ARM] Adding IEEE-754 SIMD detection to loop vectorizer Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin). For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags. This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normal's flag on both communitues and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP. Fixes PR16275. llvm-svn: 266363	2016-04-14 20:42:18 +00:00
Tom Stellard	000c5af3e6	AMDGPU: Add skeleton GlobalIsel implementation Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356	2016-04-14 19:09:28 +00:00
Reid Kleckner	28865809fe	Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.h MachineInstr.h and MachineInstrBuilder.h are very popular headers, widely included across all LLVM backends. It turns out that there only a handful of TUs that actually care about DI operands on MachineInstrs. After this change, touching DebugInfoMetadata.h and rebuilding llc only needs 112 actions instead of 542. llvm-svn: 266351	2016-04-14 18:29:59 +00:00
Jacques Pienaar	ad1db3597e	[lanai] Add custom lowering for SRL_PARTS i32. llvm-svn: 266349	2016-04-14 17:59:22 +00:00
Tom Stellard	cef0fe4245	[GlobalISel] Move GISelAccessor class into public headers Reviewers: qcolombet Subscribers: joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19120 llvm-svn: 266348	2016-04-14 17:45:38 +00:00
Nicolai Haehnle	05b127da06	[StructurizeCFG] Annotate branches that were treated as uniform Summary: This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases). No tests included here, because tests in D19013 already cover this. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19018 llvm-svn: 266346	2016-04-14 17:42:35 +00:00
Nicolai Haehnle	723b73b4eb	AMDGPU: Remove SIFixSGPRLiveRanges pass Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345	2016-04-14 17:42:29 +00:00
Nicolai Haehnle	19f0f5177d	AMDGPU: change a redundant if () to an assert(). NFC Summary: I've been carrying this change around with me for a while, because the if () managed to confuse me while following the code. All callers ensure that the assertion holds. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19042 llvm-svn: 266344	2016-04-14 17:42:18 +00:00
Tom Stellard	b72a65ff53	[GlobalISel] Coding style and whitespace fixes Reviewers: qcolombet Subscribers: joker.eph, llvm-commits, vkalintiris Differential Revision: http://reviews.llvm.org/D19119 llvm-svn: 266342	2016-04-14 17:23:33 +00:00
Tim Northover	cdf1529c01	AArch64: expand cmpxchg after regalloc at -O0. FastRegAlloc works only at the basic-block level and spills all live-out registers. Unfortunately for a stack-based cmpxchg near the spill slots, this can perpetually clear the exclusive monitor, which means the cmpxchg will never succeed. I believe the only way to handle this within LLVM is by expanding the loop post-regalloc. We don't want this in general because it severely limits the optimisations that can be done, so we limit this to -O0 compilations. It's an ugly hack, and about the one good point in the whole mess is that we can treat all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without affecting correctness. Should fix PR25526. llvm-svn: 266339	2016-04-14 17:03:29 +00:00
Jacques Pienaar	add4a274ba	[lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth. Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched. Reviewers: eliben, majnemer Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18903 llvm-svn: 266338	2016-04-14 16:47:42 +00:00
Tom Stellard	79a1fd718c	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337	2016-04-14 16:27:07 +00:00
Tom Stellard	f110f8f9f7	AMDGPU/SI: Use the correct scratch wave offset register for shaders. Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders though, The register should be the next SGPR after all user and system SGPR's. We use that Mesa adds arguments for all input and system SGPR's and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336	2016-04-14 16:27:03 +00:00
Simon Dardis	53a3492b71	Summary: Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like binutils. This patch was previous committed as r266055 as seemed to have caused some spurious test failures. They did not reappear after further local testing. llvm-svn: 266301	2016-04-14 13:43:17 +00:00
Mehdi Amini	867e91468b	Do not use getGlobalContext()... ever. This code was creating a new type in the global context, regardless of which context the user is sitting in, what can possibly go wrong? From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266275	2016-04-14 04:36:40 +00:00
Matt Arsenault	9cd90712f0	AMDGPU: Implement canonicalize Also add generic DAG node for it. llvm-svn: 266272	2016-04-14 01:42:16 +00:00
Matthias Braun	46b0f03e12	TargetLowering: Factor out common code for tail call eligibility checking; NFC llvm-svn: 266270	2016-04-14 01:10:42 +00:00
Tim Northover	5c02f9ad28	ARM: override cost function to re-enable ConstantHoisting (& fix it). At some point, ARM stopped getting any benefit from ConstantHoisting because the pass called a different variant of getIntImmCost. Reimplementing the correct variant revealed some problems, however: + ConstantHoisting was modifying switch statements. This is simply invalid, the cases must remain integer constants no matter the notional cost. + ConstantHoisting was mangling alloca instructions in the entry block. These should be handled by FrameLowering, so constants actually have a cost of 0. Worse, the resulting bitcasts meant they became dynamic allocas. rdar://25707382 llvm-svn: 266260	2016-04-13 23:08:27 +00:00
Matthias Braun	707e02c273	ARM: Use a callee save register for the swiftself parameter. It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D18901 llvm-svn: 266253	2016-04-13 21:43:25 +00:00
Matthias Braun	588d1cdad4	X86: Use a callee save register for the swiftself parameter. It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D18902 llvm-svn: 266252	2016-04-13 21:43:21 +00:00
Matthias Braun	74a0bd319a	AArch64: Use a callee save registers for swiftself parameters It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D19007 llvm-svn: 266251	2016-04-13 21:43:16 +00:00
Tom Stellard	b951f10701	AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers Summary: When we are spilling SGPRs to scratch memory, we usually don't have free SGPRs to do the address calculation, so we need to re-use the ScratchOffset register for the calculation. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18917 llvm-svn: 266244	2016-04-13 20:44:16 +00:00
Nemanja Ivanovic	87bcae366d	[PowerPC] Basic support for P9 byte comparison and count trailing zero insns This patch corresponds to review: http://reviews.llvm.org/D17850 This patch implements the following instructions: cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd. llvm-svn: 266228	2016-04-13 18:51:18 +00:00
Evandro Menezes	8d53f88162	[AArch64] Disable LDP/STP for quads Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs of regular LDR/STR. Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>. llvm-svn: 266223	2016-04-13 18:31:45 +00:00
Tim Northover	b8a1ecfc62	AArch64: don't create instructions that write to xzr/wzr twice. These are unpredictable even on AArch64. Patch by Yichao Yu. llvm-svn: 266206	2016-04-13 16:25:39 +00:00
Artem Tamazov	eb4d5a9b0b	[AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status Tests added along with implemented feature. Note that there is a small leftover of unecessary MI sheduling issue (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205	2016-04-13 16:18:41 +00:00
Zoran Jovanovic	2f6845ba39	[mips] Fix emitAtomicCmpSwapPartword to handle 64 bit pointers correctly Differential Revision: http://reviews.llvm.org/D18995 llvm-svn: 266204	2016-04-13 16:02:25 +00:00
Vasileios Kalintiris	3751d4114c	[mips] Sign-extend i32 values truncated from previously zero-extended i32 values. Summary: This is a special case for MIPS64 because the architecture requires properly 32-bit sign-extended values in the register containers. Additionaly, we merge consecutive trunc + AssertZExt nodes in order to avoid unnecessary sign-extensions when the extension comes from a type smaller than i32. Reviewers: dsanders Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D18893 llvm-svn: 266203	2016-04-13 15:07:45 +00:00
Zlatko Buljan	58d6a959be	[mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions Differential Revision: http://reviews.llvm.org/D17137 This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068. There was the problem with test-suite failure. The problem is hopefully solved with dependant patch so this patch is commited again. llvm-svn: 266179	2016-04-13 08:02:26 +00:00
Hrvoje Varga	11dd31df9a	[mips][microMIPS] Fix for "Cannot copy registers" assertion Differential Revision: http://reviews.llvm.org/D17068 This changes contains fix for failing test-suite. So, this patch should hopefully work now. llvm-svn: 266171	2016-04-13 06:17:21 +00:00
Tom Stellard	703b2ec43f	AMDGPU/SI: Fix spilling of 96-bit registers Summary: It seems like this was broken in r252327. I thought we had test cases for this, but it's really hard to tirgger spills of this exact register size since they aren't used very much. Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19021 llvm-svn: 266152	2016-04-12 23:57:30 +00:00
Evandro Menezes	551af44e31	[AArch64] Fuse AES{D,E}/AESMC for Exynos M1. (NFC) llvm-svn: 266144	2016-04-12 22:42:36 +00:00
David L Kreitzer	99775c1b6e	Fixed a few typos and formatting problems. NFCI. llvm-svn: 266135	2016-04-12 21:45:09 +00:00
Justin Bogner	32ad24d4ef	X86: Avoid accessing SDValues after they've been RAUW'd This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress matches an ADD with an AND as an operand, and that AND hits one of the "heroic transforms" that folds masks and shifts, we end up with N pointing to an SDNode that was deleted. Make sure we're done accessing it before that. Found by ASan with the recycling allocator changes in llvm.org/PR26808. llvm-svn: 266130	2016-04-12 21:34:24 +00:00
Nicolai Haehnle	df77c9ada4	AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126	2016-04-12 21:18:10 +00:00
James Y Knight	19f6cce4e3	Add __atomic_* lowering to AtomicExpandPass. (Recommit of r266002, with r266011, r266016, and not accidentally including an extra unused/uninitialized element in LibcallRoutineNames) AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and cmpxchg instructions to __atomic_* library calls, when the target doesn't support atomics of a given size. This is the first step towards moving all atomic lowering from clang into llvm. When all is done, the behavior of __sync_* builtins, __atomic_* builtins, and C11 atomics will be unified. Previously LLVM would pass everything through to the ISelLowering code. There, unsupported atomic instructions would turn into __sync_* library calls. Because of that behavior, Clang currently avoids emitting llvm IR atomic instructions when this would happen, and emits __atomic_* library functions itself, in the frontend. This change makes LLVM able to emit __atomic_* libcalls, and thus will eventually allow clang to depend on LLVM to do the right thing. It is advantageous to do the new lowering to atomic libcalls in AtomicExpandPass, before ISel time, because it's important that all atomic operations for a given size either lower to __atomic_* libcalls (which may use locks), or native instructions which won't. No mixing and matching. At the moment, this code is enabled only for SPARC, as a demonstration. The next commit will expand support to all of the other targets. Differential Revision: http://reviews.llvm.org/D18200 llvm-svn: 266115	2016-04-12 20:18:48 +00:00
Tom Stellard	ab1d3a9d50	AMDGPU/SI: Insert wait states required after v_readfirstlane on SI Summary: We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test: spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra Reviewers: arsenm, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18988 llvm-svn: 266105	2016-04-12 18:40:43 +00:00
Matt Arsenault	3b08238f78	AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32 This helps clean up some of the mess when expanding unaligned 64-bit loads when changed to be promote to v2i32, and fixes situations where or x, 0 was emitted after splitting 64-bit ors during moveToVALU. I think this could be a generic combine but I'm not sure. llvm-svn: 266104	2016-04-12 18:24:38 +00:00
Sanjay Patel	e6a0a23e08	fix indentation; NFC llvm-svn: 266097	2016-04-12 18:01:48 +00:00
Nicolai Haehnle	279970c0dc	AMDGPU/SI: Fix a mis-compilation of multi-level breaks Summary: Under certain circumstances, multi-level breaks (or what is understood by the control flow passes as such) could be miscompiled in a way that causes infinite loops, by emitting incorrect control flow intrinsics. This fixes a hang in dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18967 llvm-svn: 266088	2016-04-12 16:10:38 +00:00
Petar Jovanovic	48e4db1ca2	[mips] add assembler support for .set arch=octeon This patch enables assembler support for .set arch=octeon. It will fix issues with inline assembler when this directive is used. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D18548 llvm-svn: 266081	2016-04-12 15:28:16 +00:00
Matt Arsenault	64fa2f4513	AMDGPU: Implement i64 global atomics llvm-svn: 266075	2016-04-12 14:05:11 +00:00
Matt Arsenault	a9dbdcae04	AMDGPU: Add atomic_inc + atomic_dec intrinsics These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074	2016-04-12 14:05:04 +00:00
Matt Arsenault	21ecfe43ba	AMDGPU: Remove trailing whitespace llvm-svn: 266073	2016-04-12 14:04:54 +00:00
Rafael Espindola	d41b54be11	This reverts commit r266002, r266011 and r266016. They broke the msan bot. Original message: Add __atomic_* lowering to AtomicExpandPass. AtomicExpandPass can now lower atomic load, atomic store, atomicrmw,and cmpxchg instructions to __atomic_* library calls, when the target doesn't support atomics of a given size. This is the first step towards moving all atomic lowering from clang into llvm. When all is done, the behavior of __sync_* builtins, __atomic_* builtins, and C11 atomics will be unified. Previously LLVM would pass everything through to the ISelLowering code. There, unsupported atomic instructions would turn into __sync_* library calls. Because of that behavior, Clang currently avoids emitting llvm IR atomic instructions when this would happen, and emits __atomic_* library functions itself, in the frontend. This change makes LLVM able to emit __atomic_* libcalls, and thus will eventually allow clang to depend on LLVM to do the right thing. It is advantageous to do the new lowering to atomic libcalls in AtomicExpandPass, before ISel time, because it's important that all atomic operations for a given size either lower to __atomic_* libcalls (which may use locks), or native instructions which won't. No mixing and matching. At the moment, this code is enabled only for SPARC, as a demonstration. The next commit will expand support to all of the other targets. Differential Revision: http://reviews.llvm.org/D18200 llvm-svn: 266062	2016-04-12 12:30:25 +00:00
Simon Dardis	ee1590f5f0	Revert "[mips] MIPSR6 Compact branch aliases" This reverts commit r266055. ps4-buildslave2 is highlighting a failure. llvm-svn: 266061	2016-04-12 12:22:45 +00:00
Jonas Paulsson	f0fc50905f	[SystemZ] Use LDE32 instead of LE, when Offset is small. On z13, if eliminateFrameIndex() chooses LE (and not LEY), immediately transform that LE to LDE32 to avoid partial register dependencies. LEY should be generally preferred for big offsets over an expansion into LAY + LDE32. Reviewed by Ulrich Weigand. llvm-svn: 266060	2016-04-12 12:07:23 +00:00
Simon Dardis	703c864fe3	[mips] MIPSR6 Compact branch aliases Summary: Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like binutils. Reviewers: dsanders Differential Revision: http://reviews.llvm.org/D18856 llvm-svn: 266055	2016-04-12 10:41:53 +00:00
Chuang-Yu Cheng	94f58e79ae	[PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0 Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046). The PPCInstrInfo::optimizeCompareInstr function could create a new use of CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of CR0 is created. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D18884 llvm-svn: 266040	2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng	6efde2fb45	[PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field In the ELFv2 ABI, we are not required to save all CR fields. If only one nonvolatile CR field is clobbered, use mfocrf instead of mfcr to selectively save the field, because mfocrf has short latency compares to mfcr. Thanks Nemanja's invaluable hint! Reviewers: nemanjai tjablin hfinkel kbarton http://reviews.llvm.org/D17749 llvm-svn: 266038	2016-04-12 03:04:44 +00:00
Matthias Braun	dff243e597	AArch64: Drive-by cleanup llvm-svn: 266035	2016-04-12 02:16:13 +00:00
Tim Northover	a6dea06fe3	ARM: use r7 as the frame-pointer on all MachO targets. This is better for a few reasons: + It matches the other tooling for iOS. + It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't use ARM mode). + It leads to infinitesimally smaller code (0.2%, yay!). rdar://25369506 llvm-svn: 266003	2016-04-11 22:27:40 +00:00
James Y Knight	b91d38c5fe	Add __atomic_* lowering to AtomicExpandPass. AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and cmpxchg instructions to __atomic_* library calls, when the target doesn't support atomics of a given size. This is the first step towards moving all atomic lowering from clang into llvm. When all is done, the behavior of __sync_* builtins, __atomic_* builtins, and C11 atomics will be unified. Previously LLVM would pass everything through to the ISelLowering code. There, unsupported atomic instructions would turn into __sync_* library calls. Because of that behavior, Clang currently avoids emitting llvm IR atomic instructions when this would happen, and emits __atomic_* library functions itself, in the frontend. This change makes LLVM able to emit __atomic_* libcalls, and thus will eventually allow clang to depend on LLVM to do the right thing. It is advantageous to do the new lowering to atomic libcalls in AtomicExpandPass, before ISel time, because it's important that all atomic operations for a given size either lower to __atomic_* libcalls (which may use locks), or native instructions which won't. No mixing and matching. At the moment, this code is enabled only for SPARC, as a demonstration. The next commit will expand support to all of the other targets. Differential Revision: http://reviews.llvm.org/D18200 llvm-svn: 266002	2016-04-11 22:22:33 +00:00
Manman Ren	5751814eda	Swift Calling Convention: swifterror target support. Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997	2016-04-11 21:08:06 +00:00
Tom Stellard	0ffdf65eaa	Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute" This reverts commit r263720. Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute. llvm-svn: 265992	2016-04-11 20:38:40 +00:00
Hans Wennborg	1f09485c40	Fix broken assert, PR24624 llvm-svn: 265989	2016-04-11 20:35:41 +00:00
Sriraman Tallam	f39e190ad8	Test commit. llvm-svn: 265976	2016-04-11 18:40:50 +00:00
Petar Jovanovic	e578e970cb	[mips] Make Static a default relocation model for MIPS codegen This change follows up defaults for GCC and Clang, so LLVM does not differ from them. While number of the test files are touched with this change, they all keep the old (expected) behaviour with the explicit option: "-relocation-model=pic" The tests that have not been touched are insensitive to relocation model. Differential Revision: http://reviews.llvm.org/D17995 llvm-svn: 265949	2016-04-11 15:24:23 +00:00
Daniel Sanders	a45d3e439f	[mips] Trivial corrections to range checked immediates. Summary: SYNC has a 5-bit unsigned immediate. Move MIPS16-specific pcrel16 operand to Mips16 files. Reviewers: vkalintiris Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D18755 llvm-svn: 265947	2016-04-11 15:20:40 +00:00
Ulrich Weigand	aa04768600	[SystemZ] README: remove an implemented idea, add some new ones The note about conditional returns can now be removed, as they are implemented. Let's also add 2 new ones in exchange. Author: koriakin Differential Revision: http://reviews.llvm.org/D18962 llvm-svn: 265944	2016-04-11 14:38:47 +00:00
Ulrich Weigand	1bac911c58	[SystemZ] Add SVC instruction This is going to be useful for inline assembly only. Author: koriakin Differential Revision: http://reviews.llvm.org/D18952 llvm-svn: 265943	2016-04-11 14:35:39 +00:00
Oliver Stannard	c869e9158d	[ARM] Avoid switching ARM/Thumb mode on .arch/.cpu directive When we see a .arch or .cpu directive, we should try to avoid switching ARM/Thumb mode if possible. If we do have to switch modes, we also need to emit the correct mapping symbol for the new ISA. We did not do this previously, so could emit ARM code with Thumb mapping symbols (or vice-versa). The GAS behaviour is to always stay in the same mode, and to emit an error on any instructions seen when the current mode is not available on the current target. We can't represent that situation easily (we assume that Thumb mode is available if ModeThumb is set), so we differ from the GAS behaviour when switching to a target that can't support the old mode. I've added a warning for when this implicit mode-switch occurs. Differential Revision: http://reviews.llvm.org/D18955 llvm-svn: 265936	2016-04-11 13:06:28 +00:00
Ulrich Weigand	848a513d0a	[SystemZ] Support conditional indirect sibling calls via BCR This adds a conditional variant of CallBR instruction, CallBCR. Also, it can be fused with integer comparisons, resulting in one of the new C*BCall instructions. In addition to CallBRCL limitations, this has another one: it won't trigger if the function to call isn't already in %r1 - see f22 in the test for an example (it's also why the loads in tests are volatile). Author: koriakin Differential Revision: http://reviews.llvm.org/D18928 llvm-svn: 265933	2016-04-11 12:12:32 +00:00
Ulrich Weigand	fb97c51f6f	[SystemZ] Remove incorrect CC use for C*BReturn instructions These are fused compare-and-branches, so they obviously don't use CC. Author: koriakin Differential Revision: http://reviews.llvm.org/D18927 llvm-svn: 265932	2016-04-11 12:03:30 +00:00
Andrey Turetskiy	9df334c28e	[X86] Restrict max long nop length for Lakemont. Restrict the max length of long nops for Lakemont to 7. Experiments on MCU benchmarks (Dhrystone, Coremark) show that this is the most optimal length. Differential Revision: http://reviews.llvm.org/D18897 llvm-svn: 265924	2016-04-11 10:07:36 +00:00
Simon Pilgrim	d263fdc512	[X86][AVX512BW] Add support for v64i8 multiplies Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets. I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL. Differential Revision: http://reviews.llvm.org/D18937 llvm-svn: 265902	2016-04-10 17:02:48 +00:00
Craig Topper	35db8ecb50	[X86] Use for loops over types to reduce code for setting up operation actions. llvm-svn: 265893	2016-04-10 05:39:32 +00:00
Craig Topper	dcc8f49bf0	[X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is suppored. This is already done for SSE2/AVX2 which VLX implies. NFC llvm-svn: 265892	2016-04-10 05:39:28 +00:00
Davide Italiano	7aa47094b2	[MC] support TLSDESC and TLSCALL / GNU2 tls dialect Differential Revision: http://reviews.llvm.org/D18885 llvm-svn: 265881	2016-04-09 20:32:33 +00:00
Sanjay Patel	4abae4e0fa	[x86] use BMI 'andn' for logic + compare ops With BMI, we can use 'andn' to save an instruction when the result is only used in a compare. This is related to one of the potential sequences to check 'isfinite' in: https://llvm.org/bugs/show_bug.cgi?id=27164 Differential Revision: http://reviews.llvm.org/D18910 llvm-svn: 265875	2016-04-09 16:02:52 +00:00
Simon Pilgrim	1cc5712763	[X86][XOP] Support for VPPERM 2-input shuffle mask decoding This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle. The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp. Differential Revision: http://reviews.llvm.org/D18441 llvm-svn: 265874	2016-04-09 14:51:26 +00:00
Craig Topper	f027107094	[X86] Use for loops over types to reduce code for setting up operation actions. NFC llvm-svn: 265871	2016-04-09 06:31:02 +00:00
Craig Topper	e801ed9e15	[X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC llvm-svn: 265870	2016-04-09 05:53:48 +00:00
Tim Shen	0012756489	[SSP] Remove llvm.stackprotectorcheck. This is a cleanup patch for SSP support in LLVM. There is no functional change. llvm.stackprotectorcheck is not needed, because SelectionDAG isn't actually lowering it in SelectBasicBlock; rather, it adds check code in FinishBasicBlock, ignoring the position where the intrinsic is inserted (See FindSplitPointForStackProtector()). llvm-svn: 265851	2016-04-08 21:26:31 +00:00
Hans Wennborg	e25b65bdb7	Rangeify a loop. NFC. llvm-svn: 265846	2016-04-08 20:46:09 +00:00
Hans Wennborg	74ff770670	Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC These are already defined, with the same values, a few lines up. NFC. llvm-svn: 265845	2016-04-08 20:46:00 +00:00
Kevin B. Smith	e0a6fc3bcc	[X86] Fix PR23155 by turning on X86FixupBWInsts by default. Differential Revision: http://reviews.llvm.org/D18866 llvm-svn: 265830	2016-04-08 18:58:29 +00:00
Ulrich Weigand	fa2dffbc1a	[SystemZ] Support conditional sibling calls via BRCL This adds a conditional variant of CallJG instruction, CallBRCL. It can be used for conditional sibling calls. Unfortunately, due to IfCvt limitations, it only really works well for functions without arguments. Author: koriakin Differential Revision: http://reviews.llvm.org/D18864 llvm-svn: 265814	2016-04-08 17:22:19 +00:00
Sam Parker	2d5126cdf5	[ARM] Enable SMLAW[B\|T] and SMLUW[B\|T] instruction selection Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt, smulwb and smulwt instructions for the ARM backend. Also updated the smul CodeGen test and removed the smulw one. Differential Revision: http://reviews.llvm.org/D18892 llvm-svn: 265793	2016-04-08 16:02:53 +00:00
Simon Pilgrim	476170384f	[X86] Tidied up shuffle decode function doxygen descriptions As discussed on D18441 - auto brief is used so we don't need /brief, we don't need to include the function name and added some missing descriptions. llvm-svn: 265785	2016-04-08 14:17:07 +00:00
Chuang-Yu Cheng	98c1894755	CXX_FAST_TLS calling convention: performance improvement for PPC64 This is the same change on PPC64 as r255821 on AArch64. I have even borrowed his commit message. The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D17533 llvm-svn: 265781	2016-04-08 12:04:32 +00:00
Vasileios Kalintiris	957d849e03	[mips] Use range-based for loops. NFC. llvm-svn: 265780	2016-04-08 10:33:00 +00:00
Zlatko Buljan	53a037f5cc	[mips][microMIPS] Add CodeGen support for ADD, ADDIU, ADDU and DADD* instructions Differential Revision: http://reviews.llvm.org/D16454 llvm-svn: 265772	2016-04-08 07:27:26 +00:00
Quentin Colombet	88f7f6bc4f	[AArch64] Fix a typo in the register class to register bank mapping. For GPR family we want the GPR register bank, not FPR! llvm-svn: 265743	2016-04-07 23:10:14 +00:00
Quentin Colombet	846219ae10	[AArch64] Get rid of some GlobalISel ifdefs. llvm-svn: 265725	2016-04-07 21:24:40 +00:00
Quentin Colombet	6cc73ce808	[AArch64] gcc does not like litteral without quotes even on preprocessor macros. llvm-svn: 265720	2016-04-07 20:49:15 +00:00
Quentin Colombet	789ad56248	[AArch64][CallLowering] Do not build the API if GlobalISel is not built. This gets rid of some ifdefs and dummy implementations that were here just to fill the blanks. llvm-svn: 265719	2016-04-07 20:47:51 +00:00
Quentin Colombet	d4131814b3	[GlobalISel] Add RegBankSelect hooks into the pass pipeline. Now, RegBankSelect will happen after the IRTranslation and the target may optionally add additional passes in between. llvm-svn: 265716	2016-04-07 20:27:33 +00:00
Jan Vesely	43b7b5b846	AMDGPU/SI: Implement atomic load/store for i32 and i64 Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709	2016-04-07 19:23:11 +00:00
Tom Stellard	9112758077	AMDGPU/SI: Add latency for export instructions Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18599 llvm-svn: 265708	2016-04-07 18:30:05 +00:00
Quentin Colombet	d21115876c	[RegisterBank] Rename RegisterBank::contains into RegisterBank::covers. llvm-svn: 265695	2016-04-07 17:09:39 +00:00
Ulrich Weigand	79391ee0f2	[SystemZ] Fix build break from r265689 Fix build error seen on some build bots due to: error: default label in switch which covers all enumeration values llvm-svn: 265693	2016-04-07 16:33:25 +00:00
Kevin B. Smith	3802c4af59	[X86]: Fix for PR27251. Differential Revision: http://reviews.llvm.org/D18850 llvm-svn: 265690	2016-04-07 16:15:34 +00:00
Ulrich Weigand	2eb027d21f	[SystemZ] Implement conditional returns Return is now considered a predicable instruction, and is converted to a newly-added CondReturn (which maps to BCR to %r14) instruction by the if conversion pass. Also, fused compare-and-branch transform knows about conditional returns, emitting the proper fused instructions for them. This transform triggers on a lot of tests, hence the huge diffstat. The changes are mostly jX to br %r14 -> bXr %r14. Author: koriakin Differential Revision: http://reviews.llvm.org/D17339 llvm-svn: 265689	2016-04-07 16:11:44 +00:00
Ehsan Amiri	4701a91e59	[PPC] Enable transformations in PPCPassConfig::addIRPasses at O2 http://reviews.llvm.org/D18562 A large number of testcases has been modified so they pass after this test. One testcase is deleted, because I realized even after undoing the original change that was committed with this testcase, the testcase still passes. So I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll) is an arbitrary change to keep it passing. Given the original intention of the testcase, and the fact that fixing it will require some time to change the testcase, we concluded that this quick change will be enough. llvm-svn: 265683	2016-04-07 15:30:55 +00:00
Tom Stellard	d37630e461	AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678	2016-04-07 14:47:07 +00:00
Valery Pykhtin	e23b6deb01	[AMDGPU] fix readlane/readfirstlane src vgpr operand type. For VGPR_32 operand disassembler expects a VGPR register encoded as 0..255 (enum8 src operand). readfirstlane/readline actually has enum9 operand and this change fixes VGPR_32 to VS_32 (enum9 encoding). Differential Revision: http://reviews.llvm.org/D18696 llvm-svn: 265670	2016-04-07 13:41:51 +00:00
Benjamin Kramer	4fb78518f1	Make helper functions static. NFC. llvm-svn: 265653	2016-04-07 10:10:09 +00:00
Simon Pilgrim	d54bae6525	[X86][SSE] Add support for VZEXT constant folding llvm-svn: 265646	2016-04-07 07:52:45 +00:00
Ahmed Bougacha	1cf67fb9cb	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. Re-apply r265450 which caused PR27245 and was reverted in r265559 because of a wrong generalization: the fetch_and_add->add_and_fetch combine only works in specific, but pretty common, cases: (icmp slt x, 0) -> (icmp sle (add x, 1), 0) (icmp sge x, 0) -> (icmp sgt (add x, 1), 0) (icmp sle x, 0) -> (icmp slt (sub x, 1), 0) (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0) Original Message: We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fallback to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265636	2016-04-07 02:07:10 +00:00
Quentin Colombet	b073c12912	[AArch64] Teach RegisterBankInfo about the CC register bank. We need to cover each register class with a register bank. llvm-svn: 265629	2016-04-07 00:39:29 +00:00
Quentin Colombet	cbc353a422	[AArch64] Teach RegisterBankInfo about the mapping of register classes on register banks. llvm-svn: 265626	2016-04-07 00:14:30 +00:00
Hans Wennborg	ab16be799c	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" Third time's the charm? The previous attempt (r265345) caused ASan test failures on X86, as broken CFI caused stack traces to not work. This version of the patch makes sure not to merge with stack adjustments that have CFI, and to not add merged instructions' offests to the CFI about to be generated. This is already covered by the lit tests; I just got the expectations wrong previously. llvm-svn: 265623	2016-04-07 00:05:49 +00:00
JF Bastien	800f87a871	NFC: make AtomicOrdering an enum class Summary: In the context of http://wg21.link/lwg2445 C++ uses the concept of 'stronger' ordering but doesn't define it properly. This should be fixed in C++17 barring a small question that's still open. The code currently plays fast and loose with the AtomicOrdering enum. Using an enum class is one step towards tightening things. I later also want to tighten related enums, such as clang's AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI' enum). This change touches a few lines of code which can be improved later, I'd like to keep it as NFC for now as it's already quite complex. I have related changes for clang. As a follow-up I'll add: bool operator<(AtomicOrdering, AtomicOrdering) = delete; bool operator>(AtomicOrdering, AtomicOrdering) = delete; bool operator<=(AtomicOrdering, AtomicOrdering) = delete; bool operator>=(AtomicOrdering, AtomicOrdering) = delete; This is separate so that clang and LLVM changes don't need to be in sync. Reviewers: jyknight, reames Subscribers: jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D18775 llvm-svn: 265602	2016-04-06 21:19:33 +00:00
Ehsan Amiri	322eca3849	[PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP http://reviews.llvm.org/D18405 When the integer value loaded is never used directly as integer we should use VSX or Floating Point Facility integer loads and avoid extra direct move llvm-svn: 265593	2016-04-06 20:12:29 +00:00
James Y Knight	037b9894bd	Put quotes around #error string. GCC reports "missing terminating ' character", even when it's being skipped by preprocessing. llvm-svn: 265590	2016-04-06 19:52:32 +00:00
Nicolai Haehnle	df3a20cd80	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Quentin Colombet	4f03c0b806	[AArch64] Change the CMake to avoid to build GlobalISel related APIs when GISel is not built. The positive side effects are: - We do not have to define dummy implementation - We do not have to do weird gymnastic to avoid like issues (like missing constructor or vtable for the base classes) llvm-svn: 265570	2016-04-06 17:38:12 +00:00
Quentin Colombet	c17f744001	[AArch64] Teach the subtarget how to get to the RegisterBankInfo. Rework the access to GlobalISel APIs to contain how much of the APIs we need to access for the final executable to build when GlobalISel is not built. This prevents massive usage of ifdefs in various places. Now, all the GlobalISel ifdefs will be happing only in AArch64TargetMachine.cpp. llvm-svn: 265567	2016-04-06 17:26:03 +00:00
Hans Wennborg	6849f8f15f	Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC." It caused ASan 32-bit tests to hang (PR27245). llvm-svn: 265559	2016-04-06 16:44:38 +00:00
Hans Wennborg	a7e396b5ef	Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"" It seems to be causing ASan tests to crash, probably due to miscompiling the run-time somehow. llvm-svn: 265551	2016-04-06 16:10:20 +00:00
Quentin Colombet	d1d324b2ae	[AArch64] Use the default constructor of RegisterBankInfo when GlobalISel is not built. This will avoid link-time error as the defautl constructor of RegisterBankInfo is the only one available when GlobalISel is not built. llvm-svn: 265549	2016-04-06 15:53:13 +00:00
Sam Kolton	ff90c60a78	[AMDGPU] AsmParser: disable DPP for unsupported instructions. New dpp tests. Fix v_nop_dpp. Summary: 1. Disable DPP encoding for instructions that do not support it: - VOP1: - v_readfirstlane_b32 - v_clrexcp - v_movreld_b32 - v_movrels_b32 - v_movrelsd_b32 - VOP2: - v_madmk_f16/32 - v_madak_f16/32 - VOPC, VINTRP, VOP3 2. Fix DPP for v_nop 3. New DPP tests for VOP1 and VOP2 instructions Reviewers: nhaustov, tstellarAMD, vpykhtin Subscribers: tstellarAMD, arsenm Differential Revision: http://reviews.llvm.org/D18552 llvm-svn: 265538	2016-04-06 13:29:59 +00:00
Evgeny Astigeevich	9c24ebfa6d	[AArch64][CodeGen] NFC refactor AArch64InstrInfo::optimizeCompareInstr to prepare it for fixing a bug in it AArch64InstrInfo::optimizeCompareInstr has a bug which causes generation of incorrect code (PR#27158). The patch refactors the function to simplify reviewing the fix of the bug. 1. Function name ‘modifiesConditionCode’ is changed to ‘areCFlagsAccessedBetweenInstrs’ to reflect that the function can check modifying accesses, reading accesses or both. 2. Function ‘AArch64InstrInfo::optimizeCompareInstr’ - Documented the function - Cmp_NZCV is DeadNZCVIdx to reflect that it is an operand index of dead NZCV - The code for the case of substituting CmpInstr is put into separate functions the main of them is ‘substituteCmpInstr’. Differential Revision: http://reviews.llvm.org/D18609 llvm-svn: 265531	2016-04-06 11:39:00 +00:00
Chuang-Yu Cheng	6e1408a891	[ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap test. http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708 llvm-svn: 265528	2016-04-06 10:48:36 +00:00
Matthias Braun	8e594fdf19	AArch64: Fix compile error Fixed to adapt a use of enterBasicBlock() in my last commit (because I had follow on patches in my repository that change the code). llvm-svn: 265513	2016-04-06 02:59:44 +00:00
Matthias Braun	7dc03f060e	RegisterScavenger: Take a reference as enterBasicBlock() argument. Make it obvious that the argument cannot be nullptr. Remove an unnecessary nullptr check in initRegState. llvm-svn: 265511	2016-04-06 02:47:09 +00:00
Chuang-Yu Cheng	2e5973ef74	[ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and add a couple of test cases. This patch also passed llvm/clang bootstrap test, and spec2006 build/run/result validation. Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617 Great thanks to Tom's (tjablin) help, he contributed a lot to this patch. Thanks Hal and Kit's invaluable opinions! Reviewers: hfinkel kbarton http://reviews.llvm.org/D16315 llvm-svn: 265506	2016-04-06 02:04:38 +00:00
Chuang-Yu Cheng	024a623c55	[Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance This patch implement the following instructions: - addpcis subpcis - maddhd maddhdu maddld - modsw moduw modsd modud - darn - extswsli extswsli. - setb - dtstsfi dtstsfiq Total 15 instructions Reviewers: nemanjai hfinkel tjablin amehsan kbarton http://reviews.llvm.org/D17885 llvm-svn: 265505	2016-04-06 01:47:02 +00:00
Chuang-Yu Cheng	eaf4b3d75c	[Power9] Implement copy-paste, msgsync, slb, and stop instructions This patch implements the following BookII and Book III instructions: - copy copy_first cp_abort paste paste. paste_last - msgsync - slbieg slbsync - stop Total 10 instructions Reviewers: nemanjai hfinkel tjablin amehsan kbarton llvm-svn: 265504	2016-04-06 01:46:45 +00:00
NAKAMURA Takumi	285c8ff753	AArch64CodeGen: Make AArch64RegisterBankInfo.cpp optional along LLVM_BUILD_GLOBAL_ISEL. llvm-svn: 265499	2016-04-06 01:18:08 +00:00
Quentin Colombet	5300950f3a	[AArch64] Initial implementation of the targeting of the register bank information. llvm-svn: 265489	2016-04-05 23:34:59 +00:00
Manman Ren	802cd6f9d7	Swift Calling Convention: swiftcc for ARM. Differential Revision: http://reviews.llvm.org/D18769 llvm-svn: 265482	2016-04-05 22:44:44 +00:00
Evgeniy Stepanov	dde29e2799	Faster stack-protector for Android/AArch64. Bionic has a defined thread-local location for the stack protector cookie. Emit a direct load instead of going through __stack_chk_guard. llvm-svn: 265481	2016-04-05 22:41:50 +00:00
Manman Ren	f8bdd88cd9	Swift Calling Convention: add swiftcc. Differential Revision: http://reviews.llvm.org/D17863 llvm-svn: 265480	2016-04-05 22:41:47 +00:00
Duncan P. N. Exon Smith	1de3c7e790	IR: Introduce ConstantAggregate, NFC Add a common parent class for ConstantArray, ConstantVector, and ConstantStruct called ConstantAggregate. These are the aggregate subclasses of Constant that take operands. This is mainly a cleanup, adding common `isa` target and removing duplicated code. However, it also simplifies caching which constants point transitively at `GlobalValue` (a possible future direction). llvm-svn: 265466	2016-04-05 21:10:45 +00:00
Duncan P. N. Exon Smith	91d3cfed78	Revert "Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes." This reverts commit r265454 since it broke the build. E.g.: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/22413/ llvm-svn: 265459	2016-04-05 20:45:04 +00:00
Eugene Zelenko	1760dc2a23	Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes. Some Include What You Use suggestions were used too. Use anonymous namespaces in source files. Differential revision: http://reviews.llvm.org/D18778 llvm-svn: 265454	2016-04-05 20:19:49 +00:00
Ahmed Bougacha	50e6cd4a3a	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fallback to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265450	2016-04-05 20:02:57 +00:00
Ahmed Bougacha	629446ba03	[X86] Simplify early-exit check. NFC. llvm-svn: 265447	2016-04-05 20:02:22 +00:00
Sanjay Patel	4c7d094451	fix typo; NFC llvm-svn: 265442	2016-04-05 19:27:39 +00:00
Jacques Pienaar	42991b3e5a	[lanai] LanaiSetflagAluCombiner more conservative Summary: LanaiSetflagAluCombiner could previously combine instructions across basic building blocks even when not legal. Make the LanaiSetflagAluCombiner more conservative to avoid this. Reviewers: eliben Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D18746 llvm-svn: 265411	2016-04-05 16:18:13 +00:00
Sam Parker	0d3a3a537c	[ARM] Cleanup of smul and smla instruction descriptions Removed the SDNode argument passed to the AI_smul and AI_smla multiclass definitions as they are always mul. Differential Revision: http://reviews.llvm.org/D18791 llvm-svn: 265409	2016-04-05 16:01:25 +00:00
Konstantin Zhuravlyov	e63e02cb0c	[AMDGPU] Emit linkonce and linkonce_odr symbols Differential Revision: http://reviews.llvm.org/D18726 llvm-svn: 265408	2016-04-05 16:00:58 +00:00
Simon Dardis	d9d41f531e	[mips] MIPSR6 Compact jump support This patch adds support for compact jumps similiar to the previous compact branch support for MIPSR6. Unlike compact branches, compact jumps do not have a forbidden slot. As MipsInstrInfo::getEquivalentCompactForm can determine the correct expansion for jumps and branches for both microMIPS and MIPSR6, remove the unnecessary distinction in the delay slot filler. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders llvm-svn: 265390	2016-04-05 12:50:29 +00:00
Justin Holewinski	c79979299a	[NVPTX] Handle ldg created from sign-/zero-extended load Reviewers: jingyue Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D18053 llvm-svn: 265389	2016-04-05 12:38:01 +00:00
JF Bastien	1c3c223b65	Lanai: fix -Wsign-compare warning llvm-svn: 265368	2016-04-05 00:20:27 +00:00
JF Bastien	393b79ee00	Lanai: fix -Wpedantic warnings Extra semicolon. llvm-svn: 265365	2016-04-04 23:47:30 +00:00
Hans Wennborg	a47a692341	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" The original commit miscompiled things on 32-bit Windows, e.g. a Clang boostrap. It turns out that mergeSPUpdates() was a bit too generous in what it interpreted as a stack adjustment, causing the following code: addl $12, %esp leal -4(%ebp), %esp To be "optimized" into simply: addl $8, %esp This commit tightens up mergeSPUpdates() and includes a new test (test14 in movtopush.ll) for this situation. llvm-svn: 265345	2016-04-04 21:02:46 +00:00
Matthias Braun	870c34f0cf	ARM, AArch64, X86: Check preserved registers for tail calls. We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserver_mostcc or cxx_fast_tls. It was explicitely handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743 Differential Revision: http://reviews.llvm.org/D18680 llvm-svn: 265329	2016-04-04 18:56:13 +00:00
Derek Schuff	1dbf7a571f	Add MachineFunctionProperty checks for AllVRegsAllocated for target passes Summary: This adds the same checks that were added in r264593 to all target-specific passes that run after register allocation. Reviewers: qcolombet Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18525 llvm-svn: 265313	2016-04-04 17:09:25 +00:00
Daniel Sanders	b3c2764f89	[mips] Range check simm32 and fold MIPS16's imm32 into simm32. Summary: At this point we should be able to enable IAS by default for O32 without breaking check-all, or recursion. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18439 llvm-svn: 265302	2016-04-04 15:32:49 +00:00
Ulrich Weigand	99ac5045ab	[SystemZ] Add compare-and-branch instructions to MC This adds MC support for fused compare + indirect branch instructions, ie. CRB, CGRB, CLRB, CLGRB, CIB, CGIB, CLIB, CLGIB. They aren't actually generated yet -- this is preparation for their use for conditional returns in the next iteration of D17339. Author: koriakin Differential Revision: http://reviews.llvm.org/D18742 llvm-svn: 265296	2016-04-04 14:26:43 +00:00
Ulrich Weigand	a9ac6d6cc2	[SystemZ] Support ATOMIC_FENCE A cross-thread sequentially consistent fence should be lowered into z/Architecture's BCR serialization instruction, instead of causing a fatal error in the back-end. Author: bryanpkc Differential Revision: http://reviews.llvm.org/D18644 llvm-svn: 265292	2016-04-04 12:45:44 +00:00
Ulrich Weigand	f557d08325	[SystemZ] Support llvm.frameaddress/llvm.returnaddress intrinsics Enable the SystemZ back-end to lower FRAMEADDR and RETURNADDR, which previously would cause the back-end to crash. Currently, only a frame count of zero is supported. Author: bryanpkc Differential Revision: http://reviews.llvm.org/D18514 llvm-svn: 265291	2016-04-04 12:44:55 +00:00
Elena Demikhovsky	e99c561391	AVX-512: Truncating store for i1 vectors Implemented truncstore for KNL and skylake-avx512. Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes. Differential Revision: http://reviews.llvm.org/D18740 llvm-svn: 265283	2016-04-04 07:17:47 +00:00
Simon Pilgrim	0edd3d771a	[X86] Removed duplicate code. llvm-svn: 265274	2016-04-03 20:40:35 +00:00
Simon Pilgrim	cd0dfc93eb	[X86][SSE] Support for MOVMSK signbit extraction instructions Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR. This is an early step towards more optimal handling of vector comparison results. Differential Revision: http://reviews.llvm.org/D18741 llvm-svn: 265266	2016-04-03 18:22:03 +00:00
Simon Pilgrim	20d1d4f045	[X86] Tidied up X86ISD instruction nodes. NFCI. Tidied up comments, stripped trailing whitespace, split apart nodes that aren't related. No change in ordering although there is definitely some scope for it. llvm-svn: 265263	2016-04-03 14:14:32 +00:00
Elena Demikhovsky	5e426f7356	AVX-512: Load and Extended Load for i1 vectors Implemented load+{sign\|zero}_extend for i1 vectors Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX. Differential Revision: http://reviews.llvm.org/D18737 llvm-svn: 265259	2016-04-03 08:41:12 +00:00
Jacques Pienaar	796975d311	[lanai] Fix for LanaiDelaySlotFiller and LanaiMCInstLower.cpp Summary: * Fix to stop delay slot filler from inserting SP modifying instructions in the newly expanded call/return instructions. * In LowerSymbol the outermost type was not LanaiMCExpr if there was a binary expression * Remove printExpr in LanaiInstPrinter Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D18734 llvm-svn: 265251	2016-04-03 00:49:27 +00:00
Zoran Jovanovic	2b7cc5a4ae	[mips][microMIPS] Revert commits r264245 and r264248. Commit r264245 was the reason for failing tests in LLVM test suite. Commit r264248 depends on the first one. llvm-svn: 265249	2016-04-02 23:06:13 +00:00
Saleem Abdulrasool	85b43639b1	AArch64: support .cpu directive Add support for the AArch64 .cpu directive. This is a slightly involved directive since the parameter is actually a variable encoded string. The general structure is: <cpu>[[+-]<feature>]* We now map some of the supported string names for features for internal representation of feature flags. If we encounter one which we do not support, bail out as we cannot validate the assembly any longer. Resolves PR27010. llvm-svn: 265240	2016-04-02 19:29:52 +00:00
Tim Northover	5dad9df9f7	AArch64: avoid clobbering SP for dead MOVimm pseudos. We were producing ORR, which actually defines a GPR32sp rather than a GPR32. Should fix PR23209. llvm-svn: 265198	2016-04-01 23:14:52 +00:00
James Y Knight	e6a4646372	Remove useless check for ThreadModel==Single in ARMISelLowering. NFC. ThreadModel::Single is already handled already by ARMPassConfig adding LowerAtomicPass to the pass list, which lowers all atomics to non-atomic ops and deletes fences. So by the time we get to ISel, there's no atomic fences left, so they don't need special handling. llvm-svn: 265178	2016-04-01 19:33:19 +00:00
Tom Stellard	354a43c7bc	AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý. Patch by: Vedran Miletić Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: jvesely, scchan, kanarayan, arsenm Differential Revision: http://reviews.llvm.org/D17280 llvm-svn: 265170	2016-04-01 18:27:37 +00:00
Sanjay Patel	9f413364d5	[x86] avoid intermediate splat for non-zero memsets (PR27100) Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. Note that the SSE1 path is not changed in this patch. That can be a follow-up. This patch should resolve PR27100. llvm-svn: 265161	2016-04-01 17:36:45 +00:00
Chad Rosier	8787a81023	[AArch64] Fix a typo. NFC. llvm-svn: 265160	2016-04-01 17:34:38 +00:00
Sanjay Patel	a05e0ff223	[x86] avoid intermediate splat for non-zero memsets (PR27100) Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there. Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up. This patch should resolve PR27100. Differential Revision: http://reviews.llvm.org/D18676 llvm-svn: 265148	2016-04-01 16:27:14 +00:00
Valery Pykhtin	5b3559c1ec	[AMDGPU] fix MADAK/MADMK instructions operand namings to match encoding fields. $vsrc1 -> $src1, $k -> $imm Differential Revision: http://reviews.llvm.org/D18659 llvm-svn: 265141	2016-04-01 13:13:12 +00:00
Andrea Di Biagio	8c48841907	[x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type. Since revision 235394, we no longer perform target specific combines on build_vector nodes. No functional change intended. llvm-svn: 265138	2016-04-01 12:25:44 +00:00
Sagar Thakur	48973d21e1	[MIPS][LLVM-MC] Fix JR encoding for MIPSR6 ISA Summary: The assembler was picking the wrong JR variant because the pre-R6 one was still enabled at R6. Author: nitesh.jain Reviewers: vkalintiris, dsanders Subscribers: dsanders, llvm-commits, mohit.bhakkad, sagar, bhushan, jaydeep Differential: D18387 llvm-svn: 265134	2016-04-01 11:55:33 +00:00
Andrey Turetskiy	958eb46443	[X86] Introduce Lakemont CPU. Add a new Intel MCU CPU Lakemont, which doesn't support X87. Differential Revision: http://reviews.llvm.org/D18650 llvm-svn: 265128	2016-04-01 10:16:15 +00:00
James Molloy	b876c72bcc	Fix for pr24346: arm asm label calculation error in sub Some ARM instructions encode 32-bit immediates as a 8-bit integer (0-255) and a 4-bit rotation (0-30, even) in its least significant 12 bits. The original fixup, FK_Data_4, patches the instruction by the value bit-to-bit, regardless of the encoding. For example, assuming the label L1 and L2 are 0x0 and 0x104 respectively, the following instruction: add r0, r0, #(L2 - L1) ; expects 0x104, i.e., 260 would be assembled to the following, which adds 1 to r0, instead of 260: e2800104 add r0, r0, #4, 2 ; equivalently 1 The new fixup kind fixup_arm_mod_imm takes care of the encoding: e2800f41 add r0, r0, #260 Patch by Ting-Yuan Huang! llvm-svn: 265122	2016-04-01 09:40:47 +00:00
Oliver Stannard	a5520b02a5	[AArch64] Better errors for out-of-range fixups When a fixup that can be resolved by the assembler is out of range, we should report an error in the source, rather than crashing. Differential Revision: http://reviews.llvm.org/D18402 llvm-svn: 265120	2016-04-01 09:14:50 +00:00
Chuang-Yu Cheng	f8b592f213	[PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear Bug Pattern: # BB#0: # %entry cmpldi 3, 0 beq- 0, .LBB0_2 # BB#1: # %exit lwz 4, 0(3) #TC_RETURNd8 LVComputationKind 0 .LBB0_2: # %cond.false mflr 0 std 0, 16(1) stdu 1, -96(1) .Ltmp0: .cfi_def_cfa_offset 96 .Ltmp1: .cfi_offset lr, 16 bl __assert_fail nop The branch instruction for tail call return is not generated, because the shrink-wrapping pass choosing a new Restore Point: %cond.false, so %exit block is not sent to emitEpilogue, that's why the branch is not generated. Thanks Kit's opinions! Reviewers: nemanjai hfinkel tjablin kbarton http://reviews.llvm.org/D17606 llvm-svn: 265112	2016-04-01 06:44:32 +00:00
Michael Kuperstein	7bab713188	Use range-based for loops. NFC. llvm-svn: 265105	2016-04-01 03:45:08 +00:00
Matthias Braun	cc7fba40fe	AArch64ISelLowering: Remove unused variables/arguments; NFC llvm-svn: 265098	2016-04-01 02:49:17 +00:00
Justin Lebar	96418481bc	[NVPTX] Add a truncate DAG node to some calls. Summary: Previously, we were running afoul of the assertion EVT(CLI.Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!" in SelectionDAGBuilder.cpp when running the NVPTX/i8-param.ll test. This is because our backend (for some reason) treats small return values as i32, but it wasn't ever truncating the i32 back down to the expected width in the DAG. Unclear to me whether this fixes any actual bugs -- in this test, at least, the generated code is unchanged. Reviewers: jingyue Subscribers: llvm-commits, tra, jholewinski Differential Revision: http://reviews.llvm.org/D17872 llvm-svn: 265091	2016-04-01 01:09:10 +00:00
Justin Lebar	efcc81cbb4	[NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect. Summary: Previously the NVVMReflect pass would read its configuration from command-line flags or a static configuration given to the pass at instantiation time. This doesn't quite work for clang's use-case. It needs to pass a value for __CUDA_FTZ down on a per-module basis. We use a module flag for this, so the NVVMReflect pass needs to be updated to read said flag. Reviewers: tra, rnk Subscribers: cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18672 llvm-svn: 265090	2016-04-01 01:09:07 +00:00
Justin Lebar	645c3014a1	[NVPTX] Annotate some instructions as hasSideEffects = 0. Summary: Tablegen tries to infer this from the selection DAG patterns defined for the instructions, but it can't always. An instructive example is CLZr64. CLZr32 is correctly inferred to have no side-effects, but the selection DAG pattern for CLZr64 is slightly more complicated, and in particular the ctlz DAG node is not at the root of the pattern. Thus tablegen can't infer that CLZr64 has no side-effects. Reviewers: jholewinski Subscribers: jholewinski, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17472 llvm-svn: 265089	2016-04-01 01:09:05 +00:00
Hans Wennborg	649159df3c	Follow-up to r265036: I got these iterators mixed up llvm-svn: 265076	2016-03-31 23:55:16 +00:00
Jun Bum Lim	760afcb338	[AArch64] Allow loads with imp-def to be handled in getMemOpBaseRegImmOfsWidth() Summary: This change will allow loads with imp-def to be clustered in machine-scheduler pass. areMemAccessesTriviallyDisjoint() can also handle loads with imp-def. Reviewers: mcrosier, jmolloy, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18665 llvm-svn: 265051	2016-03-31 20:53:47 +00:00
Hal Finkel	fc35391f2b	[PowerPC] Add a late MI-level pass for QPX load/splat simplification Chapter 3 of the QPX manual states that, "Scalar floating-point load instructions, defined in the Power ISA, cause a replication of the source data across all elements of the target register." Thus, if we have a load followed by a QPX splat (from the first lane), the splat is redundant. This adds a late MI-level pass to remove the redundant splats in some of these cases (specifically when both occur in the same basic block). This optimization is scheduled just prior to post-RA scheduling. It can't happen before anything that might replace the load with some already-computed quantity (i.e. store-to-load forwarding). llvm-svn: 265047	2016-03-31 20:39:41 +00:00
Hans Wennborg	132cd62121	Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" I think it might have caused these build breakages: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio llvm-svn: 265046	2016-03-31 20:27:30 +00:00
Benjamin Kramer	569efd2cfd	[ARM] Expand v1i64 and v2i64 ctpop. The default is legal, which results in 'Cannot select' errors. This is triggered during selfhost due to a recent cost model change. llvm-svn: 265040	2016-03-31 19:42:04 +00:00
Hans Wennborg	e97fb414e8	[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140) For code such as: void f(int, int); void g() { f(1, 2); } compiled for 32-bit X86 Linux, Clang would previously generate: subl $12, %esp subl $8, %esp pushl $2 pushl $1 calll f addl $16, %esp addl $12, %esp retl This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr(). Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265039	2016-03-31 19:26:24 +00:00
Hans Wennborg	e1a2e90ffa	Change eliminateCallFramePseudoInstr() to return an iterator This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction. It also greatly simplifies the calls to this function from Prolog- EpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator. Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment. Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265036	2016-03-31 18:33:38 +00:00
Jacques Pienaar	4badd6aaf3	[lanai] isBrImm should accept any non-constant immediate. isBrImm should accept any non-constant immediate. Previously it was only accepting LanaiMCExpr ones which was wrong. Differential Revision: http://reviews.llvm.org/D18571 llvm-svn: 265032	2016-03-31 17:58:55 +00:00
Ehsan Amiri	99b017ae35	[PPC] basic support for Power 9 direct move instructions http://reviews.llvm.org/D18097 Initial support does not include any patterns to generate this instructions llvm-svn: 265031	2016-03-31 17:47:17 +00:00
Sanjay Patel	92d5ea5e07	[x86] use SSE/AVX ops for non-zero memsets (PR27100) Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg. Follow-on bugs exposed by the current codegen are: https://llvm.org/bugs/show_bug.cgi?id=27141 https://llvm.org/bugs/show_bug.cgi?id=27143 Differential Revision: http://reviews.llvm.org/D18566 llvm-svn: 265029	2016-03-31 17:30:06 +00:00
Ulrich Weigand	3707ba8030	[PowerPC] Correctly compute 64-bit offsets in fast isel PPCSimplifyAddress contains this code: IntegerType OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(Context) : Type::getInt64Ty(Context)); to determine the type to be used for an index register, if one needs to be created. However, the "VT" here is the type of the data being loaded or stored, not* the type of an address. This means that if a data element of type i32 is accessed using an index that does not not fit into 32 bits, a wrong address is computed here. Note that PPCFastISel is only ever used on 64-bit currently, so the type of an address is actually always MVT::i64. Other parts of the code, even in this same PPCSimplifyAddress routine, already rely on that fact. Thus, this patch changes the code to simply unconditionally use Type::getInt64Ty(*Context) as OffsetTy. llvm-svn: 265023	2016-03-31 15:37:06 +00:00
Nemanja Ivanovic	a621a7f9c3	[PowerPC] Basic support for P9 atomic loads and stores This patch corresponds to review: http://reviews.llvm.org/D18032 This patch provides asm implementation for the following instructions: lwat, ldat, stwat, stdat, ldmx, mcrxrx llvm-svn: 265022	2016-03-31 15:26:37 +00:00
Jun Bum Lim	cf9744367b	[AArch64] Handle missing store pair opportunity Summary: This change will handle missing store pair opportunity where the first store instruction stores zero followed by the non-zero store. For example, this change will convert : str wzr, [x8] str w1, [x8, #4] into: stp wzr, w1, [x8] Reviewers: jmolloy, t.p.northover, mcrosier Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18570 llvm-svn: 265021	2016-03-31 14:47:24 +00:00
Ulrich Weigand	1931b01a64	[PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel The fast isel pass currently emits a COPY_TO_REGCLASS node to convert from a F4RC to a F8RC register class during conversion of a floating-point number to integer. There is actually no support in the common code instruction printers to emit COPY_TO_REGCLASS nodes, so the PowerPC back-end has special code there to simply ignore COPY_TO_REGCLASS. This is correct if and only if the source and destination registers of COPY_TO_REGCLASS are the same (except for the different register class). But nothing guarantees this to be the case, and if the register allocator does end up allocating source and destination to different registers after all, the back-end simply generates incorrect code. I've included a test case that shows such incorrect code generation. However, it seems that COPY_TO_REGCLASS is actually not intended to be used at the MI layer at all. It is used during SelectionDAG, but always lowered to a plain COPY before emitting MI. Other back-end's fast isel passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong for the PowerPC back-end to emit it here. This patch changes the PowerPC back-end to directly emit COPY instead of COPY_TO_REGCLASS and removes the special handling in the instruction printers. Differential Revision: http://reviews.llvm.org/D18605 llvm-svn: 265020	2016-03-31 14:44:50 +00:00
Daniel Sanders	85fd10bd93	[mips] Range check simm16 Summary: There are too many instructions to exhaustively test so addiu and lwc2 are used as representative examples. It should be noted that many memory instructions that should have simm16 range checking do not because it is also necessary to support the macro of the same name which accepts simm32. The range checks for these occur in the macro expansion. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18437 llvm-svn: 265019	2016-03-31 14:34:00 +00:00
Daniel Sanders	eab3146156	[mips] Range check simm11 and mem_simm11. Summary: ldc2/sdc2 now emit slightly worse diagnostics for MIPS-I. The problem is that they don't trigger the custom parser because all the candidates are disabled by feature bits. On all other subtargets, the diagnostics are accurate but are subject to the usual issues of needing to report multiple ways to correct the code (e.g. smaller offset, enable a CPU feature) but only being able to report one error. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18436 llvm-svn: 265018	2016-03-31 14:23:20 +00:00
Sam Kolton	1048fb1818	[AMDGPU] Disassembler: support for DPP Review: http://reviews.llvm.org/D18642 llvm-svn: 265015	2016-03-31 14:15:04 +00:00
Daniel Sanders	dc0602a2c2	[mips] Split mem_msa into range checked mem_simm10 and mem_simm10_lsl[123] Summary: Also, made test_mi10.s formatting consistent with the majority of the MC tests. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18435 llvm-svn: 265014	2016-03-31 14:12:01 +00:00
Nirav Dave	83ce54aac2	Prevent X86ISelLowering from merging volatile loads Change isConsecutiveLoads to check that loads are non-volatile as this is a requirement for any load merges. Propagate change to two callers. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18546 llvm-svn: 265013	2016-03-31 13:40:55 +00:00
Daniel Sanders	2e9f69d933	[mips] Range check simm9 and fix a bug this revealed. Summary: The bug was that microMIPS's [ls]w[lr]e instructions claimed to support a 12-bit offset when it is only 9-bit. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18434 llvm-svn: 265010	2016-03-31 13:15:23 +00:00
Zlatko Buljan	6221be8e46	[mips][microMIPS] Implement MFC, MFHC and DMFC* instructions Differential Revision: http://reviews.llvm.org/D17334 llvm-svn: 265002	2016-03-31 08:51:24 +00:00
Jonas Paulsson	2ba315218b	Indentation fix in SystemZInstrInfo.cpp llvm-svn: 265000	2016-03-31 08:00:14 +00:00
Craig Topper	d2aa03a60a	[X86] Use MVT instead of EVT in code called after legalization. llvm-svn: 264992	2016-03-31 04:37:41 +00:00
Hal Finkel	851b33a0b1	[PowerPC] Load two floats directly instead of using one 64-bit integer load When dealing with complex<float>, and similar structures with two single-precision floating-point numbers, especially when such things are being passed around by value, we'll sometimes end up loading both float values by extracting them from one 64-bit integer load. It looks like this: t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64 t16: i64 = srl t13, Constant:i32<32> t17: i32 = truncate t16 t18: f32 = bitcast t17 t19: i32 = truncate t13 t20: f32 = bitcast t19 The problem, especially before the P8 where those bitcasts aren't legal (and get expanded via the stack), is that it would have been better to use two floating-point loads directly. Here we add a target-specific DAGCombine to do just that. In short, we turn: ld 3, 0(5) stw 3, -8(1) rldicl 3, 3, 32, 32 stw 3, -4(1) lfs 3, -4(1) lfs 0, -8(1) into: lfs 3, 4(5) lfs 0, 0(5) llvm-svn: 264988	2016-03-31 02:56:05 +00:00
Hans Wennborg	6596977130	[X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325) The size savings are significant, and from what I can tell, both ICC and GCC do this. Differential Revision: http://reviews.llvm.org/D18573 llvm-svn: 264966	2016-03-30 23:38:01 +00:00
Matthias Braun	8d41436004	CodeGen: Factor out code for tail call result compatibility check; NFC llvm-svn: 264959	2016-03-30 22:46:04 +00:00
Matt Arsenault	2fe4fbc184	AMDGPU: Add frexp_exp intrinsic llvm-svn: 264944	2016-03-30 22:28:52 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
Simon Pilgrim	c49bd2ede0	[X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible. llvm-svn: 264922	2016-03-30 20:52:24 +00:00
Justin Lebar	e3804cc932	[NVPTX] Make NVVMReflect a function pass. Summary: Currently it's a module pass. Make it a function pass so that we can move it to PassManagerBuilder's EP_EarlyAsPossible extension point, which only accepts function passes. Reviewers: rnk Subscribers: tra, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18615 llvm-svn: 264919	2016-03-30 20:40:11 +00:00
Chad Rosier	f7ac5f28ab	[AArch64] Fix warnings pointed out by Hal. llvm-svn: 264882	2016-03-30 18:08:51 +00:00
Tom Stellard	1d5e6d4bdc	AMDGPU/SI: Improve MachineSchedModel definition This patch contains a few improvements to the model, including: - Using a single resource with a defined buffers size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %)A Differential Revision: http://reviews.llvm.org/D18453 llvm-svn: 264877	2016-03-30 16:35:13 +00:00
Tom Stellard	0bc954e3bc	AMDGPU/SI: Enable lanemask tracking in misched Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876	2016-03-30 16:35:09 +00:00
Jonas Paulsson	f76123386a	[SystemZ] Add nop and nopr InstAliases. For compatability with GAS, nop and nopr are recognized as alises for bc and bcr, respectively. A mask of 0 turns these instructions effectively into no-operations. Reviewed by Ulrich Weigand. llvm-svn: 264875	2016-03-30 16:11:58 +00:00
Nirav Dave	8dd66e5753	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872	2016-03-30 15:41:12 +00:00
Simon Pilgrim	b87ffe8519	[X86][XOP] BITREVERSE lowering using VPPERM XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction. llvm-svn: 264870	2016-03-30 14:14:00 +00:00
Benjamin Kramer	9415e06da7	[NVPTX] Avoid temporary std::string and make single-use function local to the cpp file. No functionality change intended. llvm-svn: 264861	2016-03-30 12:31:51 +00:00
Chandler Carruth	8e06a10d1f	[x86] Fix a horrible bug in our lowering of x86 floating point atomic operations. Specifically, we had code that tried to badly approximate reconstructing all of the possible variations on addressing modes in two x86 instructions based on those in one pseudo instruction. This is not the first bug uncovered with doing this, so stop doing it altogether. Instead generically and pedantically copy every operand from the address over to both new instructions, and strip kill flags from any register operands. This fixes a subtle bug seen in the wild where we would mysteriously drop parts of the addressing mode, causing for example the index argument in the added test case to just be completely ignored. Hypothetically, this was an extremely bad miscompile because it actually caused a predictable and leveragable write of a 64bit quantity to an unintended offset (the first element of the array intead of whatever other element was intended). As a consequence, in theory this could even have introduced security vulnerabilities. However, this was only something that could happen with an atomic floating point add. No other operation could trigger this bug, so it seems extremely unlikely to have occured widely in the wild. But it did in fact occur, and frequently in scientific applications which were using relaxed atomic updates of a floating point value after adding a delta. Those would end up being quite badly miscompiled by LLVM, which is how we found this. Of course, this often looks like a race condition in the code, but it was actually a miscompile. I suspect that this whole RELEASE_FADD thing was a complete mistake. There is no such operation, and I worry that anything other than add will get remarkably worse codegeneration. But that's not for this change.... llvm-svn: 264845	2016-03-30 08:41:59 +00:00
Chandler Carruth	81c3ddeb1c	[x86] Extract a helper function to compute the full addressing mode from an x86 MachineInstr's operands. This will be super useful to fix some bad atomics code in my next commit. No functionality changed. llvm-svn: 264819	2016-03-30 03:10:24 +00:00
Adam Nemet	fb8fbba584	[Aarch64] Turn on the LoopDataPrefetch pass for Cyclone llvm-svn: 264811	2016-03-30 00:21:29 +00:00
Adam Nemet	b81f1e0db3	[PPC] Remove -ppc-loop-prefetch-distance in favor of -prefetch-distance After the previous change, this can now be overridden centrally in the pass. llvm-svn: 264807	2016-03-29 23:45:56 +00:00
Adam Nemet	1428d41f9a	[LoopDataPrefetch] Centralize the tuning cl::opts under the pass This is effectively NFC, minus the renaming of the options (-cyclone-prefetch-distance -> -prefetch-distance). The change was requested by Tim in D17943. llvm-svn: 264806	2016-03-29 23:45:52 +00:00
James Y Knight	7306cd47d4	[SPARC] Use AtomicExpandPass to expand AtomicRMW instructions. They were previously expanded to CAS loops in a custom isel expansion, but AtomicExpandPass knows how to do that generically. Testing is covered by the existing sparc atomics.ll testcases. llvm-svn: 264771	2016-03-29 19:09:54 +00:00
Manman Ren	f46262e0b7	Swift Calling Convention: add swiftself attribute. Differential Revision: http://reviews.llvm.org/D17866 llvm-svn: 264754	2016-03-29 17:37:21 +00:00
Konstantin Zhuravlyov	ecc7cbf611	Test commit access llvm-svn: 264736	2016-03-29 15:15:44 +00:00
Simon Dardis	9a3f32c00d	[mips] Test commit: Mark insertNoop as dead code (NFC) llvm-svn: 264728	2016-03-29 13:02:19 +00:00
Daniel Sanders	5d3840fdf9	[mips] Correct MIPS16 jal/jalx to have uimm26 offsets and add MC layer range checks. NFC. Summary: However, this has no effect at this time because the instructions affected are marked 'isCodeGenOnly=1' and have no alternative for the MC layer. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18179 llvm-svn: 264712	2016-03-29 09:40:38 +00:00
Elena Demikhovsky	95629caaa9	AVX-512: fixed a bug in fp_to_uint pattern on KNL Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701	2016-03-29 06:33:41 +00:00
Hal Finkel	fa7057a415	[PowerPC] Refactor popcnt[dw] target features Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. llvm-svn: 264690	2016-03-29 01:36:01 +00:00
Derek Schuff	ecabac6244	[WebAssembly] Remove duplicate disabling of passes Also put all the disabled passes together llvm-svn: 264684	2016-03-28 22:52:20 +00:00
Hal Finkel	69ada2f514	[PowerPC] Clarify a comment in PPCTTI about vector loads This should say that we could do unaligned vector loads on the P7 using VSX instructions, not that we should. llvm-svn: 264683	2016-03-28 22:39:35 +00:00
Simon Pilgrim	d3df400fa9	[X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements. If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors. The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent. Its not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least. Differential Revision: http://reviews.llvm.org/D18492 llvm-svn: 264666	2016-03-28 21:33:52 +00:00
Haicheng Wu	6a6bc750d5	[AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize Mimic what x86 does when optimizing sdiv/udiv for minsize. llvm-svn: 264606	2016-03-28 18:17:07 +00:00
Hal Finkel	7059d41622	[PowerPC] On the A2, popcnt[dw] are very slow The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). llvm-svn: 264600	2016-03-28 17:52:08 +00:00
Derek Schuff	ad154c837e	Introduce MachineFunctionProperties and the AllVRegsAllocated property MachineFunctionProperties represents a set of properties that a MachineFunction can have at particular points in time. Existing examples of this idea are MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which will eventually be switched to use this mechanism. This change introduces the AllVRegsAllocated property; i.e. the property that all virtual registers have been allocated and there are no VReg operands left. With this mechanism, passes can declare that they require a particular property to be set, or that they set or clear properties by implementing e.g. MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class verifies that the requirements are met, and handles the setting and clearing based on the delcarations. Passes can also directly query and update the current properties of the MF if they want to have conditional behavior. This change annotates the target-independent post-regalloc passes; future changes will also annotate target-specific ones. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D18421 llvm-svn: 264593	2016-03-28 17:05:30 +00:00
Tom Stellard	a76bcc2ea1	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589	2016-03-28 16:10:13 +00:00
Krzysztof Parzyszek	2d65ea74dc	[Hexagon] Improve handling of unaligned vector loads and stores llvm-svn: 264584	2016-03-28 15:43:03 +00:00
Krzysztof Parzyszek	bb63f66686	[Hexagon] Only use restore functions for single register at -Oz llvm-svn: 264581	2016-03-28 14:52:21 +00:00
Krzysztof Parzyszek	a34901aae9	[Hexagon] Speed up frame lowering when no optimizations are enabled - Do not optimize stack slots in optnone functions. - Get aligned-base register from HexagonMachineFunctionInfo instead of looking for ALIGNA instruction in the function's body. llvm-svn: 264580	2016-03-28 14:42:03 +00:00
Douglas Katzman	d0c11cf7ad	Sparc: silently ignore .proc assembler directive Differential Revision: http://reviews.llvm.org/D18463 llvm-svn: 264579	2016-03-28 14:00:11 +00:00
Jacques Pienaar	fcef3e4617	[lanai] Add Lanai backend. Add the Lanai backend to lib/Target. General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html). Differential Revision: http://reviews.llvm.org/D17011 llvm-svn: 264578	2016-03-28 13:09:54 +00:00
Chuang-Yu Cheng	d5eb774eb6	[Power9] Implement new altivec instructions: bcd* series This patch implements the following altivec instructions: - Decimal Convert From/to National/Zoned/Signed-QWord: bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq. - Decimal Copy-Sign/Set-Sign: bcdcpsgn. bcdsetsgn. - Decimal Shift/Unsigned-Shift/Shift-and-Round: bcds. bcdus. bcdsr. - Decimal (Unsigned) Truncate: bcdtrunc. bcdutrunc. Total 13 instructions Thanks Amehsan's advice! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D17838 llvm-svn: 264568	2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng	80722719eb	[Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat This change implements the following vsx instructions: - Scalar Insert/Extract xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp - Vector Insert/Extract xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp xxextractuw xxinsertw - Scalar/Vector Test Data Class xststdcdp xststdcsp xststdcqp xvtstdcdp xvtstdcsp - Maximum/Minimum xsmaxcdp xsmaxjdp xsmincdp xsminjdp - Vector Byte-Reverse/Permute/Splat xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib 30 instructions Thanks Nemanja for invaluable discussion! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16842 llvm-svn: 264567	2016-03-28 08:34:28 +00:00
Elena Demikhovsky	83f0647d85	AVX-512: Fixed ICMP instruction selection for i1 operands ICMP instruction selection fails on SKX and KNL for i1 operand. I use XOR to resolve: (A == B) is equivalent to (A xor B) == 0 Differential Revision: http://reviews.llvm.org/D18511 llvm-svn: 264566	2016-03-28 07:47:58 +00:00
Chuang-Yu Cheng	5663848996	[Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic This change implements the following vsx instructions: - quad-precision move xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp - quad-precision fp-arithmetic xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o) xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o) 22 instructions Thanks Nemanja and Kit for careful review and invaluable discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16110 llvm-svn: 264565	2016-03-28 07:38:01 +00:00
Hal Finkel	0b37175ca6	[PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call. This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532	2016-03-27 05:40:56 +00:00
Simon Pilgrim	dcdf85033c	[X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization llvm-svn: 264517	2016-03-26 18:32:13 +00:00
Simon Pilgrim	e4dbeb40c6	[X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264512	2016-03-26 15:44:55 +00:00
Simon Pilgrim	3eef33a806	[X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors Currently this is to mainly to prevent scalarization of integer division by constants. Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264511	2016-03-26 15:27:20 +00:00
Simon Pilgrim	7379a70677	[X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies. Only pre-AVX512BW targets need to split v32i8 vectors. llvm-svn: 264509	2016-03-26 09:44:27 +00:00
David Majnemer	b549ab02b4	[PowerPC] Disable the CTR optimization in the presence of {min,max}num The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508	2016-03-26 09:42:31 +00:00
Simon Pilgrim	ff7b7141cd	[X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC. LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead. llvm-svn: 264506	2016-03-26 09:29:04 +00:00
Chuang-Yu Cheng	065969ec8e	[Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10 This change implements the following vector operations: - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d - vnegd vnegw - vprtybd vprtybq vprtybw - vbpermd vpermr - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv - vmul10cuq vmul10uq vmul10ecuq vmul10euq 28 instructions Thanks Nemanja, Kit for invaluable hints and discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan Phabricator: http://reviews.llvm.org/D15887 llvm-svn: 264504	2016-03-26 05:46:11 +00:00
David Majnemer	020e890a19	[X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI. This fixes PR27071. llvm-svn: 264465	2016-03-25 21:49:11 +00:00

... 4 5 6 7 8 ...

37221 Commits