llvm-project

Commit Graph

Author	SHA1	Message	Date
Colin LeMahieu	d7a56fd9ff	[Hexagon] Updating constant extender def, adding alu-not instructions, compare to general register, and inverted compares. llvm-svn: 224989	2014-12-30 15:44:17 +00:00
Rafael Espindola	b22d5aa49a	Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF is to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 224985	2014-12-30 13:13:27 +00:00
Colin LeMahieu	651b72095b	[Hexagon] Adding allocframe, post-increment circular immediate stores, post-increment circular register stores, and bit reversed post-increment stores. llvm-svn: 224957	2014-12-29 21:33:45 +00:00
Colin LeMahieu	488b6f7bbc	[Hexagon] Fixing 224952 where an addressing mode update was missed. llvm-svn: 224955	2014-12-29 21:18:02 +00:00
Colin LeMahieu	bda31b42a0	[Hexagon] Adding post-increment register form stores and register-immediate form stores with tests. llvm-svn: 224952	2014-12-29 20:44:51 +00:00
Colin LeMahieu	9a3cd3f58c	[Hexagon] Replacing the remaining postincrement stores with versions that have encoding bits. llvm-svn: 224951	2014-12-29 20:00:43 +00:00
Colin LeMahieu	3d34afb32d	[Hexagon] Renaming old multiclass for removal. Adding post-increment store classes and instruction defs. llvm-svn: 224949	2014-12-29 19:42:14 +00:00
Craig Topper	31d6d9a0fb	[X86] Fix some cases where some 8-bit instructions were marked as being convertible to three address instructions, but aren't really. llvm-svn: 224940	2014-12-29 16:25:26 +00:00
Craig Topper	874a1966ae	[X86] Add the 0x82 instructions to the disassembler. They are identical in functionality to the 0x80 opcode instructions, but are not valid in 64-bit mode. llvm-svn: 224939	2014-12-29 16:25:23 +00:00
Craig Topper	c51b7993b8	[x86] Refactor some tablegen instruction info classes slightly to prepare for another change. NFC. llvm-svn: 224938	2014-12-29 16:25:22 +00:00
Craig Topper	56c8e05c24	[x86] Remove unused classes from tablegen instruction info. llvm-svn: 224937	2014-12-29 16:25:19 +00:00
Rafael Espindola	44eae72c40	Add segmented stack support for DragonFlyBSD. Patch by Michael Neumann. llvm-svn: 224936	2014-12-29 15:47:28 +00:00
Rafael Espindola	bed67f3adc	Refactor duplicated code. No intended functionality change. llvm-svn: 224935	2014-12-29 15:18:31 +00:00
Keno Fischer	fd22c6693b	[X86][ISel] Fix a regression I introduced in r224884. The else case ResultReg was not checked for validity. To my surprise, this case was not hit in any of the existing test cases. This includes a new test case that tests this path. Also drop the `target triple` declaration from the original test as suggested by H.J. Lu, because apparently with it the test won't be run on Linux. llvm-svn: 224901	2014-12-28 15:20:57 +00:00
Michael Kuperstein	683c3cde43	[X86] Add missing memory variants to AVX false dependency breaking Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246) Differential Revision: http://reviews.llvm.org/D6780 llvm-svn: 224900	2014-12-28 13:15:05 +00:00
Andrea Di Biagio	22ee3f63b9	[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz. If the control flow is modelling an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph. Example: \code entry: %cmp = icmp eq i64 %val, 0 br i1 %cmp, label %end.bb, label %then.bb then.bb: %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) br label %end.bb end.bb: %cond = phi i64 [ %c, %then.bb ], [ 64, %entry] \code In this example, basic block %then.bb is taken if value %val is not zero. Also, the phi node in %end.bb would propagate the size-of in bits of %val only if %val is equal to zero. With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb into basic block %entry only if cttz is cheap to speculate for the target. Added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI. Differential Revision: http://reviews.llvm.org/D6728 llvm-svn: 224899	2014-12-28 11:07:35 +00:00
Craig Topper	6e3a582809	[x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Correctly this time. I did the wrong patterns the first time. llvm-svn: 224891	2014-12-27 20:08:45 +00:00
David Majnemer	d0bcef2040	PowerPC: CTR shouldn't fire if a TLS call is in the loop Determining the address of a TLS variable results in a function call in certain TLS models. This means that a simple ICmpInst might actually result in invalidating the CTR register. In such cases, do not attempt to rely on the CTR register for loop optimization purposes. This fixes PR22034. Differential Revision: http://reviews.llvm.org/D6786 llvm-svn: 224890	2014-12-27 19:45:38 +00:00
Aaron Ballman	4eb5c2e089	Fixing another -Wunused-variable warning, this time in release builds without asserts. NFC. llvm-svn: 224889	2014-12-27 19:17:53 +00:00
Aaron Ballman	b66d54c549	Removing a variable that is set but never used, to silence a -Wunused-but-set-variable warning; NFC. llvm-svn: 224888	2014-12-27 19:01:19 +00:00
Craig Topper	1113fb343e	[x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Forgot to do this when I did SSE/SSE2/AVX/AVX2. llvm-svn: 224887	2014-12-27 18:51:06 +00:00
Craig Topper	53f75b9dc0	[x86] Assert on invalid immediates in the instruction printer for cmp.ps/pd/ss/sd instead of truncating the immediate. The assembly parser and instruction selection shouldn't generate invalid immediates. llvm-svn: 224886	2014-12-27 18:11:00 +00:00
Craig Topper	acc73445b7	[x86] Prevent llvm.x86.cmp.ps/pd/ss/sd from being selected with bad immediates. The frontend now checks this when the builtin is used. This will allow the instruction printer to not have to deal with invalid immediates on these instructions. llvm-svn: 224885	2014-12-27 18:10:56 +00:00
Keno Fischer	8438b08663	[FastIsel][X86] Fix invalid register replacement for bool args Summary: Consider the following IR: %3 = load i8* undef %4 = trunc i8 %3 to i1 %5 = call %jl_value_t.0* @foo(..., i1 %4, ...) ret %jl_value_t.0* %5 Bools (that are the result of direct truncs) are lowered as whatever the argument to the trunc was and a "and 1", causing the part of the MBB responsible for this argument to look something like this: %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7 Later, when the load is lowered, it will insert %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14 but remember to (at the end of isel) replace vreg7 by vreg15. Now for the bug. In fast isel lowering, we mistakenly mark vreg8 as the result of the load instead of the trunc. This adds a fixup to have vreg8 replaced by whatever the result of the load is as well, so we end up with %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15 which is an SSA violation and causes problems later down the road. This fixes PR21557. Test Plan: Test test case from PR21557 is added to the test suite. Reviewers: ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6245 llvm-svn: 224884	2014-12-27 13:10:15 +00:00
Colin LeMahieu	8233fb002d	[Hexagon] Adding auto-incrementing loads with and without byte reversal. llvm-svn: 224871	2014-12-26 21:09:25 +00:00
Colin LeMahieu	0a721cd4e1	[Hexagon] Adding locked loads. llvm-svn: 224870	2014-12-26 20:42:27 +00:00
Colin LeMahieu	ff370ed90e	[Hexagon] Adding deallocframe and circular addressing loads. llvm-svn: 224869	2014-12-26 20:30:58 +00:00
Colin LeMahieu	c83cbbf6a1	[Hexagon] Adding remaining post-increment instruction variants. Removing unused classes. llvm-svn: 224868	2014-12-26 19:31:46 +00:00
Colin LeMahieu	fe9612e09d	[Hexagon] Adding post-increment unsigned byte loads. llvm-svn: 224867	2014-12-26 19:12:11 +00:00
Colin LeMahieu	96976a10a3	[Hexagon] Adding post-increment signed byte loads with tests. llvm-svn: 224866	2014-12-26 18:57:13 +00:00
Craig Topper	c4b12166f2	[X86] Add the debug registers DR8-DR15 so we can assemble and disassemble references to them. llvm-svn: 224862	2014-12-26 18:20:05 +00:00
Craig Topper	d5b39237a1	[X86] Don't fail disassembly if REX.R/REX.B is used on an MMX register. Similar fix to not fail to disassemble CR9-CR15 references. llvm-svn: 224861	2014-12-26 18:19:44 +00:00
Craig Topper	ee9eef2fd8	Teach disassembler to handle illegal immediates on (v)cmpps/pd/ss/sd instructions. Instead of rejecting we'll just generate the _alt forms that don't try to alter the mnemonic. While I'm here, merge some common code in the Instruction printers for the condition code replacement and fix the mask on SSE to be 3-bits instead of 4. llvm-svn: 224846	2014-12-26 06:36:28 +00:00
Craig Topper	2e44492b1d	Use MCPhysReg for table of register encodings. llvm-svn: 224845	2014-12-26 06:36:23 +00:00
Hal Finkel	0c505b08a5	[PowerPC] [FastISel] i1 constants must be zero extended When materializing constant i1 values, they must be zero extended. We represent i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code path was dead for i1 values prior to r216006 (which is why this did not manifest in miscompiles until recently). Fixes -O0 self-hosting on PPC64/Linux. llvm-svn: 224842	2014-12-25 23:08:25 +00:00
Elena Demikhovsky	fb81b93e17	Masked Load/Store - Changed the order of parameters in intrinsics. No functional changes. The documentation is coming. llvm-svn: 224829	2014-12-25 07:49:20 +00:00
Saleem Abdulrasool	747ec2dda3	MC: address some comments in deprecation checks Bob Wilson pointed out the unnecessary checks that had been committed to the instruction check predicates. The check was meant to ensure that the check was not accidentally applied to non-ARM instructions. This is better served as an assertion rather than a condition check. llvm-svn: 224825	2014-12-24 18:40:42 +00:00
Craig Topper	b86338f7b2	[X86] Remove the single AdSize indicator and replace it with separate AdSize16/32/64 flags. This removes a hardcoded list of instructions in the CodeEmitter. Eventually I intend to remove the predicates on the affected instructions since in any given mode two of them are valid if we supported addr32/addr16 prefixes in the assembler. llvm-svn: 224809	2014-12-24 06:05:22 +00:00
Colin LeMahieu	e193e1c48b	[Hexagon] Removing old classes. llvm-svn: 224795	2014-12-24 00:43:00 +00:00
Hal Finkel	fc096c98f3	[PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64 On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction sequence is interpreted by the unwinding code in libgcc. To make sure these occur as a pair, as with other pairings interpreted by the linker, fuse the two instructions into one instruction (for code generation only). In the future, we might wish to do this by emitting CFI directives instead, but this solution is simpler, and mirrors what GCC does. Additional discussion on this point is contained in the PR. Fixes PR22015. llvm-svn: 224788	2014-12-23 22:29:40 +00:00
Colin LeMahieu	947cd70413	[Hexagon] Adding doubleword load. llvm-svn: 224787	2014-12-23 20:44:59 +00:00
Colin LeMahieu	026e88d317	[Hexagon] Reapplying 224775 load words. llvm-svn: 224786	2014-12-23 20:02:16 +00:00
Jozef Kolek	ab6d1cce3e	[mips][microMIPS] Implement CACHE, PREF, SSNOP, EHB and PAUSE instructions Differential Revision: http://reviews.llvm.org/D5204 llvm-svn: 224785	2014-12-23 19:55:34 +00:00
Colin LeMahieu	20be15718b	Reverting 224775 until mayLoad flag is addressed. llvm-svn: 224783	2014-12-23 19:22:59 +00:00
Colin LeMahieu	122aeaafea	[Hexagon] Adding word loads. llvm-svn: 224775	2014-12-23 18:06:56 +00:00
Colin LeMahieu	8e39cad934	[Hexagon] Adding signed halfword loads. llvm-svn: 224774	2014-12-23 17:25:57 +00:00
Colin LeMahieu	a9386d28a5	[Hexagon] Adding unsigned halfword load. llvm-svn: 224772	2014-12-23 16:42:57 +00:00
Jozef Kolek	12c6982b3b	[mips][microMIPS] Implement LWSP and SWSP instructions Differential Revision: http://reviews.llvm.org/D6416 llvm-svn: 224771	2014-12-23 16:16:33 +00:00
Elena Demikhovsky	fcea06acb5	AVX-512: Added FMA instructions, intrinsics and tests for KNL and SKX targets by Asaf Badouh http://reviews.llvm.org/D6456 llvm-svn: 224764	2014-12-23 10:30:39 +00:00
Hal Finkel	6e27c6d450	[PowerPC] Don't mark the return-address slot as immutable It is tempting to mark the fixed stack slot used to store the return address as immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the function, it is not completely immutable: it is written during the function prologue. When using post-RA instruction scheduling, the prologue instructions are available for scheduling, and we're not free to interchange the order of a particular store in the prologue with loads from that stack location. Fixes PR21976. llvm-svn: 224761	2014-12-23 09:45:06 +00:00
Elena Demikhovsky	3121449f0b	AVX-512: BLENDM - fixed encoding of the broadcast version Added more intrinsics and encoding tests. llvm-svn: 224760	2014-12-23 09:36:28 +00:00
Hal Finkel	04b16b51ec	[PowerPC] Don't attempt a 64-bit pow2 division on PPC32 In r224033, in moving the signed power-of-2 division expansion into BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a 64-bit division on PPC32. This later asserts. Fixes PR21928. llvm-svn: 224758	2014-12-23 08:38:50 +00:00
Ahmed Bougacha	4553bff412	[ARM] Don't break alignment when combining base updates into load/stores. r223862/r224203 tried to also combine base-updating load/stores. There was a mistake there: the alignment was added as is as an operand to the ARMISD::VLD/VST node. However, the VLD/VST selection logic doesn't care about less-than-standard alignment attributes. For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks VLD1q64 (because of the memory type). But VLD1q64 ("vld1.64 {dXX, dYY}") is 8-aligned, per ARMARMv7a 3.2.1. For the 1-aligned load, what we really want is VLD1q8. This commit introduces bitcasts if necessary, and changes the vld/vst type to one whose standard alignment matches the original load/store alignment. Differential Revision: http://reviews.llvm.org/D6759 llvm-svn: 224754	2014-12-23 06:07:31 +00:00
Alexey Samsonov	2c55974da5	Fix UBSan bootstrap: replace shift of negative value with multiplication. llvm-svn: 224752	2014-12-23 04:15:53 +00:00
Jim Grosbach	1bd0f3530e	X86: Don't over-align combined loads. When combining consecutive loads+inserts into a single vector load, we should keep the alignment of the base load. Doing otherwise can, and does, lead to using overly aligned instructions. In the included test case, for example, using a 32-byte vmovaps on a 16-byte aligned value. Oops. rdar://19190968 llvm-svn: 224746	2014-12-23 00:35:23 +00:00
Reid Kleckner	ce0093344f	Make musttail more robust for vector types on x86 Previously I tried to plug musttail into the existing vararg lowering code. That turned out to be a mistake, because non-vararg calls use significantly different register lowering, even on x86. For example, AVX vectors are usually passed in registers to normal functions and memory to vararg functions. Now musttail uses a completely separate lowering. Hopefully this can be used as the basis for non-x86 perfect forwarding. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6156 llvm-svn: 224745	2014-12-22 23:58:37 +00:00
Adrian Prantl	d9e64b6c08	Thumb1 frame lowering: Mark CFI instructions with the FrameSetup flag. Followup to r224294: ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224743	2014-12-22 23:09:14 +00:00
Colin LeMahieu	4b1eac4dda	[Hexagon] Adding memb instruction. Fixing whitespace in test from 224730. llvm-svn: 224735	2014-12-22 21:40:43 +00:00
Colin LeMahieu	af1e5de141	[Hexagon] Adding classes and load unsigned byte instruction, updating usages. llvm-svn: 224730	2014-12-22 21:20:03 +00:00
Bruno Cardoso Lopes	811c173523	[x86] Add vector @llvm.ctpop intrinsic custom lowering Currently, when ctpop is supported for scalar types, the expansion of @llvm.ctpop.vXiY uses vector element extractions, insertions and individual calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used for the scalar calls. Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY expansion in some cases by using a vector parallel bit twiddling approach, based on: v = v - ((v >> 1) & 0x55555555); v = (v & 0x33333333) + ((v >> 2) & 0x33333333); v = (v + (v >> 4)) & 0xF0F0F0F; v = v + (v >> 8); v = v + (v >> 16); v = v & 0x0000003F; (from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel) When scalar ctpop isn't supported, the approach above performs better for v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop is supported, this approach performs ~2x better for v8i32. Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop support with -march=core-avx2. == [x86_64h - new] v8i32: 0.661685 v4i32: 0.514678 v4i64: 0.652009 v2i64: 0.324289 == [x86_64h - old] v8i32: 1.29578 v4i32: 0.528807 v4i64: 0.65981 v2i64: 0.330707 == [x86_64 - new] v8i32: 1.003 v4i32: 0.656273 v4i64: 1.11711 v2i64: 0.754064 == [x86_64 - old] v8i32: 2.34886 v4i32: 1.72053 v4i64: 1.41086 v2i64: 1.0244 More work for other vector types will come next. llvm-svn: 224725	2014-12-22 19:45:43 +00:00
Elena Demikhovsky	949b0d46bf	AVX-512: Added all forms of BLENDM instructions, intrinsics, encoding tests for AVX-512F and skx instructions. llvm-svn: 224707	2014-12-22 13:52:48 +00:00
Karthik Bhat	bf662901c1	Lower multiply-negate operation to mneg on AArch64 This patch pattern matches code such as- neg w8, w8 mul w8, w9, w8 to mneg w8, w8, w9 Review: http://reviews.llvm.org/D6754 llvm-svn: 224706	2014-12-22 13:38:58 +00:00
Craig Topper	23fd69560b	[X86] Add hasSideEffects = 0 to CALLpcrel16. This matches what is inferred from patterns for the 32-bit version. llvm-svn: 224692	2014-12-21 20:05:06 +00:00
Matt Arsenault	22b4c256e1	Enable (sext x) == C --> x == (trunc C) combine Extend the existing code which handles this for zext. This makes this more useful for targets with ZeroOrNegativeOne BooleanContent and obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne) since the constant will now be shrunk to i1. llvm-svn: 224691	2014-12-21 16:48:42 +00:00
Craig Topper	01dcd8a31d	[X86] Swap operand order in Intel syntax on a bunch of aliases. llvm-svn: 224687	2014-12-20 23:05:59 +00:00
Craig Topper	643a11268f	[X86] Swap operand order of imul aliases in Intel syntax. Also disable printing of the alias instead of the real instruction. llvm-svn: 224686	2014-12-20 23:05:57 +00:00
Craig Topper	4b8a47050f	[X86] Remove '*' from asm strings in far call/jump aliases for Intel syntax. llvm-svn: 224685	2014-12-20 23:05:55 +00:00
Craig Topper	3564080896	[X86] Don't swap the order of segment and offset in immediate form of far call/jump in Intel syntax. llvm-svn: 224684	2014-12-20 23:05:52 +00:00
Saleem Abdulrasool	0fa832002c	ARM: further improve deprecated diagnosis (LDM) The ARM ARM states: LDM/LDMIA/LDMFD: The SP can be in the list. However, ARM deprecates using these instructions with SP in the list. ARM deprecates using these instructions with both the LR and the PC in the list. LDMDA/LDMFA/LDMDB/LDMEA/LDMIB/LDMED: The SP can be in the list. However, instructions that include the SP in the list are deprecated. Instructions that include both the LR and the PC in the list are deprecated. POP: The SP can only be in the list before ARMv7. ARM deprecates any use of ARM instructions that include the SP, and the value of the SP after such an instruction is UNKNOWN. ARM deprecates the use of this instruction with both the LR and the PC in the list. Attempt to diagnose use of deprecated forms of these instructions. This mirrors the previous changes to diagnose use of the deprecated forms of STM in ARM mode. llvm-svn: 224682	2014-12-20 20:25:36 +00:00
Craig Topper	35545fa20a	[X86] Immediate forms of far call/jump are not valid in x86-64. llvm-svn: 224678	2014-12-20 07:43:27 +00:00
Eric Christopher	3ab98895bc	Remove unused variable and initialization. llvm-svn: 224655	2014-12-20 00:07:09 +00:00
Eric Christopher	8985ba912f	Remove unused variable, initializer, and accessor. llvm-svn: 224650	2014-12-19 23:46:53 +00:00
Matt Arsenault	013ddaf18c	R600: Remove outdated comment llvm-svn: 224648	2014-12-19 23:29:13 +00:00
Elena Demikhovsky	fb73ca516b	Masked load and store codegen - fixed 128-bit vectors The codegen failed on 128-bit types on AVX2. I added patterns in td files and tests. llvm-svn: 224647	2014-12-19 23:27:57 +00:00
Matt Arsenault	dc10307524	R600/SI: Only form min/max with 1 use. If the condition is used for something else, this increases the number of instructions. llvm-svn: 224646	2014-12-19 23:15:30 +00:00
Reid Kleckner	93acac6cfc	Add the ExceptionHandling::MSVC enumeration It is intended to be used for a family of personality functions that have similar IR preparation requirements. Typically when interoperating with MSVC personality functions, bits of functionality need to be outlined from the main function into helper functions. There is also usually more than one landing pad per invoke, which does not match the LLVM IR landingpad representation. None of this is implemented yet. This change just adds a new enum that is active for *-windows-msvc and delegates to the EH removal preparation pass. No functionality change for other targets. llvm-svn: 224625	2014-12-19 22:19:48 +00:00
Sanjay Patel	1da5f1645b	Model sqrtss as a binary operation with one source operand tied to the destination (PR14221) This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ). That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed. Differential Revision: http://reviews.llvm.org/D6330 llvm-svn: 224624	2014-12-19 22:16:28 +00:00
Tom Stellard	5352f35a89	R600/SI: isLegalOperand() shouldn't check constant bus for SALU instructions The constant bus restrictions only apply to VALU instructions. This enables SIFoldOperands to fold immediates into SALU instructions. llvm-svn: 224623	2014-12-19 22:15:37 +00:00
Tom Stellard	c3d7eeb6e5	R600/SI: Make sure non-inline constants aren't folded into mubuf soffset operand mubuf instructions now define the soffset field using the SCSrc_32 register class which indicates that only SGPRs and inline constants are allowed. llvm-svn: 224622	2014-12-19 22:15:30 +00:00
Colin LeMahieu	0f850bde0e	[Hexagon] Removing old variants of instructions and updating references. llvm-svn: 224612	2014-12-19 20:29:29 +00:00
Colin LeMahieu	38ce8cd2e2	[Hexagon] Adding bit extraction and table indexing instructions. llvm-svn: 224610	2014-12-19 20:01:08 +00:00
Colin LeMahieu	3c7f664d5a	[Hexagon] Adding bit insertion instructions. llvm-svn: 224609	2014-12-19 19:54:38 +00:00
Colin LeMahieu	d63ef93b4b	[Hexagon] Adding more xtype shift instructions. llvm-svn: 224608	2014-12-19 19:51:35 +00:00
Colin LeMahieu	cc09d1ccc5	[Hexagon] Adding xtype shift instructions. llvm-svn: 224604	2014-12-19 19:34:50 +00:00
Colin LeMahieu	f3db884efb	[Hexagon] Adding transfers to and from control registers. llvm-svn: 224599	2014-12-19 19:06:32 +00:00
Colin LeMahieu	402f772b82	[Hexagon] Adding doubleregs for control registers. Renaming control register class. llvm-svn: 224598	2014-12-19 18:56:10 +00:00
Tilmann Scheller	e24bb41bad	[ARM] Remove dead assignment. Found by the Clang static analyzer. llvm-svn: 224586	2014-12-19 16:57:33 +00:00
Colin LeMahieu	5ccbb1298b	[Hexagon] Adding loop0/1 sp0/1/2loop0 instructions. llvm-svn: 224556	2014-12-19 00:06:53 +00:00
Colin LeMahieu	174476ed96	Reverting 224550, was not ready for commit. llvm-svn: 224552	2014-12-18 23:36:15 +00:00
Colin LeMahieu	9000481cda	[Hexagon] Adding loop0/1 sp0/1/2loop0 instructions. llvm-svn: 224550	2014-12-18 23:27:51 +00:00
Jozef Kolek	2f27d571c8	[mips][microMIPS] Fix bugs related to atomic SC/LL instructions Fix bugs related to atomic microMIPS SC/LL instructions: While expanding atomic operations the mips32r2 encoding was emitted instead of microMIPS. Differential Revision: http://reviews.llvm.org/D6659 llvm-svn: 224524	2014-12-18 16:39:29 +00:00
Saleem Abdulrasool	0b5a8520ac	ARM: fix an off-by-one in the register list access Fix an off-by-one access introduced in 224502 for push.w and pop.w with single register operands. Add test cases for both scenarios. Thanks to Asiri Rathnayake for pointing out the failure! llvm-svn: 224521	2014-12-18 16:16:53 +00:00
Robert Khasanov	79fb7292d7	[AVX512] Enable FP arithmetic lowering for AVX512VL subsets. Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions. Added lowering tests. llvm-svn: 224516	2014-12-18 12:28:22 +00:00
Saleem Abdulrasool	3a23917d48	ARM: improve instruction validation for thumb mode The ARM Architecture Reference Manual states the following: LDM{,IA,DB}: The SP cannot be in the list. The PC can be in the list. If the PC is in the list: • the LR must not be in the list • the instruction must be either outside any IT block, or the last instruction in an IT block. POP: The PC can be in the list. If the PC is in the list: • the LR must not be in the list • the instruction must be either outside any IT block, or the last instruction in an IT block. PUSH: The SP and PC can be in the list in ARM instructions, but not in Thumb instructions. STM{,IA,DB}: The SP and PC can be in the list in ARM instructions, but not in Thumb instructions. llvm-svn: 224502	2014-12-18 05:24:38 +00:00
Craig Topper	f7df7221d1	[PowerPC] Use MCPhysReg for tables of registers. Const-correct the tables. Only put the anonymous namespace around classes. NFC. llvm-svn: 224498	2014-12-18 05:02:14 +00:00
Craig Topper	5645f6b45b	[X86] Use correct opsize on indirect call and jump aliases. llvm-svn: 224497	2014-12-18 05:02:12 +00:00
Craig Topper	9480732be2	[X86] Don't use PS prefix on LDMXCSR/STMXCSR. Near as I can tell prefixes are ignored on these instructions except for a comment in the Intel docs about 0xf3. Binutils disassembler seems to ignore prefixes on these instructions. Our disassembler still doesn't distinguish PS and "no prefix" well enough for this to make a functional change, but it helps with experiments I'm doing on a potential new disassembler table builder. llvm-svn: 224496	2014-12-18 05:02:10 +00:00
Craig Topper	2e2aee0cd6	[X86] Remove unnecessary 'In64BitMode' predicate for instructions that already indicate use of REX.W. llvm-svn: 224495	2014-12-18 05:02:08 +00:00
Eric Christopher	661f2d1ca1	Add a new string member to the TargetOptions struct for the name of the abi we should be using. For targets that don't use the option there's no change, otherwise this allows external users to set the ABI via string and avoid some of the -backend-option pain in clang. Use this option to move the ABI for the ARM port from the Subtarget to the TargetMachine and update the testcases accordingly since it's no longer valid to set via -mattr. llvm-svn: 224492	2014-12-18 02:20:58 +00:00
Eric Christopher	1971c3508a	Model ARM backend ABI selection after the front end code doing the same. This will change the "bare metal" ABI from APCS to AAPCS. The only difference between the front and back end code is that the code for Triple::GNU was added for environment. That will migrate to the front end shortly. Tests updated with the ABI they were originally testing in the case of bare metal (e.g. -mtriple armv7) or with a -gnu for arm-linux triples. llvm-svn: 224489	2014-12-18 02:08:45 +00:00
Matt Arsenault	303011a005	R600/SI: Fix f64 inline immediates llvm-svn: 224458	2014-12-17 21:04:08 +00:00
Colin LeMahieu	2055538edb	[Hexagon] Reconfiguring register alternate names. llvm-svn: 224455	2014-12-17 20:35:11 +00:00
Will Schmidt	428488c594	Enable the P8Model entry This was missed last time around, for the P8 Instruction Scheduling changes (223257). This will hook the P8Model entry in so those changes will actually be used. llvm-svn: 224452	2014-12-17 19:56:29 +00:00
Jingyue Wu	e4c9cf04f5	[NVPTX] Fix bugs related to isSingleValueType Summary: With isSingleValueType starting to treat vector types as single-value types, code that uses this interface needs to be updated. Test Plan: vector-global.ll nvcl-param-align.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D6573 llvm-svn: 224440	2014-12-17 17:59:04 +00:00
Saleem Abdulrasool	1ce7d31f33	ARM: correct an off-by-one in an assert The assert was off-by-one, resulting in failures for valid input. Thanks to Asiri Rathnayake for pointing out the failure! llvm-svn: 224432	2014-12-17 16:17:44 +00:00
Michael Kuperstein	047b1a0400	[DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711. Differential Revision: http://reviews.llvm.org/D6678 llvm-svn: 224429	2014-12-17 12:32:17 +00:00
Vladimir Medic	636fefe252	MipsABIInfo class is used in different libraries. Moving the files to MCTargetDesc folder(LLVMMipsDesc library) prevents linkage errors. There are no functional changes. llvm-svn: 224427	2014-12-17 11:49:56 +00:00
Toma Tabacu	a23f13c3b0	[mips] Set GCC-compatible MIPS assembler options before inline asm blocks. Summary: When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives, while GCC uses the default options if an assembly-level function contains inline assembly code. This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example). This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6637 llvm-svn: 224425	2014-12-17 10:56:16 +00:00
Quentin Colombet	fc2201e922	[CodeGenPrepare] Reapply r224351 with a fix for the assertion failure: The type promotion helper does not support vector type, so make sure it does not kick in in such cases. Original commit message: [CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] 
movzbl (%rdi), %eax # zero-extended load movl (%rsi), %esi # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequence happens a lot on code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. 
Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224402	2014-12-17 01:36:17 +00:00
Reid Kleckner	04b69f89aa	Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion." This reverts commit r224351. It causes assertion failures when building ICU. llvm-svn: 224397	2014-12-17 00:29:23 +00:00
Colin LeMahieu	aa1bade7b4	[Hexagon] Updating doubleword shift usages to new versions. llvm-svn: 224391	2014-12-16 23:36:15 +00:00
Simon Pilgrim	bf1e079005	[X86][SSE] Vector double -> float conversion memory folding (cvtpd2ps) Added a missing memory folding relationship for the (V)CVTPD2PS instruction - we can safely fold these for stack reloads. Differential Revision: http://reviews.llvm.org/D6663 llvm-svn: 224383	2014-12-16 22:30:10 +00:00
Colin LeMahieu	7fc90fc7e9	[Hexagon] Removing old XTYPE/BIT instructions and replacing usages. llvm-svn: 224381	2014-12-16 22:17:09 +00:00
Colin LeMahieu	f5acc8c625	[Hexagon] Adding tstbit/bitclr/bitset instructions. llvm-svn: 224374	2014-12-16 21:28:58 +00:00
Colin LeMahieu	615757f2f1	[Hexagon] Adding bit count and twiddling instructions. llvm-svn: 224367	2014-12-16 20:57:56 +00:00
Colin LeMahieu	6fce46baf6	[Hexagon] Adding asr/lsr/asl reg/imm, asl with saturation, asr with rounding. Doubleword abs/neg/not. Interleave and deinterleave instructions. llvm-svn: 224365	2014-12-16 20:40:23 +00:00
JF Bastien	5d3280c7a7	x86-32: PUSHF/POPF use/def EFLAGS Summary: As a side-quest for D6629 jvoung pointed out that I should use -verify-machineinstrs and this found a bug in x86-32's handling of EFLAGS for PUSHF/POPF. This patch fixes the use/def, and adds -verify-machineinstrs to all x86 tests which contain 'EFLAGS'. One exception: this patch leaves inline-asm-fpstack.ll as-is because it fails -verify-machineinstrs in a way unrelated to EFLAGS. This patch also modifies cmpxchg-clobber-flags.ll along the lines of what D6629 already does by also testing i386. Test Plan: ninja check Reviewers: t.p.northover, jvoung Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6687 llvm-svn: 224359	2014-12-16 20:15:45 +00:00
Matt Arsenault	31a52ad48c	NVPTX: Remove duplicate of AsmPrinter::lowerConstant llvm-svn: 224355	2014-12-16 19:16:17 +00:00
Quentin Colombet	d5e57b731f	[CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] movzbl (%rdi), %eax # zero-extended load movl (%rsi), %esi # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. 
Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequence happens a lot on code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224351	2014-12-16 19:09:03 +00:00
Robert Khasanov	d04cd2fbfe	[AVX512] Enable integer arithmetic lowering for AVX512BW/VL subsets. Added lowering tests. llvm-svn: 224349	2014-12-16 18:24:07 +00:00
Colin LeMahieu	1944a8cd04	[Hexagon] Adding absolute value, and negate with saturation llvm-svn: 224346	2014-12-16 17:44:49 +00:00
Sanjay Patel	e46d54f0bf	combine consecutive subvector 16-byte loads into one 32-byte load This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ). When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector, we can use a single 32-byte load instead. But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops. We also don't bother using 32-byte integer loads on a machine that only has AVX1 (btver2) because those operands would have to be split in half anyway since there is no support for 32-byte integer math ops. Differential Revision: http://reviews.llvm.org/D6492 llvm-svn: 224344	2014-12-16 16:30:01 +00:00
Colin LeMahieu	455f24aa77	[Hexagon] Adding saturate and swizzle instructions. llvm-svn: 224343	2014-12-16 16:27:17 +00:00
Robert Khasanov	8d9b93eac8	[AVX512] Add a comment for avx512_broadcast_pat multiclass llvm-svn: 224341	2014-12-16 16:12:11 +00:00
Colin LeMahieu	d9b23509bf	[Hexagon] Removing old multiply defs and updating references to new versions. llvm-svn: 224340	2014-12-16 16:10:01 +00:00
Vladimir Medic	e88609388a	The single check for N64 inside MipsDisassemblerBase's subclasses is actually wrong. It should be testing for FeatureGP64bit. There are no functional changes. llvm-svn: 224339	2014-12-16 15:29:12 +00:00
Zoran Jovanovic	2deca34803	[mips][microMIPS] Implement SWP and LWP instructions Differential Revision: http://reviews.llvm.org/D5667 llvm-svn: 224338	2014-12-16 14:59:10 +00:00
Aaron Ballman	0d6a010c13	Fixing -Wsign-compare warnings; NFC. llvm-svn: 224337	2014-12-16 14:04:11 +00:00
Bradley Smith	ececb7f6e2	[ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes This would result in a crash since the vcvt used does not support v8i32 types. llvm-svn: 224332	2014-12-16 10:59:27 +00:00
Elena Demikhovsky	a79fc16bb0	X86: Added FeatureVectorUAMem for all AVX architectures. According to AVX specification: "Most arithmetic and data processing instructions encoded using the VEX prefix and performing memory accesses have more flexible memory alignment requirements than instructions that are encoded without the VEX prefix. Specifically, With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded, arithmetic and data processing instructions operate in a flexible environment regarding memory address alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load operation by default. Memory arguments for most instructions with VEX prefix operate normally without causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions)." The same for AVX-512. This change does not affect anything right now, because only the "memop pattern fragment" depends on FeatureVectorUAMem and it is not used in AVX patterns. All AVX patterns are based on the "unaligned load" anyway. llvm-svn: 224330	2014-12-16 09:10:08 +00:00
Saleem Abdulrasool	417fc6b303	ARM: diagnose deprecated syntax The use of SP and PC in the register list for stores is deprecated on ARM (ARM ARM A.8.8.199): ARM deprecates the use of ARM instructions that include the SP or the PC in the list. Provide a deprecation warning from the assembler in the case that the syntax is ever seen. llvm-svn: 224319	2014-12-16 05:53:25 +00:00
Hal Finkel	8adf2254ef	[PowerPC] Improve instruction selection bit-permuting operations (32-bit) The PowerPC backend, somewhat embarrassingly, did not generate an optimal-length sequence of instructions for a 32-bit bswap. While adding a pattern for the bswap intrinsic to fix this would not have been terribly difficult, doing so would not have addressed the real problem: we had been generating poor code for many bit-permuting operations (by which I mean things like byte-swap that permute the bits of one or more inputs around in various ways). Here are some initial steps toward solving this deficiency. Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL, SHL, SRL, AND and OR (mostly with constant second operands). Looking back through these operations, we can build up a description of the bits in the resulting value in terms of bits of one or more input values (and constant zeros). For each bit, we compute the rotation amount from the original value, and then group consecutive (value, rotation factor) bits into groups. Groups sharing these attributes are then collected and sorted, and we can then instruction select the entire permutation using a combination of masked rotations (rlwinm), imm ands (andi/andis), and masked rotation inserts (rlwimi). The result is that instead of lowering an i32 bswap as: rlwinm 5, 3, 24, 16, 23 rlwinm 4, 3, 24, 0, 7 rlwimi 4, 3, 8, 8, 15 rlwimi 5, 3, 8, 24, 31 rlwimi 4, 5, 0, 16, 31 we now produce: rlwinm 4, 3, 8, 0, 31 rlwimi 4, 3, 24, 16, 23 rlwimi 4, 3, 24, 0, 7 and for the 'test6' example in the PowerPC/README.txt file: unsigned test6(unsigned x) { return ((x & 0x00FF0000) >> 16) | ((x & 0x000000FF) << 16); } we used to produce: lis 4, 255 rlwinm 3, 3, 16, 0, 31 ori 4, 4, 255 and 3, 3, 4 and now we produce: rlwinm 4, 3, 16, 24, 31 rlwimi 4, 3, 16, 8, 15 and, as a nice bonus, this fixes the FIXME in test/CodeGen/PowerPC/rlwimi-and.ll. 
This commit does not include instruction-selection for i64 operations, those will come later. llvm-svn: 224318	2014-12-16 05:51:41 +00:00
Saleem Abdulrasool	08408ea86e	ARM: 80-column clang-format a function with an overly long string constant. NFC. llvm-svn: 224314	2014-12-16 04:10:10 +00:00
Adrian Prantl	b9fa945d51	ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224294	2014-12-16 00:20:49 +00:00
Colin LeMahieu	d9a00a9c38	[Hexagon] Adding doubleword multiplies with and without accumulation. llvm-svn: 224293	2014-12-16 00:07:24 +00:00
Colin LeMahieu	18c927620a	[Hexagon] Adding halfword to doubleword multiplies. llvm-svn: 224289	2014-12-15 23:29:37 +00:00
Colin LeMahieu	64ffd52943	[Hexagon] Adding logical-logical accumulation instructions and tests. llvm-svn: 224288	2014-12-15 23:19:07 +00:00
JF Bastien	388b8794c9	x86: Emit LOCK prefix after DATA16 Summary: x86 allows either ordering for the LOCK and DATA16 prefixes, but using GCC+GAS leads to different code generation than using LLVM. This change matches the order that GAS emits the x86 prefixes when a semicolon isn't used in inline assembly (see tc-i386.c comment before define LOCK_PREFIX), and helps simplify tooling that operates on the instruction's byte sequence (such as NaCl's validator). This change shouldn't have any performance impact. Test Plan: ninja check Reviewers: craig.topper, jvoung Subscribers: jfb, llvm-commits Differential Revision: http://reviews.llvm.org/D6630 llvm-svn: 224283	2014-12-15 22:34:58 +00:00
Colin LeMahieu	71e11a1d0d	[Hexagon] Adding a number of additional multiply forms with tests. llvm-svn: 224282	2014-12-15 22:10:37 +00:00
Colin LeMahieu	4a46429305	[Hexagon] Adding misc multiply encodings and tests. llvm-svn: 224273	2014-12-15 21:17:03 +00:00
Colin LeMahieu	26f884aedf	[Hexagon] Adding doubleword accumulating multiplies of halfwords. llvm-svn: 224267	2014-12-15 20:17:46 +00:00
Colin LeMahieu	572c53e258	[Hexagon] Adding accumulating half word multiplies. llvm-svn: 224266	2014-12-15 20:10:28 +00:00
Colin LeMahieu	d1704cdc07	[Hexagon] Adding multiply with rnd/sat/rndsat llvm-svn: 224265	2014-12-15 20:01:59 +00:00
Colin LeMahieu	fe4012a969	[Hexagon] Adding encoding bits for halfword multiplies. llvm-svn: 224261	2014-12-15 19:22:07 +00:00
Ahmed Bougacha	c2a87ddf01	[X86] Also pretty-print shuffle mask for INSERTPS rm variants. llvm-svn: 224260	2014-12-15 19:17:54 +00:00
Michael Ilseman	addddc441f	Silence more static analyzer warnings. Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables. llvm-svn: 224255	2014-12-15 18:48:43 +00:00
Vladimir Medic	d7ecf49e97	Add disassembler tests for mips3 platform. There are no functional changes. llvm-svn: 224253	2014-12-15 16:19:34 +00:00
Michael Kuperstein	47c97157ef	[X86] Break false dependencies before partial register updates when the source operand is in memory Adds the various "rm" instruction variants into the list of instructions that have a partial register update. Also adds all variants of SQRTSD that were missing in the original list. Differential Revision: http://reviews.llvm.org/D6620 llvm-svn: 224246	2014-12-15 13:18:21 +00:00
Elena Demikhovsky	72860c341e	AVX-512: Added EXPAND instructions and intrinsics. llvm-svn: 224241	2014-12-15 10:03:52 +00:00
Elena Demikhovsky	3fcafa2cdb	Loop Vectorizer minor changes in the code - some comments, function names, indentation. Reviewed here: http://reviews.llvm.org/D6527 llvm-svn: 224218	2014-12-14 09:43:50 +00:00
Hal Finkel	4104a1a346	[PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted truncations and extensions when dealing with nodes of the form: zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...) There was a FIXME in the implementation (now removed) regarding the fact that the function would abort the transformations if any of the non-output operands of a SELECT or SELECT_CC node would need to be promoted (because they were also output operands, for example). As a result, we continued to generate unnecessary zero-extends for code such as this: unsigned foo(unsigned a, unsigned b) { return (a <= b) ? a : b; } which would produce: cmplw 0, 3, 4 isel 3, 4, 3, 1 rldicl 3, 3, 0, 32 blr and now we produce: cmplw 0, 3, 4 isel 3, 4, 3, 1 blr which is better in the obvious way. llvm-svn: 224213	2014-12-14 05:53:19 +00:00
Ahmed Bougacha	0cb861634b	Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores." r223862 tried to also combine base-updating load/stores. r224198 reverted it, as "it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown." Reapply, with a fix to ignore non-normal load/stores. Truncstores are handled elsewhere (you can actually write a pattern for those, whereas for postinc loads you can't, since they return two values), but it should be possible to also combine extloads base updates, by checking that the memory (rather than result) type is of the same size as the addend. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). Differential Revision: http://reviews.llvm.org/D6585 llvm-svn: 224203	2014-12-13 23:22:12 +00:00
Renato Golin	df8f9b6dc9	Revert "[ARM] Combine base-updating/post-incrementing vector load/stores." This reverts commit r223862, as it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown. We'll investigate the issue and re-apply when safe. llvm-svn: 224198	2014-12-13 20:23:18 +00:00
Hal Finkel	4c6658feb0	[PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all of the usual places, but also from the ABI, which specifies that values passed are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined to do something to the higher-order bits, and for some instructions, that action clears those bits (thus providing a zero-extended result). This is especially common after rotate-and-mask instructions. Adding an additional instruction to zero-extend the results of these instructions is unnecessary. This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and looks back through their operands to see if all instructions will implicitly zero extend their results. If so, we convert these instructions to their 64-bit variants (which is an internal change only, the actual encoding of these instructions is the same as the original 32-bit ones) and remove the unnecessary zero-extension (changing where the INSERT_SUBREG instructions are to make everything internally consistent). llvm-svn: 224169	2014-12-12 23:59:36 +00:00
Chad Rosier	620fb2206d	[ARMConstantIsland] Insert tbb/tbh optimization where previous jump table resided. llvm-svn: 224165	2014-12-12 23:27:40 +00:00
Colin LeMahieu	90482a77b1	[Hexagon] Adding double word add/min/minu/max/maxu instructions and tests. llvm-svn: 224153	2014-12-12 21:29:25 +00:00
Colin LeMahieu	984ef17d66	[Hexagon] Adding J class call instructions. llvm-svn: 224150	2014-12-12 21:12:27 +00:00
Robert Khasanov	37c3ad6c20	[AVX512] Enabling bit logic lowering Added lowering tests. llvm-svn: 224132	2014-12-12 17:02:18 +00:00
Vasileios Kalintiris	8edbcad8e5	[mips] Enable code generation for MIPS-III. Summary: This commit enables the MIPS-III target and adds support for code generation of SELECT nodes. We have to use pseudo-instructions with custom inserters for these nodes as MIPS-III CPUs do not have conditional-move instructions. Depends on D6212 Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6464 llvm-svn: 224128	2014-12-12 15:16:46 +00:00
Robert Khasanov	e82a3630b7	[AVX512] Enabling MIN/MAX lowering. Added lowering tests. llvm-svn: 224127	2014-12-12 15:10:43 +00:00
Vasileios Kalintiris	f53f785a6e	[mips] Support SELECT nodes for targets that don't have conditional-move instructions. Summary: For Mips targets that do not have conditional-move instructions, ie. targets before MIPS32 and MIPS-IV, we have to insert a diamond control-flow pattern in order to support SELECT nodes. In order to do that, we add pseudo-instructions with a custom inserter that emits the necessary control-flow that selects the correct value. With this patch we add complete support for code generation of Mips-II targets based on the LLVM test-suite. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6212 llvm-svn: 224124	2014-12-12 14:41:37 +00:00
Robert Khasanov	4204c1acc6	[AVX512] Minor fix in lowering pattern for broadcast instructions. No functional change. llvm-svn: 224122	2014-12-12 14:21:30 +00:00
Charlie Turner	1a53996c31	Emit Tag_ABI_FP_16bit_format build attribute. The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet supported, there is not a user switch to change this behaviour. This build attribute should capture the default behaviour of the compiler, which is to expose the IEEE 754 version of __fp16. When -mfp16-format is emitted, that will be the way to control the value of this build attribute. Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0 llvm-svn: 224115	2014-12-12 11:59:18 +00:00
Matt Arsenault	1e3a4ebc6e	R600: Fix min/max matching problems with unordered compares The returned operand needs to be permuted for the unordered compares. Also fix incorrectly producing fmin_legacy / fmax_legacy for f64, which don't exist. llvm-svn: 224094	2014-12-12 02:30:37 +00:00
Matt Arsenault	145d5717f5	R600/SI: fmin/fmax_legacy are not associative llvm-svn: 224093	2014-12-12 02:30:33 +00:00
Matt Arsenault	477b178276	R600/SI: Don't promote f32 select to i32 This is nice for the instruction patterns, but it complicates min / max matching. The select doesn't have the correct type and would require looking through the bitcasts for the real float operands. llvm-svn: 224092	2014-12-12 02:30:29 +00:00
Matt Arsenault	810cb62962	Add target hook for whether it is profitable to reduce load widths Add an option to disable optimization to shrink truncated larger type loads to smaller type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads. llvm-svn: 224084	2014-12-12 00:00:24 +00:00
Sanjay Patel	757942a38f	remove function names from comments; NFC llvm-svn: 224080	2014-12-11 23:38:43 +00:00
Matt Arsenault	102a70409e	R600/SI: Handle physical registers in getOpRegClass llvm-svn: 224079	2014-12-11 23:37:34 +00:00
Matt Arsenault	e368cb378f	R600/SI: Don't verify constant bus usage of flag ops This was checking if pseudo-operands like the source modifiers were using the constant bus, which happens to work because all the values these can take happen to be valid inline immediates. This fixes a later commit which starts checking the register class of the operands. llvm-svn: 224078	2014-12-11 23:37:32 +00:00
Sanjay Patel	c694ac5519	return without temporary; NFC llvm-svn: 224076	2014-12-11 23:30:36 +00:00
Matthias Braun	b2f2388a76	Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips. llvm-svn: 224075	2014-12-11 23:18:03 +00:00
Ahmed Bougacha	79c797443b	[X86] Add a temporary testcase for PR21876/r223996. llvm-svn: 224074	2014-12-11 23:07:52 +00:00
Hal Finkel	b5e9b0426a	[PowerPC] Better lowering for add/or of a FrameIndex If we have an add (or an or that is really an add), where one operand is a FrameIndex and the other operand is a small constant, we can combine the lowering of the FrameIndex (which is lowered as an add of the FI and a zero offset) with the constant operand. Amusingly, this is an old potential improvement entry from lib/Target/PowerPC/README.txt which had never been resolved. In short, we used to lower: %X = alloca { i32, i32 } %Y = getelementptr {i32,i32}* %X, i32 0, i32 1 ret i32* %Y as: addi 3, 1, -8 ori 3, 3, 4 blr and now we produce: addi 3, 1, -4 blr which is much more sensible. llvm-svn: 224071	2014-12-11 22:51:06 +00:00
Matt Arsenault	58d502f0d4	R600/SI: Use unordered equal instructions llvm-svn: 224067	2014-12-11 22:15:43 +00:00
Matt Arsenault	8b989efaf9	R600/SI: Make more unordered comparisons legal This saves a second compare and an and / or by using the unordered comparison instructions. llvm-svn: 224066	2014-12-11 22:15:39 +00:00
Matt Arsenault	9cded7a74b	R600/SI: Use unordered not equal instructions llvm-svn: 224065	2014-12-11 22:15:35 +00:00
Matthias Braun	7e37a5f523	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnoticed when no verification is performed. To make this change practical I added the possibility to explicitly disable verification. I used this option on all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass. llvm-svn: 224059	2014-12-11 21:26:47 +00:00
Rafael Espindola	01c73610d0	This reverts commit r224043 and r224042. check-llvm was failing. llvm-svn: 224045	2014-12-11 20:03:57 +00:00
Matthias Braun	199aeff7dd	Enable machineverifier in debug mode for X86, ARM, AArch64, Mips llvm-svn: 224043	2014-12-11 19:42:09 +00:00
Matthias Braun	a7c82a9f1d	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnoticed when no verification is performed. To make this change practical I added the possibility to explicitly disable verification. I used this option on all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. llvm-svn: 224042	2014-12-11 19:42:05 +00:00
Colin LeMahieu	150b6b3a73	[Hexagon] Renaming classes in preparation for replacement. llvm-svn: 224036	2014-12-11 19:01:28 +00:00
Tim Northover	e2c33715bc	ARM: convert isTargetIOS checks to isTargetDarwin. The distinction is mostly useful in the front-end. By the time we get here, there are very few situations where we actually want different behaviour for Darwin and IOS (in fact Darwin mostly just exists in a few tests). So this should reduce any surprising weirdness for anyone using it. No functional change on anything anyone actually cares about. llvm-svn: 224035	2014-12-11 18:49:37 +00:00
Hal Finkel	13d104bf78	[PowerPC] Implement BuildSDIVPow2, lower i64 pow2 sdiv using sradi PPCISelDAGToDAG contained existing code to lower i32 sdiv by a power-of-2 using srawi/addze, but did not implement the i64 case. DAGCombine now contains a callback specifically designed for this purpose (BuildSDIVPow2), and part of the logic has been moved to an implementation of that callback. Doing this lowering using BuildSDIVPow2 likely does not matter, compared to handling everything in PPCISelDAGToDAG, for the positive divisor case, but the negative divisor case, which generates an additional negation, can potentially benefit from additional folding from DAGCombine. Now, both the i32 and the i64 cases have been implemented. Fixes PR20732. llvm-svn: 224033	2014-12-11 18:37:52 +00:00
Cameron McInally	5fb084e798	[AVX512] Add support for 512b variable bit shift intrinsics. llvm-svn: 224028	2014-12-11 17:13:05 +00:00
Colin LeMahieu	adab80720d	[Hexagon] Adding i64 <- i32, i32 sextw pattern. llvm-svn: 224027	2014-12-11 17:08:21 +00:00
Colin LeMahieu	eb52f69f59	[Hexagon] Adding encoding information for sign extend word instruction. llvm-svn: 224026	2014-12-11 16:43:06 +00:00
Elena Demikhovsky	908dbf48c8	AVX-512: Added all forms of COMPRESS instruction + intrinsics + tests llvm-svn: 224019	2014-12-11 15:02:24 +00:00
Jozef Kolek	a330a47427	[mips][microMIPS] Implement CodeGen support for LI16 instruction. Differential Revision: http://reviews.llvm.org/D5840 llvm-svn: 224017	2014-12-11 13:56:23 +00:00
Michael Kuperstein	11165674dc	[X86] When converting movs to pushes, don't assume MOVmi operand is an actual immediate This should fix PR21878. llvm-svn: 224010	2014-12-11 11:26:16 +00:00
Elena Demikhovsky	fc081457f1	AVX-512: Fixed a bug in lowering setcc for MVT::i1 type llvm-svn: 224008	2014-12-11 10:21:12 +00:00
Kumar Sukhani	fb60e77fcc	test commit (spelling correction) llvm-svn: 224007	2014-12-11 08:33:36 +00:00
Ahmed Bougacha	611a3ef0bc	[X86] Add back AVX2 VR256 PMOVX patterns. We can't reach those from zext, but other parts of the backend (the shuffle lowering) generate 256-bit VZEXT nodes. Fixes PR21876. llvm-svn: 223996	2014-12-11 04:32:17 +00:00
Tim Northover	2ac7e4b3ee	ARM: correctly expand LDR-lit based globals. Quite a major error here: the expansions for the Pseudos with and without folded load were mixed up. Fortunately it only affects ARM-mode, when not using movw/movt, on Darwin. I'm guessing no-one actually uses that combination. llvm-svn: 223986	2014-12-10 23:40:50 +00:00
Colin LeMahieu	220adb6370	[Hexagon] Adding combine ri/ir instructions. llvm-svn: 223971	2014-12-10 22:23:07 +00:00
Colin LeMahieu	db0b13cef0	[Hexagon] Adding encodings for JR class instructions. Updating compiler usages. llvm-svn: 223967	2014-12-10 21:24:10 +00:00
Juergen Ributzka	2326650ceb	[AArch64] MachO large code-model: Materialize FP constants in code. In the large code model we have to first get the address of the GOT entry, load the address of the constant, and then load the constant itself. To avoid these loads and the GOT entry altogether this commit changes the way FP constants are materialized in the large code model. The constants are now materialized in a GPR and then bitconverted/moved into the FPR. Reviewed by Tim Northover Fixes rdar://problem/16572564. llvm-svn: 223941	2014-12-10 19:43:32 +00:00
Marek Olsak	0c05645b0f	R600/SI: Use getTargetConstant in AdjustRegClass llvm-svn: 223940	2014-12-10 19:25:31 +00:00
Colin LeMahieu	8872d20788	[Hexagon] Adding JR class predicated call reg instructions. llvm-svn: 223933	2014-12-10 18:24:16 +00:00
Sanjay Patel	e20437f9af	Match new shuffle codegen for MOVHPD patterns Add patterns to match SSE (shufpd) and AVX (vpermilpd) shuffle codegen when storing the high element of a v2f64. The existing patterns were only checking for an unpckh type of shuffle. http://llvm.org/bugs/show_bug.cgi?id=21791 Differential Revision: http://reviews.llvm.org/D6586 llvm-svn: 223929	2014-12-10 16:58:54 +00:00

31349 Commits