llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Kuperstein	ff5acaf50c	[X86] Combine vector anyext + and into a vector zext Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements. Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND. This combine only covers a subset of the cases, but it's a start. Differential Revision: http://reviews.llvm.org/D7666 llvm-svn: 229480	2015-02-17 08:22:51 +00:00
Eric Christopher	5c0e009d3a	Make the PowerPC AsmPrinter independent of global subtarget initialization. Initialize the subtarget once per function and migrate EmitStartOfAsmFile to either use attributes on the TargetMachine or get information from all of the various subtargets. llvm-svn: 229475	2015-02-17 07:21:21 +00:00
Eric Christopher	75dc3904a5	Add a FIXME to move IsLittleEndian to the target machine. llvm-svn: 229472	2015-02-17 06:45:17 +00:00
Eric Christopher	fee6aaf683	Move ABI handling and 64-bitness to the PowerPC target machine. This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. llvm-svn: 229471	2015-02-17 06:45:15 +00:00
Chandler Carruth	55db07016e	[x86] Teach the unpack lowering to try wider element unpacks. This allows it to match still more places where previously we would have to fall back on floating point shuffles or other more complex lowering strategies. I'm hoping to replace some of the hand-rolled unpack matching with this routine as it gets more and more clever. llvm-svn: 229463	2015-02-17 02:12:24 +00:00
Hal Finkel	5cedafb8cd	[PowerPC] Support non-direct-sub/superclass VSX copies Our register allocation has become better recently, it seems, and is now starting to generate cross-block copies into inflated register classes. These copies are not transformed into subregister insertions/extractions by the PPCVSXCopy class, and so need to be handled directly by PPCInstrInfo::copyPhysReg. The code to do this was almost there, but not quite: it was unnecessarily restricting itself to only the direct sub/super-register-class case (not copying between, for example, something in VRRC and the lower-half of VSRC which are super-registers of F8RC). Triggering this behavior manually is difficult; I'm including two bugpoint-reduced test cases from the test suite. llvm-svn: 229457	2015-02-16 23:46:30 +00:00
Simon Atanasyan	79ba8407d2	[Mips] Add .MIPS.options section descriptor kinds enumeration No functional changes. llvm-svn: 229452	2015-02-16 22:59:29 +00:00
Ahmed Bougacha	bf2b90e92d	[ARM] Remove unused declaration. NFC. GlobalMerge was moved to lib/CodeGen a while ago, and is no longer called "ARMGlobalMerge". llvm-svn: 229448	2015-02-16 22:30:08 +00:00
Cameron McInally	c5764cbe4e	[AVX512] Make 512b vector floating point rounds legal on AVX512. llvm-svn: 229445	2015-02-16 22:15:42 +00:00
Simon Pilgrim	b2c00f3286	[X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches. Differential Revision: http://reviews.llvm.org/D7600 llvm-svn: 229439	2015-02-16 21:50:56 +00:00
Craig Topper	49df44e2e2	[X86] Remove the multiply by 8 that goes into the shift constant for X86ISD::VSHLDQ and X86ISD::VSRLDQ. This simplifies the pattern matching in isel and allows these nodes to become the patterns embedded in the instruction. llvm-svn: 229431	2015-02-16 20:52:07 +00:00
Craig Topper	44026efa88	[X86] Remove x86.avx2.psll.dq.bs and x86.avx2.psrl.dq.bs intrinsics. llvm-svn: 229430	2015-02-16 20:51:59 +00:00
Matthias Braun	d6b108e445	ARM: Transfer kill flag when lowering VSTMQIA to VSTMDIA. llvm-svn: 229425	2015-02-16 19:34:30 +00:00
Aaron Ballman	da9501b25c	We require MSVC 1800 as our minimum, so these checks can safely go away; NFC. (It seems this code has been copy/pasted around, unfortunately.) llvm-svn: 229417	2015-02-16 18:34:57 +00:00
Andrew Trick	05938a5481	AArch64: Safely handle the incoming sret call argument. This adds a safe interface to the machine independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular this crashed in the AArch64 backend where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument. Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering llvm-svn: 229413	2015-02-16 18:10:47 +00:00
Chandler Carruth	1e57e2deb8	[x86] Add a generic unpack-targeted lowering technique. This can be used to generically lower blends and is particularly nice because it is available from SSE2 onward. This removes a lot of the remaining domain crossing blends in SSE2 code. I'm hoping to replace some of the "interleaved" lowering hacks with something closer to this which should be more principled. First, this needs to learn how to detect and use other interleavings besides that of the natural type provided. That will be a follow-up patch though. llvm-svn: 229378	2015-02-16 12:28:18 +00:00
Chandler Carruth	c802085b3a	[x86] Add initial basic support for forming blends of v16i8 vectors. This blend instruction is ... really lame. The register usage is insane. As a consequence this is probably only barely better than 2 pshufbs followed by a por, and that mostly because it only has to read from a single memory location. However, this doesn't fix as much as I kind of expected, so more to go. Pretty sure that the ordering and delegation of v16i8 is just really, really bad. llvm-svn: 229373	2015-02-16 10:58:23 +00:00
Chandler Carruth	e63bbd97a7	[x86] Switch my usage of VariadicFunction to a "normal" variadic template now that we can use them. This is, of course, horribly ugly because of the required recursive formulation. Suggestions for making it less ugly welcome. llvm-svn: 229367	2015-02-16 09:59:48 +00:00
Craig Topper	7e8dcef094	[X86] Add support for lowering shuffles to 256-bit PALIGNR instruction. llvm-svn: 229359	2015-02-16 06:29:06 +00:00
Chandler Carruth	87e580a659	[x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction. The 256-bit vector shuffle lowering has leveraged the general technique of decomposed shuffles and blends for quite some time, but this never made it back into the 128-bit code, and there are a large number of patterns where this is substantially better. For example, this removes almost all domain crossing in vector shuffles that involve some blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps' for integer test cases in this commit. This isn't perfect yet for a few reasons: 1) The v8i16 shuffle lowering continues to plague me. We don't always form an unpack-based blend when that would be better. But the wins pretty drastically outstrip the losses here. 2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented blend support here for some terrible reason. I'll do that next probably. I've not updated it for now. More variations on this technique are coming as well -- we don't shuffle-into-unpack or shuffle-into-palignr, both of which would also be profitable. Note that some test cases grow significantly in the number of instructions, but I expect them to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are very likely to pipeline well (two ports on most modern Intel chips) and the blend is a very fast instruction. The domain switch penalty will essentially always be more than a blend instruction, which is the only increase in tree height. llvm-svn: 229350	2015-02-16 01:52:02 +00:00
Aaron Ballman	f9a1897c72	Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition. llvm-svn: 229340	2015-02-15 22:54:22 +00:00
Aaron Ballman	b46962fe5d	Removing LLVM_EXPLICIT, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition. llvm-svn: 229335	2015-02-15 22:00:20 +00:00
Simon Pilgrim	2a7bedb73e	Coding style fixes to recent patches. NFC. llvm-svn: 229312	2015-02-15 14:19:29 +00:00
Simon Pilgrim	00bd79d794	[X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2 This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets. It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions. Differential Revision: http://reviews.llvm.org/D7596 llvm-svn: 229311	2015-02-15 13:19:52 +00:00
Chandler Carruth	bf0fb06e0d	[x86] Teach the decomposed shuffle/blend lowering to use an early blend when that will allow it to lower with a single permute instead of multiple permutes. It tries to detect when it will only have to do a single permute in either case to maximize folding of loads and such. This cuts a lot of the avx2 shuffle permute counts in half. =] llvm-svn: 229309	2015-02-15 12:42:15 +00:00
Chandler Carruth	75d9a97569	[x86] Teach the shuffle mask equivalence test to look through build vectors and detect equivalent inputs. This lets the code match unpck-style instructions when only one of the inputs is lined up but the other input is a splat and so which lanes we pull from doesn't matter. Today, this doesn't really happen, but just by accident. I have a patch that normalizes how we shuffle splats, and with that patch this will be necessary for a lot of the mask equivalence tests to work. I don't really know how to write a test case for this specific change until the other change lands though. llvm-svn: 229307	2015-02-15 12:07:55 +00:00
Chandler Carruth	4fe214b1f2	[x86] Tweak the ordering of unpack matching vs. element insertion, and don't try to do element insertion for non-zero-index floating point vectors. We don't have any useful patterns or lowering for element insertion into high elements of a floating point vector, and the generic shuffle lowering will end up being better -- namely it will fall back to unpck. But we should try to handle other forms of element insertion before matching unpck patterns. While this doesn't matter much right now, I'm working on a patch that makes unpck matching much more powerful, and that patch will break without this re-ordering. llvm-svn: 229306	2015-02-15 12:01:14 +00:00
Chandler Carruth	56e0ceda0d	[x86] Stop shuffling zero vectors. =] I was somewhat surprised this pattern really came up, but it does. It seems better to just directly handle it than try to special case every place where we end up forming a shuffle that devolves to a shuffle of a zero vector. llvm-svn: 229301	2015-02-15 10:34:52 +00:00
Chandler Carruth	3d272daaed	[x86] Use a more helpful parenthesizing of these comparisons. Silences a -Wparentheses complaint from GCC. llvm-svn: 229300	2015-02-15 10:15:20 +00:00
Chandler Carruth	62558c1d4d	[x86] When splitting 256-bit vectors into 128-bit vectors, don't extract subvectors from buildvectors. That doesn't really make any sense and it breaks all of the down-stream matching of buildvectors to cleverly lower shuffles. With this, we now get the shift-based lowering of 256-bit vector shuffles with AVX1 when we split them into 128-bit vectors. We also do much better on the zero-extension patterns, although there remains quite a bit of room for improvement here. llvm-svn: 229299	2015-02-15 10:12:02 +00:00
Chandler Carruth	a6f8a3661c	[x86] Make computing the zeroable elements slightly more powerful, at least in theory. I don't actually have a test case that benefits from this, but theoretically, it could come up, and I don't want to try to think about whether this is the culprit or something else is, so I'd rather just make this code powerful. =/ Makes me sad that I can't really test it though. llvm-svn: 229298	2015-02-15 09:33:36 +00:00
Chandler Carruth	0ddfe0c7c5	[x86] Add a slight variation on some of the other generic shuffle lowerings -- one which decomposes into an initial blend followed by a permute. Particularly on newer chips, blends are handled independently of shuffles and so this is much less bottlenecked on the single port that floating point shuffles are executed with on Intel. I'll be adding this lowering to a bunch of other code paths in subsequent commits to handle still more places where we can effectively leverage blends when they're available in the ISA. llvm-svn: 229292	2015-02-15 08:26:30 +00:00
Craig Topper	78c424dfca	[X86] Add assembly parser support for mnemonic aliases for AVX-512 vpcmp instructions. llvm-svn: 229287	2015-02-15 07:13:48 +00:00
Craig Topper	f02ad93270	[X86] Add assembler predicates for the rest of the AVX512 feature flags. This makes the assembly matching consistent across all AVX512 instructions. Without this we were allowing some AVX512 instructions to be parsed always, but not the foundation instructions. llvm-svn: 229280	2015-02-15 04:54:55 +00:00
Craig Topper	a3776de242	[X86] Add the remaining 11 possible exact ModRM formats. This makes their encodings linear which can then be used to simplify some other code. llvm-svn: 229279	2015-02-15 04:16:44 +00:00
Simon Pilgrim	31457d54f7	[X86][XOP] Enable commutation for XOP instructions Patch to allow XOP instructions (integer comparison and integer multiply-add) to be commuted. The comparison instructions sometimes require the compare mode to be flipped but the remaining instructions can use default commutation modes. This patch also sets the SSE domains of all the XOP instructions. Differential Revision: http://reviews.llvm.org/D7646 llvm-svn: 229267	2015-02-14 22:40:46 +00:00
Craig Topper	43860838dc	[X86] Improve parsing support for AVX/SSE floating point compare instruction mnemonic aliases. They'll now print with the alias the parser received instead of converting to the explicit immediate form. llvm-svn: 229266	2015-02-14 21:54:03 +00:00
Duncan P. N. Exon Smith	025c0ad74c	Target: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229261	2015-02-14 15:36:52 +00:00
Duncan P. N. Exon Smith	b5054333ec	NVPTX: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229260	2015-02-14 15:35:43 +00:00
Simon Pilgrim	b0bac23fcd	Line ending fix. NFC. llvm-svn: 229256	2015-02-14 13:27:53 +00:00
Chandler Carruth	003ed332bf	Remove a variable only used in an assert and sink its initializer into the assert. Fixes -Wunused-variable on non-asserts builds. llvm-svn: 229250	2015-02-14 09:14:44 +00:00
Matt Arsenault	0bbcd8ba2f	R600/SI: Implement correct f64 fdiv This version passes the OpenCL conformance test. llvm-svn: 229239	2015-02-14 04:30:08 +00:00
Matt Arsenault	044f1d19cf	R600/SI: Use complex operand folding for div_scale llvm-svn: 229238	2015-02-14 04:24:28 +00:00
Matt Arsenault	1bc9d95047	R600/SI: Fix implicit vcc operand to v_div_fmas_* This should allow finally fixing the f64 fdiv implementation. Test is disabled for VI since there seems to be a problem with one of the buffer load instructions on it. llvm-svn: 229236	2015-02-14 04:22:00 +00:00
Matt Arsenault	6e26b8d854	R600/SI: Fix schedule model for v_div_scale_{f32|f64} llvm-svn: 229235	2015-02-14 04:03:18 +00:00
Matt Arsenault	35733e2dec	R600/SI: Really fix size of VReg_1 llvm-svn: 229234	2015-02-14 03:54:32 +00:00
Matt Arsenault	1bcc8cba5a	R600/SI: Rename encoding field to match docs for VOP3b llvm-svn: 229233	2015-02-14 03:54:29 +00:00
Matt Arsenault	31ec598a2a	R600/SI: Fix not encoding src2 for v_div_scale_{f32|f64} This apparently got lost in the VI changes. llvm-svn: 229230	2015-02-14 03:40:35 +00:00
Matt Arsenault	692acf1438	R600/SI: Fix VOP3b encoding on VI llvm-svn: 229228	2015-02-14 03:02:23 +00:00
Matt Arsenault	95546b46ab	R600/SI: Fix phys reg copies in SIFoldOperands llvm-svn: 229227	2015-02-14 02:55:57 +00:00
Matt Arsenault	9998168982	R600/SI: Fix copies from SGPR to VCC This shows up without optimizations when vcc is required to be used. llvm-svn: 229226	2015-02-14 02:55:56 +00:00
Matt Arsenault	834b1aa806	R600/SI: Add hack to copy from a VGPR to VCC This hopefully should be fixed when VReg_1 is removed. llvm-svn: 229225	2015-02-14 02:55:54 +00:00
Duncan P. N. Exon Smith	5bedaf934f	PowerPC: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229224	2015-02-14 02:54:07 +00:00
Matt Arsenault	f417ff8f2a	R600/SI: Fix size of VReg_1 This is really a 32-bit register, if we try to check the size of it, we want 32-bits. llvm-svn: 229223	2015-02-14 02:51:44 +00:00
Duncan P. N. Exon Smith	8480c87ce6	R600: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229222	2015-02-14 02:45:45 +00:00
Duncan P. N. Exon Smith	2e75314352	Mips: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229221	2015-02-14 02:37:48 +00:00
Duncan P. N. Exon Smith	2cff9e19a2	ARM: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229220	2015-02-14 02:24:44 +00:00
Duncan P. N. Exon Smith	003bb7d96e	AArch64: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229218	2015-02-14 02:09:06 +00:00
Duncan P. N. Exon Smith	5975a703e6	X86: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229214	2015-02-14 01:59:52 +00:00
Ahmed Bougacha	8f2b4f0be8	[X86] Factor out the CMOV pseudo definitions. NFCI. llvm-svn: 229206	2015-02-14 01:36:53 +00:00
Matthias Braun	33cc10724d	Revert "On ELF, put PIC jump tables in a non executable section." This reverts commit r228939. The commit broke something in the output of exception handling tables on darwin x86-64. llvm-svn: 229203	2015-02-14 01:16:54 +00:00
Eric Christopher	b2a5fa98e4	Use the template method to grab the target specific subtarget. llvm-svn: 229191	2015-02-14 00:09:46 +00:00
Eric Christopher	fcd3d87ad8	The base pointer save offset can be computed at initialization time, do so and fix up the calls. llvm-svn: 229169	2015-02-13 22:48:53 +00:00
Eric Christopher	a10d58dba8	Move the target machine variable so that it's initialized early enough we can use it to initialize frame lowering. llvm-svn: 229168	2015-02-13 22:48:51 +00:00
Eric Christopher	e8dbfe1cf8	Stash the TargetMachine on the subtarget so we can access it later. Clean up a subtarget function that has it passed in while we're at it. llvm-svn: 229164	2015-02-13 22:23:04 +00:00
Eric Christopher	a4ae213193	PPC LinkageSize can be computed at initialization time, do so. llvm-svn: 229163	2015-02-13 22:22:57 +00:00
Sanjay Patel	baa6bc378f	[SSE/AVX] Use multiclasses to reduce the mass of scalar math patterns; NFCI This takes the preposterous number of patterns in this section that were last added to in r219033 down to just plain obnoxious. With a little more work, we might get this down to just comical. I've added more test cases to the existing file that checks these patterns, but it seems that some of these patterns simply don't exist with today's shuffle lowering. llvm-svn: 229158	2015-02-13 21:52:42 +00:00
Sanjay Patel	34da52a894	fix typos; NFC llvm-svn: 229155	2015-02-13 21:07:22 +00:00
Tom Stellard	e1e4a2d310	R600/SI: Refactor SOP1 classes llvm-svn: 229152	2015-02-13 21:02:37 +00:00
Tom Stellard	6c65e9a99a	R600/SI: Lowercase register names llvm-svn: 229151	2015-02-13 21:02:36 +00:00
Tom Stellard	d09fa9cec8	R600/SI: Remove some unused TableGen classes llvm-svn: 229150	2015-02-13 21:02:33 +00:00
Vasileios Kalintiris	99eeb8aae4	[mips] Refactor and simplify MipsSEDAGToDAGISel::selectIntAddrLSL2MM(). NFC. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7618 llvm-svn: 229140	2015-02-13 19:14:22 +00:00
Vasileios Kalintiris	46963f6e73	[mips] Use isa<> instead of dyn_cast<> with unused value. NFC. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7615 llvm-svn: 229138	2015-02-13 19:12:16 +00:00
Matt Arsenault	774e20b42a	R600/SI: Remove handling of fpimm llvm-svn: 229136	2015-02-13 19:05:07 +00:00
Matt Arsenault	11a4d6774b	R600/SI: Allow f64 inline immediates in i64 operands This requires considering the size of the operand when checking immediate legality. llvm-svn: 229135	2015-02-13 19:05:03 +00:00
Jozef Kolek	650a61a943	[mips][microMIPS] Delay slot filler: Replace the microMIPS JR with the JRC This patch extends the MIPS delay slot filler: if the filler would have to put a NOP instruction into the delay slot of a microMIPS JR instruction, then instead of emitting the NOP, the JR is replaced by the compact jump instruction JRC. Differential Revision: http://reviews.llvm.org/D7522 llvm-svn: 229128	2015-02-13 17:51:27 +00:00
Toma Tabacu	16a74499af	[mips] Improve support for the .set at/noat assembler directives. Summary: Made the following changes: Added calls to emitDirectiveSetNoAt() and emitDirectiveSetAt(). Added special emit function for .set at=$reg, emitDirectiveSetAtWithArg(unsigned RegNo). Improved parsing error checks for .set at. Refactored parser code for .set at. Improved testing of both directives. Improved code readability and comments. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7176 llvm-svn: 229097	2015-02-13 10:30:57 +00:00
Chandler Carruth	30d69c2e36	[PM] Remove the old 'PassManager.h' header file at the top level of LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094	2015-02-13 10:01:29 +00:00
Chandler Carruth	71f308adb7	Re-sort #include lines using my handy dandy ./utils/sort_includes.py script. This is in preparation for changes to lots of include lines. llvm-svn: 229088	2015-02-13 09:09:03 +00:00
Craig Topper	916708f152	[X86] Add support for parsing and printing the mnemonic aliases for the XOP VPCOM instructions. llvm-svn: 229078	2015-02-13 07:42:25 +00:00
Craig Topper	007a713ebf	Fix a typo in a comment. NFC llvm-svn: 229071	2015-02-13 06:07:29 +00:00
Craig Topper	4e0700f365	[X86] Remove int_x86_sse2_psll_dq_bs and int_x86_sse2_psrl_dq_bs intrinsics. The builtins aren't used by clang. llvm-svn: 229069	2015-02-13 06:07:24 +00:00
Matt Arsenault	63bef0d177	R600/SI: Remove unnecessary check for fpimm llvm-svn: 229034	2015-02-13 02:47:22 +00:00
Eric Christopher	dc3a8a4a66	PPCFrameLowering's FramePointerOffset can be computed at initialization time. Do so. llvm-svn: 228998	2015-02-13 00:39:38 +00:00
Eric Christopher	736d39e189	The TOC save offset can be computed at compile time, do so and propagate changes. llvm-svn: 228997	2015-02-13 00:39:36 +00:00
Eric Christopher	f71609b5dd	The return save offset can be computed at initialization time - do so and save the value. llvm-svn: 228996	2015-02-13 00:39:27 +00:00
David Majnemer	a12fcb790f	X86: Don't crash if we can't decode the pshufb mask Constant pool entries are uniqued by their contents regardless of their type. This means that a pshufb can have a shuffle mask which isn't a simple array of bytes. The code path which attempts to decode the mask didn't check for failure, causing PR22559. llvm-svn: 228979	2015-02-12 23:26:26 +00:00
Rafael Espindola	e4bcad4754	Learn that __DATA,__objc_classrefs is not atomized via symbols. This should hopefully fix objc on AArch64. llvm-svn: 228976	2015-02-12 23:11:59 +00:00
Olivier Sallenave	05e69157b6	Change max interleave factor to 12 for POWER7 and POWER8. llvm-svn: 228973	2015-02-12 22:57:58 +00:00
Rafael Espindola	3105fd8335	Remove mostly unused setters. Most of the code was setting the TargetOptions directly. llvm-svn: 228961	2015-02-12 21:16:34 +00:00
Reed Kotler	aa150ed780	Add bulk of returning of values to Mips fast-isel Summary: Implement the bulk of returning values in Mips fast-isel Test Plan: reatabi.ll Passes test-suite at -O0,-O2 and with mips32r2 and mips32r1. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D5920 llvm-svn: 228958	2015-02-12 21:05:12 +00:00
Simon Pilgrim	295eaad2b3	Relaxed over-zealous alignment requirement for VEX-encoded AES instructions llvm-svn: 228953	2015-02-12 20:01:03 +00:00
Rafael Espindola	203c5b9f39	On ELF, put PIC jump tables in a non executable section. Fixes PR22558. llvm-svn: 228939	2015-02-12 17:46:49 +00:00
Rafael Espindola	29786d4c16	Put each jump table in an independent section if the function is too. This allows the linker to GC both, fixing pr22557. llvm-svn: 228937	2015-02-12 17:16:46 +00:00
Benjamin Kramer	5f6a907288	MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros Update all callers. llvm-svn: 228930	2015-02-12 15:35:40 +00:00
Michael Kuperstein	f4d1aca568	[X86] Call frame optimization - allow stack-relative movs to be folded into a push Since we track esp precisely, there's no reason not to allow this. llvm-svn: 228924	2015-02-12 14:17:35 +00:00
Asiri Rathnayake	e045e378ad	ARM: Fix another regression introduced in r223113 The changes in r223113 (ARM modified-immediate syntax) have broken instructions like: mov r0, #~0xffffff00 The problem is that I've added a spurious range check on the immediate operand to ensure that it lies between INT32_MIN and UINT32_MAX. While this range check is correct in theory, it causes problems because the operand is stored in an int64_t (by MC). So valid 32-bit constants like #~0xffffff00 become out of range. The solution is to simply remove this range check. It is not possible to validate the range of the immediate operand with the current setup because: 1) The operand is stored in an int64_t by MC, 2) The immediate can be of the forms #imm, #-imm, #~imm or even #((~imm)) etc. So we just chop the value to 32 bits and use it. Also noted that the original range check was not tested by any of the unit tests. I've added a new test to cover #~imm kind of operands. Change-Id: I411e90d84312a2eff01b732bb238af536c4a7599 llvm-svn: 228920	2015-02-12 13:37:28 +00:00
Elena Demikhovsky	d2cb3c8876	AVX-512: Fixed the "test" operation for i1 type Using KORTESTW to compare an i1 value with zero was wrong since the instruction tests 16 bits. KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits. I removed the (X86cmp i1, 0) pattern; the code now zero-extends i1 to i8 and then uses TESTB. There are some cases where i1 is in the mask register and the upper bits are already zeroed. Then KORTESTW is the better solution, but that is a subject for later optimization. Meanwhile, I'm fixing the correctness issue. llvm-svn: 228916	2015-02-12 08:40:34 +00:00
Michael Kuperstein	db95d04be4	[X86] A heuristic to estimate the size impact for converting stack-relative parameter movs to pushes This gives a rough estimate of whether using pushes instead of movs is profitable, in terms of size. We go over all calls in the MachineFunction and compute: a) For each callsite that can not use pushes, the penalty of not having a reserved call frame. b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack). Differential Revision: http://reviews.llvm.org/D7561 llvm-svn: 228915	2015-02-12 08:36:35 +00:00
Hal Finkel	7a0516ea66	[PowerPC] Mark jumps as expensive (using CR bits) On PowerPC, which has a full set of logical operations on (its multiple sets of) condition-register bits, it is not profitable to break up complex conditions feeding a jump into multiple jumps. We can turn off this feature of CGP/SDAGBuilder by marking jumps as "expensive". P7 test-suite speedups (no regressions): MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.626647% +/- 0.323583% MultiSource/Benchmarks/Olden/power/power -18.2821% +/- 8.06481% llvm-svn: 228895	2015-02-12 01:02:52 +00:00
Tom Stellard	0648588e7d	R600/SI: Disable subreg liveness This is temporary while we try to fix a crash in the register coalescer. llvm-svn: 228861	2015-02-11 18:24:53 +00:00
Tom Stellard	de5b7b180a	R600: Split AMDGPUPassConfig into R600PassConfig and GCNPassConfig llvm-svn: 228850	2015-02-11 17:11:51 +00:00
Tom Stellard	c65b36061a	R600: Create an R600TargetMachine for pre-gcn GPUs No functionality change. R600TargetMachine inherits from AMDGPUTargetMachine. llvm-svn: 228849	2015-02-11 17:11:50 +00:00
Daniel Sanders	a19216c8f4	[mips] Merge disassemblers into a single implementation. Summary: Currently we have Mips32 and Mips64 disassemblers and this causes the target triple to affect the disassembly despite all the relevant information being in the ELF header. These implementations do not need to be separate. This patch merges them together such that the appropriate tables are checked for the subtarget (e.g. Mips64 is checked when GP64 is enabled). Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7498 llvm-svn: 228825	2015-02-11 11:28:56 +00:00
Michael Kuperstein	1921d3d6f3	[X86] Split information collection from actual transformation in call frame optimization This splits collecting information from actually performing the transformation, so that we can add a heuristic in between the two. NFC. Differential Revision: http://reviews.llvm.org/D7497 llvm-svn: 228817	2015-02-11 08:53:55 +00:00
Arnaud A. de Grandmaison	de79026d5e	[PBQP] Cautiously update edge costs in the solver The NodeMetadata are maintained in an incremental way. When an edge between 2 nodes has its cost updated, in the course of graph reduction for example, the NodeMetadata need first to have the old edge cost removed, then the new edge cost added. Only once the NodeMetadata have been fully updated does it become safe to consider promoting the nodes to the ConservativelyAllocatable or OptimallyReducible sets. Previously, this promotion was occurring right after removing the old cost, and this was breaking the assumption that a ConservativelyAllocatable node should not be spilled. This patch also adds asserts to: - enforce the invariant that a node's reduction cannot be downgraded, - ensure that only nodes that are not provably allocatable or optimally reducible can be spilled. llvm-svn: 228816	2015-02-11 08:25:36 +00:00
Zachary Turner	3bd47cee78	Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects. This allows IDEs to recognize the entire set of header files for each of the core LLVM projects. Differential Revision: http://reviews.llvm.org/D7526 Reviewed By: Chris Bieneman llvm-svn: 228798	2015-02-11 03:28:02 +00:00
Tom Stellard	94b7231740	R600/SI: Store immediate offsets > 12-bits in soffset This will save us from having to extend these offsets to 64-bits and storing them in a pair of vgprs. llvm-svn: 228776	2015-02-11 00:34:35 +00:00
Tom Stellard	c53861ab84	R600/SI: Add soffset operand to mubuf addr64 instruction We were previously hard-coding soffset to 0. llvm-svn: 228775	2015-02-11 00:34:32 +00:00
David Majnemer	ca19485f08	X86: @llvm.frameaddress should defer to SelectionDAG for Win CFI llvm-svn: 228754	2015-02-10 22:00:34 +00:00
David Majnemer	13d0b11d7b	X86: Make @llvm.frameaddress work correctly with Windows unwind codes Simply loading or storing the frame pointer is not sufficient for Windows targets. Instead, create a synthetic frame object that we will lower later. References to this synthetic object will be replaced with the correct reference to the frame address. llvm-svn: 228748	2015-02-10 21:22:05 +00:00
Bill Schmidt	67f36bd0d8	Fix up r228725, missed change in PPCSubtarget definition llvm-svn: 228728	2015-02-10 19:31:55 +00:00
Bill Schmidt	82f1c775a0	[PowerPC] Fix reverted patch r227976 to avoid register assignment issues See full discussion in http://reviews.llvm.org/D7491. We now hide the add-immediate and call instructions together in a separate pseudo-op, which is tagged to define GPR3 and clobber the call-killed registers. The PPCTLSDynamicCall pass prior to RA now expands this op into the two separate addi and call ops, with explicit definitions of GPR3 on both instructions, and explicit clobbers on the call instruction. The pass is now marked as requiring and preserving the LiveIntervals and SlotIndexes analyses, and fixes these up after the replacement sequences are introduced. Self-hosting has been verified on LE P8 and BE P7 with various optimization levels, etc. It has also been verified with the --no-tls-optimize flag workaround removed. llvm-svn: 228725	2015-02-10 19:09:05 +00:00
David Majnemer	a7d908eb2b	X86: Emit Win64 SaveXMM opcodes at the right offset in the right order Walk the instructions marked FrameSetup and consider any stores of XMM registers to the stack as needing a SaveXMM opcode. This fixes PR22521. Differential Revision: http://reviews.llvm.org/D7527 llvm-svn: 228724	2015-02-10 19:01:47 +00:00
Hal Finkel	57c6ac5e41	[PowerPC] Support the (old) cntlz instruction alias Some old assembly code uses the cntlz alias for cntlzw, binutils supports this, and we should too. Fixes PR22519. llvm-svn: 228719	2015-02-10 18:45:02 +00:00
Colin LeMahieu	404d5b242d	[Hexagon] Adding vector load with post-increment instructions. Adding decoder function for 64bit control register class. llvm-svn: 228708	2015-02-10 16:59:36 +00:00
Zoran Jovanovic	416886793f	[mips][microMIPS] Implement movep instruction Differential Revision: http://reviews.llvm.org/D7465 llvm-svn: 228703	2015-02-10 16:36:20 +00:00
Simon Pilgrim	d142ab7d08	[X86][AVX2] Missing AVX2 memory folding instructions Added most of the missing vector folding patterns for AVX2 (as well as fixing the vpermpd and vpermq patterns) Differential Revision: http://reviews.llvm.org/D7492 llvm-svn: 228688	2015-02-10 13:22:57 +00:00
Simon Pilgrim	cd32254a35	[X86][XOP] Added XOP memory folding patterns + tests This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc. Note: Many of the XOP instructions have multiple table entries as it can fold loads from different sources. Differential Revision: http://reviews.llvm.org/D7484 llvm-svn: 228685	2015-02-10 12:57:17 +00:00
Jozef Kolek	d68d424abf	[mips][microMIPS] Fix disassembling of 16-bit microMIPS instructions LWM16 and SWM16 Differential Revision: http://reviews.llvm.org/D7436 llvm-svn: 228683	2015-02-10 12:41:13 +00:00
Andrea Di Biagio	62622d2396	[X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX. This patch teaches X86FastISel how to select AVX instructions for scalar float/double convert operations. Before this patch, X86FastISel always selected legacy SSE instructions for FPExt (from float to double) and FPTrunc (from double to float). For example: \code define double @foo(float %f) { %conv = fpext float %f to double ret double %conv } \end code Before (with -mattr=+avx -fast-isel) X86FastIsel selected a CVTSS2SDrr which is legacy SSE: cvtss2sd %xmm0, %xmm0 With this patch, X86FastIsel selects a VCVTSS2SDrr instead: vcvtss2sd %xmm0, %xmm0, %xmm0 Added test fast-isel-fptrunc-fpext.ll to check both the register-register and the register-memory float/double conversion variants. Differential Revision: http://reviews.llvm.org/D7438 llvm-svn: 228682	2015-02-10 12:04:41 +00:00
Craig Topper	9e71b82f40	[X86] Preserve mem refs on newly created 'Store' node instead of 'Load' node when handling store unfolding. Bug spotted by Steve King. I have no idea how to test this. llvm-svn: 228672	2015-02-10 06:29:28 +00:00
Craig Topper	f7e92f10b6	[X86] Remove unnecessary alignment checks from the load folding tables. llvm-svn: 228671	2015-02-10 05:10:50 +00:00
David Majnemer	93c22a45be	X86: Emit an ABI compliant prologue and epilogue for Win64 Win64 has specific constraints on what valid prologues and epilogues look like. This constraint is born from the flexibility and descriptiveness of Win64's unwind opcodes. Prologues previously emitted by LLVM could not be represented by the unwind opcodes, preventing operations powered by stack unwinding from working successfully. Differential Revision: http://reviews.llvm.org/D7520 llvm-svn: 228641	2015-02-10 00:57:42 +00:00
Eric Christopher	d49868080e	Migrate PPCAsmPrinter's subtarget from reference to pointer in preparation for making it MachineFunction dependent. llvm-svn: 228638	2015-02-10 00:44:17 +00:00
David Blaikie	36a036909c	Fix the clang -Werror build (-Wunused-variable) llvm-svn: 228635	2015-02-10 00:16:36 +00:00
Colin LeMahieu	328b1633d7	[Hexagon] Adding missing load instructions and removing an unused multiclass parameter. llvm-svn: 228630	2015-02-09 23:45:24 +00:00
Colin LeMahieu	4282e7cffd	[Hexagon] Factoring classes out of some load patterns and deleting some unused ones. llvm-svn: 228627	2015-02-09 23:05:44 +00:00
Colin LeMahieu	4fd203d3e1	[Hexagon] Removing more V4 predicates since V4 is the required minimum. llvm-svn: 228614	2015-02-09 21:56:37 +00:00
Colin LeMahieu	641c24b9bf	[Hexagon] Removing v2-4 flags. V4 is the minimum supported version. llvm-svn: 228605	2015-02-09 21:07:35 +00:00
Colin LeMahieu	955c4ff9c3	[Hexagon] Factoring classes out of store patterns. llvm-svn: 228602	2015-02-09 20:33:46 +00:00
Colin LeMahieu	ab5a8d6070	[Hexagon] Formatting v5 TD file. Removing commented defs. llvm-svn: 228598	2015-02-09 20:03:42 +00:00
Colin LeMahieu	38e6689276	[Hexagon] Cleaning up definition formatting. llvm-svn: 228593	2015-02-09 19:24:44 +00:00
Kit Barton	0b0cdb1cd4	This change implements the following three logical vector operations: veqv (vector equivalence) vnand vorc I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions. Phabricator review: http://reviews.llvm.org/D7469 llvm-svn: 228580	2015-02-09 17:03:18 +00:00
Sanjay Patel	a7b893d5c0	rename variable to give it some meaning; remove obvious comments; NFC llvm-svn: 228579	2015-02-09 16:30:58 +00:00
Sanjay Patel	fc54c61c56	fix comment that didn't match the code; remove unnecessary braces; NFC llvm-svn: 228578	2015-02-09 16:04:52 +00:00
Craig Topper	141e65e69c	[X86] Remove 256-bit and 512-bit memop pattern fragments. They are no longer used. llvm-svn: 228563	2015-02-09 04:04:53 +00:00
Craig Topper	820d49270d	[X86] Remove 'memop' uses from AVX512. Use 'load' instead. llvm-svn: 228562	2015-02-09 04:04:50 +00:00
Craig Topper	68ab0465a0	[X86] Remove the remaining uses of memop from AVX and AVX2 instruction patterns. AVX and AVX2 can handle unaligned loads being folded so we can just use 'load' llvm-svn: 228551	2015-02-08 22:38:25 +00:00
Sanjay Patel	3510bc7162	fix typos; NFC llvm-svn: 228529	2015-02-08 18:54:22 +00:00
Simon Pilgrim	d11b013623	Moved AVX2 vbroadcast (reg) instruction foldings under the correct grouping. NFC. llvm-svn: 228526	2015-02-08 17:13:54 +00:00
Tim Northover	45aa89c925	ARM & AArch64: teach LowerVSETCC that output type size may differ from input. While various DAG combines try to guarantee that a vector SETCC operation will have the same output size as input, there's nothing intrinsic to either creation or LegalizeTypes that actually guarantees it, so the function needs to be ready to handle a mismatch. Fortunately this is easy enough, just extend or truncate the naturally compared result. I couldn't reproduce the failure in other backends that I know have SIMD, so it's probably only an issue for these two due to shared heritage. Should fix PR21645. llvm-svn: 228518	2015-02-08 00:50:47 +00:00
Craig Topper	e169c57188	[X86] Add register use/def for wrmsr and rdmsr. llvm-svn: 228515	2015-02-07 23:36:51 +00:00
Craig Topper	1d472db8cc	[X86] Add GETSEC instruction. llvm-svn: 228514	2015-02-07 23:36:36 +00:00
Simon Pilgrim	a2618679a8	[X86][AVX] Added missing stack folding support + test for vptest ymm instruction llvm-svn: 228509	2015-02-07 21:44:06 +00:00
Andrea Di Biagio	4f8bdcb738	Fix typos; NFC. llvm-svn: 228493	2015-02-07 13:56:20 +00:00
Hal Finkel	291cc7bacd	[PowerPC] Handle loop predecessor invokes If a loop predecessor has an invoke as its terminator, and the return value from that invoke is used to determine the loop iteration space, then we can't insert a computation based on that value in the loop predecessor prior to the terminator (oops). If there's such an invoke, or just no predecessor for that matter, insert a new loop preheader. llvm-svn: 228488	2015-02-07 07:32:58 +00:00
Ahmed Bougacha	df956a2e78	[AArch64] Use the source location of the IR branch when creating Bcc from a conditional branch fed by an add/sub/mul-with-overflow node. We previously used the SDLoc of the overflow node, for no good reason. In some cases, this led to the Bcc and B terminators having different source orders, and DBG_VALUEs being inserted between them. The real issue is with the code that can't handle DBG_VALUEs between terminators: the few places affected by this will be fixed soon. In the meantime, fixing the SDLoc is a positive change no matter what. No tests, as I have no idea how to get .loc emitted for branches? rdar://19347133 llvm-svn: 228463	2015-02-06 23:15:39 +00:00
Hal Finkel	0d2a1515d5	Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups Unfortunately, even with the workaround of disabling the linker TLS optimizations in Clang restored (which has already been done), this still breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native). Bill is currently working on an alternate implementation to address the TLS issue in a way that also fully elides the linker bug (which, unfortunately, this approach did not fully), so I'm reverting this now. llvm-svn: 228460	2015-02-06 23:07:40 +00:00
Sanjay Patel	3d982214b0	use local variables; NFC llvm-svn: 228452	2015-02-06 22:43:52 +00:00


32069 Commits