llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	0e75a06451	R600/SI: Rough first implementation of shouldClusterLoads llvm-svn: 217968	2014-09-17 17:48:30 +00:00
Alexey Samsonov	cce5701cdb	Fix float division-by-zero in R600 scheduler. This bug was reported by UBSan. llvm-svn: 217967	2014-09-17 17:47:21 +00:00
Juergen Ributzka	fb3e14375a	[FastISel][AArch64] Improve branch selection to support all FP conditions. This adds the last two missing floating-point condition codes (FCMP_UEQ and FCMP_ONE) also to the branch selection. In these two cases an additional branch instruction is required. This also adds unit tests to check all the different condition codes. This is related to rdar://problem/18358882. llvm-svn: 217966	2014-09-17 17:46:47 +00:00
Robin Morisset	1c8a457575	[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors Summary: I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5304 llvm-svn: 217965	2014-09-17 17:41:16 +00:00
Matt Arsenault	02dc26529e	R600/SI: Change formatting of printed FP immediates Only 1 decimal place should be printed for inline immediates. Other constants should be hex constants. Does not include f64 tests because folding those inline immediates currently does not work. llvm-svn: 217964	2014-09-17 17:32:13 +00:00
Matt Arsenault	253e5da7ad	R600/SI: Remove promotion of instructions to e64 forms. Instructions are now generally selected to the e64 forms originally, and shrunk down later. Rename foldOperands to legalizeOperands, since that's really most of what it tries to do. llvm-svn: 217959	2014-09-17 15:35:43 +00:00
Yaron Keren	559b47d051	Add and update reset() and doInitialization() methods to MC* and passes. This enables reusing a PassManager instead of re-constructing it every time. llvm-svn: 217948	2014-09-17 09:25:36 +00:00
Toma Tabacu	351b2feeb3	[mips] Add assembler support for the .set nodsp directive. Summary: This directive is used to tell the assembler to reject DSP-specific instructions. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5142 llvm-svn: 217946	2014-09-17 09:01:54 +00:00
Pavel Chupin	37b65d81dd	[x32] Fix function indirect calls Summary: Zero-extend register to 64-bit for callq/jmpq. Test Plan: 3 tests added Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5355 llvm-svn: 217942	2014-09-17 07:09:23 +00:00
Richard Trieu	1fbe1a8ba7	| -> || No functional change. llvm-svn: 217934	2014-09-17 01:47:52 +00:00
Robin Morisset	25c8e318e4	[X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass This required a new hook called hasLoadLinkedStoreConditional to know whether to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to CmpXchg (X86). Apart from that, the new code in AtomicExpandPass is mostly moved from X86AtomicExpandPass. The main result of this patch is to get rid of that pass, which had lots of code duplicated with AtomicExpandPass. llvm-svn: 217928	2014-09-17 00:06:58 +00:00
Matt Arsenault	6652403c2d	Fix typo llvm-svn: 217892	2014-09-16 18:00:23 +00:00
Adam Nemet	0c7caf434f	[X86] Improve comment llvm-svn: 217885	2014-09-16 17:14:10 +00:00
Moritz Roth	eef9f4dc74	ARM load/store optimizer: Don't materialize a new base register with ADDS/SUBS unless it's safe to clobber the condition flags. If the merged instructions are in a range where the CPSR is live, e.g. between a CMP -> Bcc, we can't safely materialize a new base register. This problem is quite rare, I couldn't come up with a test case and I've never actually seen this happen in the tests I'm running - there is a potential trigger for this in LNT/oggenc (spills being inserted between a CMP/Bcc), but at the moment this isn't being merged. I'll try to reduce that into a small test case once I've committed my upcoming patch to make merging less conservative. llvm-svn: 217881	2014-09-16 16:25:07 +00:00
Toma Tabacu	65f1057191	[mips] Improve the error messages given by MipsAsmParser. Summary: Changed error messages to be more informative and to resemble other clang/llvm error messages (first letter is lower case, no ending punctuation) and updated corresponding tests. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5065 llvm-svn: 217873	2014-09-16 15:00:52 +00:00
Toma Tabacu	18227e6f20	[mips] Move 32-bit ADDiu instruction alias from Mips64InstrInfo.td to MipsInstrInfo.td. Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5244 llvm-svn: 217868	2014-09-16 10:19:03 +00:00
Toma Tabacu	25cdd222b0	[mips] Marked the ADDi instruction aliases as not available in Mips32R6 and Mips64R6. Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5242 llvm-svn: 217867	2014-09-16 09:26:09 +00:00
Joe Abbey	8e72eb780e	ARMAsmBackend uses a factory method to generate binary file format specific objects. There were a few FIXMEs in ARMAsmBackend.cpp suggesting the class definitions should be in a separate file. Starting with ARMAsmBackend, the class definition has been put in a header file, and #includes reduced. Each sub-type of ARMAsmBackend is now in its own header file. Derived types have been painted with a different color of bike-shed: s/DarwinARMAsmBackend/ARMAsmBackendDarwin/g s/ARMWinCOFFAsmBackend/ARMAsmBackendWinCOFF/g s/ELFARMAsmBackend/ARMAsmBackendELF/g Finally, clang-format has been run across ARMAsmBackend.cpp llvm-svn: 217866	2014-09-16 09:18:23 +00:00
Elena Demikhovsky	27012478d2	AVX-512: added cost for some AVX-512 instructions llvm-svn: 217863	2014-09-16 07:57:37 +00:00
Chandler Carruth	429c29d187	[x86] Remove a FIXME that doesn't make any sense. Only the lanes feeding the blend that is matched by this are "used" in any sense, and so any build_vector or other nodes feeding these will already drop other lanes. llvm-svn: 217855	2014-09-16 02:16:42 +00:00
Chandler Carruth	b1c024a2de	[x86] Cleanup an unused variable by actually using it in the non-asserts place where it was needed. llvm-svn: 217854	2014-09-16 02:14:51 +00:00
Chandler Carruth	74acb46d26	[x86] Remove the last vestiges of the BLENDI-based ADDSUB pattern matching. This design just fundamentally didn't work because ADDSUB is available prior to any legal lowerings of BLENDI nodes. Instead, we have a dedicated ADDSUB synthetic ISD node which is pattern matched trivially into the instructions. These nodes are then recognized by both the existing and a trivial new lowering combine in the backend. Removing these patterns required adding 2 missing shuffle masks to the DAG combine, without which tests would have failed. Added the masks and a helpful assert as well to catch if anything ever goes wrong here. llvm-svn: 217851	2014-09-16 00:39:08 +00:00
Juergen Ributzka	59e631c728	[FastISel][AArch64] Add vector support to argument lowering. Lower the first 8 vector arguments too. llvm-svn: 217850	2014-09-16 00:25:30 +00:00
Chandler Carruth	f845e89425	[x86] As a follow-up to r217819, don't check for VSELECT legality now that we don't use VSELECT and directly emit an addsub synthetic node. Also remove a stale comment referencing VSELECT. The test case is updated to use 'core2' which only has SSE3, not SSE4.1, and it still passes. Previously it would not because we lacked sufficient blend support to legalize the VSELECT. llvm-svn: 217849	2014-09-16 00:24:42 +00:00
Chandler Carruth	de5f2b356b	[x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and ADDSUBPD nodes out of blends of adds and subs. This allows us to actually form these instructions with SSE3 rather than only forming them when we had both SSE3 for the ADDSUB instructions and SSE4.1 for the blend instructions. ;] Kind-of important. I've adjusted the CPU requirements on one of the tests to demonstrate this kicking in nicely for an SSE3 cpu configuration. llvm-svn: 217848	2014-09-16 00:15:20 +00:00
Juergen Ributzka	de47c47cc1	[FastISel][AArch64] Allow handling of vectors during return lowering for little endian machines. Allow handling of vectors during return lowering at least for little endian machines. This was restricted in r208200 to fix it for big endian machines (according to the comment), but it also disabled it for little endian too. llvm-svn: 217846	2014-09-15 23:40:10 +00:00
Juergen Ributzka	b9e49c73ee	[FastISel][AArch64] Update function and variable names to follow the coding standard. NFC. llvm-svn: 217845	2014-09-15 23:20:17 +00:00
Juergen Ributzka	cbe802e730	[FastISel][AArch64] Make AArch64FastISel class final. NFC. llvm-svn: 217840	2014-09-15 22:33:11 +00:00
Juergen Ributzka	993224a553	[FastISel][AArch64] Lower sin/cos/pow to runtime lib calls. Also lower sin/cos/pow to runtime lib calls. This fixes rdar://problem/18343468. llvm-svn: 217839	2014-09-15 22:33:06 +00:00
Juergen Ributzka	afa034fb61	[FastISel][AArch64] Add lowering support for frem. This lowers frem to a runtime libcall inside fast-isel. The test case also checks the CallLoweringInfo bug that was exposed by this change. This fixes rdar://problem/18342783. llvm-svn: 217833	2014-09-15 22:07:49 +00:00
Juergen Ributzka	e1779e2a8b	[FastISel][AArch64] Refactor selectAddSub, selectLogicalOp, and SelectShift. NFC. Small refactor to tidy up the code a little. llvm-svn: 217827	2014-09-15 21:27:56 +00:00
Juergen Ributzka	6127b1968d	[FastISel][AArch64] Refactor code to use isTypeSupported. NFC. Gets rid of isLoadStoreTypeLegal and replace it with isTypeSupported. llvm-svn: 217826	2014-09-15 21:27:54 +00:00
Juergen Ributzka	8984f48d89	[FastISel][AArch64] Improve floating-point compare support. Add support for the last two missing fcmp condition codes: UEQ and ONE. This fixes rdar://problem/18341575. llvm-svn: 217823	2014-09-15 20:47:16 +00:00
Juergen Ributzka	d111d29f90	[FastISel] Move optimizeCmpPredicate to FastISel base class. NFC. Make the optimizeCmpPredicate function available to all targets. llvm-svn: 217822	2014-09-15 20:47:13 +00:00
Reed Kotler	32be74b178	Add mips32 r1 to the list of supported targets for Mips fast-isel Summary: Expand list of supported targets for Mips to include mips32 r1. Previously it only included r2. More patches are coming where there is a difference but in the current patches as pushed upstream, r1 and r2 are equivalent. Test Plan: simplestorefp1.ll add new build bots at mips to test this flavor at both -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5306 llvm-svn: 217821	2014-09-15 20:30:25 +00:00
Chandler Carruth	204ad4c613	[x86] Start fixing our emission of ADDSUBPS and ADDSUBPD instructions by introducing a synthetic X86 ISD node representing this generic operation. The relevant patterns for mapping these nodes into the concrete instructions are also added, and a gnarly bit of C++ code in the target-specific DAG combiner is replaced with simple code emitting this primitive. The next step is to generically combine blends of adds and subs into this node so that we can drop the reliance on an SSE4.1 ISD node (BLENDI) when matching an SSE3 feature (ADDSUB). llvm-svn: 217819	2014-09-15 20:09:47 +00:00
Rafael Espindola	6865d6f08a	Fix a lot of confusion around inserting nops on empty functions. On MachO, and MachO only, we cannot have a truly empty function since that breaks the linker logic for atomizing the section. When we are emitting a frame pointer, the presence of an unreachable will create a cfi instruction pointing past the last instruction. This is perfectly fine. The FDE information encodes the pc range it applies to. If some tool cannot handle this, we should explicitly say which bug we are working around and only work around it when it is actually relevant (not for ELF for example). Given the unreachable we could omit the .cfi_def_cfa_register, but then again, we could also omit the entire function prologue if we wanted to. llvm-svn: 217801	2014-09-15 18:32:58 +00:00
Akira Hatanaka	760814a7e1	[X86] Fix a bug in X86's peephole optimization. Peephole optimization was folding MOVSDrm, which is a zero-extending double precision floating point load, into ADDPDrr, which is a SIMD add of two packed double precision floating point values. (before) %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21 %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21 (after) %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20 X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this from happening. However the check wasn't being conducted for loads from stack objects. This commit factors out the logic into a new function and uses it for checking loads from stack slots are not zero-extending loads. rdar://problem/18236850 llvm-svn: 217799	2014-09-15 18:23:52 +00:00
Matt Arsenault	49dd4283ed	R600/SI: Prefer selecting more e64 instruction forms. Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. llvm-svn: 217789	2014-09-15 17:15:02 +00:00
Matt Arsenault	3f98140c87	R600/SI: Add preliminary support for flat address space llvm-svn: 217777	2014-09-15 15:41:53 +00:00
Matt Arsenault	65f67e4dfe	R600/SI: Fix promote alloca pass breaking addrspacecast llvm-svn: 217776	2014-09-15 15:41:44 +00:00
Matt Arsenault	5c4d8409b3	R600/SI: Enable named operand table for MTBUF There is already code trying to use it for getting the offset. llvm-svn: 217775	2014-09-15 15:41:43 +00:00
Toma Tabacu	fda445cb83	[mips] Use early exit in MipsAsmParser::matchCPURegisterName(). NFC. Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5270 llvm-svn: 217774	2014-09-15 15:33:01 +00:00
Toma Tabacu	bbd0eca340	[mips] Marked the DADDiu instruction aliases as MIPS III. Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5239 llvm-svn: 217770	2014-09-15 14:47:46 +00:00
Chandler Carruth	707a2e098d	[x86] Begin emitting PBLENDW instructions for integer blend operations when SSE4.1 is available. This removes a ton of domain crossing from blend code paths that were ending up in the floating point code path. This is just the tip of the iceberg though. The real switch is for integer blend lowering to more actively rely on this instruction being available so we don't hit shufps at all any longer. =] That will come in a follow-up patch. Another place where we need better support is for using PBLENDVB when doing so avoids the need to have two complementary PSHUFB masks. llvm-svn: 217767	2014-09-15 12:40:54 +00:00
Chandler Carruth	12d4a70cbd	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but this is all of the tricks I'm aware of. With AVX there is a new trick to use the VPERMILPS instruction, that's coming up in a subsequent patch. llvm-svn: 217761	2014-09-15 11:26:25 +00:00
Chandler Carruth	41a25dd7ef	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP instructions when it finds an appropriate pattern. These are lovely instructions, and it's a shame to not use them. =] They are fast, and can have loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. llvm-svn: 217758	2014-09-15 11:15:23 +00:00
Chandler Carruth	35e3b545d6	[x86] Undo a flawed transform I added to form UNPCK instructions when AVX is available, and generally tidy up things surrounding UNPCK formation. Originally, I was thinking that the only advantage of PSHUFD over UNPCK instruction variants was its free copy, and otherwise we should use the shorter encoding UNPCK instructions. This isn't right though, there is a larger advantage of being able to fold a load into the operand of a PSHUFD. For UNPCK, the operand must be in a register so it can be the second input. This removes the UNPCK formation in the target-specific DAG combine for v4i32 shuffles. It also lifts the v8 and v16 cases out of the AVX-specific check as they are potentially replacing multiple instructions with a single instruction and so should always be valuable. The floating point checks are simplified accordingly. This also adjusts the formation of PSHUFD instructions to attempt to match the shuffle mask to one which would fit an UNPCK instruction variant. This was originally motivated to allow it to match the UNPCK instructions in the combiner, but clearly won't now. Eventually, we should add a MachineCombiner pass that can form UNPCK instructions post-RA when the operand is known to be in a register and thus there is no loss. llvm-svn: 217755	2014-09-15 10:35:41 +00:00
Chandler Carruth	44e64b5267	[x86] Teach the new vector shuffle lowering to use 'punpcklwd' and 'punpckhwd' instructions when suitable rather than falling back to the generic algorithm. While we could canonicalize to these patterns late in the process, that wouldn't help when the freedom to use them is only visible during initial lowering when undef lanes are well understood. This, it turns out, is very important for matching the shuffle patterns that are used to lower sign extension. Fixes a small but relevant regression in gcc-loops with the new lowering. When I changed this I noticed that several 'pshufd' lowerings became unpck variants. This is bad because it removes the ability to freely copy in the same instruction. I've adjusted the widening test to handle undef lanes correctly and now those will correctly continue to use 'pshufd' to lower. However, this caused a bunch of churn in the test cases. No functional change, just churn. Both of these changes are part of addressing a general weakness in the new lowering -- it doesn't sufficiently leverage undef lanes. I've at least a couple of patches that will help there at least in an academic sense. llvm-svn: 217752	2014-09-15 09:02:37 +00:00
Chandler Carruth	0a98790b32	[x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD. These are super simple. They even take precedence over crazy instructions like INSERTPS because they have very high throughput on modern x86 chips. I still have to teach the integer shuffle variants about this to avoid so many domain crossings. However, due to the particular instructions available, that's a touch more complex and so a separate patch. Also, the backend doesn't seem to realize it can commute blend instructions by negating the mask. That would help remove a number of copies here. Suggestions on how to do this welcome, it's an area I'm less familiar with. llvm-svn: 217744	2014-09-14 23:43:33 +00:00
Chandler Carruth	47ebd24e24	[x86] Teach the vector combiner that picks a canonical shuffle form to support transforming the forms from the new vector shuffle lowering to use 'movddup' when appropriate. A bunch of the cases where we actually form 'movddup' don't actually show up in the test results because something even later than DAG legalization maps them back to 'unpcklpd'. If this shows back up as a performance problem, I'll probably chase it down, but it is at least an encoded size loss. =/ To make this work, also always do this canonicalizing step for floating point vectors where the baseline shuffle instructions don't provide any free copies of their inputs. This also causes us to canonicalize unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space win. There is one test which is "regressed" by this: extractelement-load. There, the test case where the optimization it is testing fails, the exact instruction pattern which results is slightly different. This should probably be fixed by having the appropriate extract formed earlier in the DAG, but that would defeat the purpose of the test.... If this test case is critically important for anyone, please let me know and I'll try to work on it. The prior behavior was actually contrary to the comment in the test case and seems likely to have been an accident. llvm-svn: 217738	2014-09-14 22:41:37 +00:00
James Molloy	05ce999134	[A57FPLoadBalancing] Modify r217689 - actually we do need to check defs ... Just make sure we check uses first so we see the kill first. It turns out ignoring defs gives some pretty nasty runtime failures. I'm certain this is the fix but I'm still reducing a testcase. llvm-svn: 217735	2014-09-14 18:24:26 +00:00
Juergen Ributzka	85c1f84650	[FastISel][AArch64] Add support for non-native types for logical ops. Extend the logical ops selection to also support non-native types such as i1, i8, and i16. Fixes rdar://problem/18330589. llvm-svn: 217732	2014-09-13 23:46:28 +00:00
Matt Arsenault	5d26d04357	Fix typo llvm-svn: 217730	2014-09-13 19:58:27 +00:00
Chad Rosier	347ed4e831	[AArch64] Don't enable the post-RA MI scheduler at OptNone. Hopefully, this will appease the bots. llvm-svn: 217712	2014-09-12 22:17:28 +00:00
Yaron Keren	359907decf	The MCAssembler.h include isn't used. llvm-svn: 217705	2014-09-12 20:29:17 +00:00
Chad Rosier	486e087f26	[AArch64] Enable post-RA MI scheduler. Phabricator Revision: http://reviews.llvm.org/D5278 Patch by Sanjin Sijaric! llvm-svn: 217693	2014-09-12 17:40:39 +00:00
James Molloy	4689647dbb	[A57FPLoadBalancing] Remove support for vector types Vector MUL/MLAs have tied operands, which gives us extra constraints that we currently can't handle. Instead of silently doing the wrong thing, remove support to be readded later properly. llvm-svn: 217690	2014-09-12 16:55:32 +00:00
James Molloy	a6e05a789e	[A57FPLoadBalancing] Ignore <def>s when checking if a chain may be killed. Defs are seen before uses, so a def without the kill flag doesn't necessarily mean that the register is not killed on that instruction. It may be killed in a later use operand. llvm-svn: 217689	2014-09-12 16:55:26 +00:00
James Molloy	f0de7e58f6	[A57LoadBalancing] unique_ptr-ify. Thanks to David Blaikie for the in-depth review! llvm-svn: 217682	2014-09-12 14:35:17 +00:00
Zoran Jovanovic	c74e3eb9a6	[mips][microMIPS] Implement JRADDIUSP instruction Differential Revision: http://reviews.llvm.org/D5046 llvm-svn: 217681	2014-09-12 14:29:54 +00:00
Bill Schmidt	b73b370809	Address comments on r217622 llvm-svn: 217680	2014-09-12 14:26:36 +00:00
Zoran Jovanovic	ed6dd6bd39	[mips][microMIPS] Implement BGEZALS and BLTZALS instructions Differential Revision: http://reviews.llvm.org/D5004 llvm-svn: 217678	2014-09-12 13:51:58 +00:00
Zoran Jovanovic	ac9ef12fc5	[mips][microMIPS] Implement JALS and JALRS instructions. Differential Revision: http://reviews.llvm.org/D5003 llvm-svn: 217676	2014-09-12 13:43:41 +00:00
Zoran Jovanovic	4e7ac4ad2a	[mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions Differential Revision: http://reviews.llvm.org/D5211 llvm-svn: 217675	2014-09-12 13:33:33 +00:00
James Molloy	a9f47b6bae	[ARM] Teach the cost model that cross-class copies are costly. Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case. llvm-svn: 217674	2014-09-12 13:29:40 +00:00
Patrik Hagglund	c287f4a358	Fix gcc -Wpedantic. llvm-svn: 217669	2014-09-12 12:32:08 +00:00
Craig Topper	fec61ef391	Remove a temporary variable and just construct a unique_ptr directly using make_unique. llvm-svn: 217655	2014-09-12 05:17:20 +00:00
Matt Arsenault	362f345bab	R600/SI: Fix off by 1 error in used register count The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636	2014-09-11 22:51:37 +00:00
Bill Schmidt	be95fd5357	[PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm Inline asm may specify 'U' and 'X' constraints to print a 'u' for an update-form memory reference, or an 'x' for an indexed-form memory reference. However, these are really only useful in GCC internal code generation. In inline asm the operand of the memory constraint is typically just a register containing the address, so 'U' and 'X' make no sense. This patch quietly accepts 'U' and 'X' in inline asm patterns, but otherwise does nothing. If we ever unexpectedly see a non-register, we'll assert and sort it out afterwards. I've added a new test for these constraints; the test case should be used for other asm-constraints changes down the road. llvm-svn: 217622	2014-09-11 20:10:03 +00:00
Brad Smith	2ce0d91bde	Provide an implementation of getNoopForMachoTarget for SPARC. llvm-svn: 217611	2014-09-11 17:40:51 +00:00
Adam Nemet	053c4e825c	[AVX512] Fix miscompile for unpack r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack between the low and the high 256 bits of src1 into the low part of the destination and another unpack of the low and high 256 bits of src2 into the high part of the destination. I don't think that's how unpack works. AVX512 unpack simply has more 128-bit lanes but other than that it works the same way as AVX. So in each 128-bit lane, we're always interleaving certain parts of both operands rather than different parts of one of the operands. E.g. for this: __v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }; __v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; __v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16, 24, 17, 25, 20, 28, 21, 29); we generated punpcklps (notice how the elements of a and b are not interleaved in the shuffle). In turn, c was set to this: 0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29 Obviously this should have just returned the mask vector of the shuffle vector. I mostly reverted this change and made sure the original AVX code worked for 512-bit vectors as well. Also updated the tests because they matched the logic from the code. llvm-svn: 217602	2014-09-11 16:51:10 +00:00
Benjamin Kramer	9e5b4a5827	Move constant-sized bitvector to the stack. llvm-svn: 217600	2014-09-11 15:58:39 +00:00
Aaron Watry	1885e53a75	R600: Add cmpxchg instruction for evergreen Refactored the R600_LDS_1A2D class a bit to get it to actually work. It seemed to be previously unused and broken. We also have to disable the conversion to the noret variant for now in R600ISelLowering because the getLDSNoRetOp method only handles 1A1D LDS ops. Someone can feel free to modify the AMDGPU::getLDSNoRetOp method to work for more than 1A1D variants of LDS operations. It's being left as a future TODO for now. Signed-off-by: Aaron Watry <awatry at gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217596	2014-09-11 15:02:54 +00:00
Aaron Watry	21591670c9	R600: Add LDS_WRXCHG[_RET] instructions for Evergreen. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217594	2014-09-11 15:02:49 +00:00
Aaron Watry	564a22e995	R600: Add LDS_MIN_[U]INT[_RET] instructions for Evergreen Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217593	2014-09-11 15:02:47 +00:00
Aaron Watry	e51794f2fa	R600: Add LDS_XOR[_RET] instructions for Evergreen Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217592	2014-09-11 15:02:46 +00:00
Aaron Watry	cffa0114c7	R600: Add LDS_OR[_RET] instructions for Evergreen Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217591	2014-09-11 15:02:44 +00:00
Aaron Watry	a7f122da60	R600: Add LDS_AND[_RET] instructions for Evergreen Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217590	2014-09-11 15:02:43 +00:00
Aaron Watry	62a0af4a0d	R600: Add LDS_MAX_[U]INT[_RET] instructions for Evergreen This was only present for SI before. Cayman may still be missing, but I am unable to test that currently. v2: Don't create atomicrmw max tests in separate file Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217589	2014-09-11 15:02:41 +00:00
Matt Arsenault	61a528adc7	R600/SI: Fix losing chain when fixing reg class of loads. The lost chain resulted in earlier side effecting nodes being deleted. llvm-svn: 217561	2014-09-10 23:26:19 +00:00
Matt Arsenault	2e9911205f	R600/SI: Report offset in correct units for st64 DS instructions Need to convert the 64 element offset into bytes, not just the element size like the normal case instructions. Noticed by inspection. This can't be hit now because st64 instructions aren't emitted during instruction selection, and the post-RA scheduler isn't enabled. llvm-svn: 217560	2014-09-10 23:26:16 +00:00
Matt Arsenault	16e313343d	R600: Custom lower frem llvm-svn: 217553	2014-09-10 21:44:27 +00:00
Rafael Espindola	c435adcde0	Add doInitialization/doFinalization to DataLayoutPass. With this a DataLayoutPass can be reused for multiple modules. Once we have doInitialization/doFinalization, it doesn't seem necessary to pass a Module to the constructor. Overall this change seems in line with the idea of making DataLayout a required part of Module. With it the only way of having a DataLayout used is to add it to the Module. llvm-svn: 217548	2014-09-10 21:27:43 +00:00
Gerolf Hoflehner	7b0abb89c2	[AArch64] Revert r216141 for cyclone The increase of the interleave factor to 4 has side-effects like performance losses, e.g. due to remainder loops being executed more frequently, and may increase code size. It requires more analysis and careful heuristic tuning. Expect double digit gains in small benchmarks like lowercase.c and losses in puzzle.c. llvm-svn: 217540	2014-09-10 20:31:57 +00:00
Sanjay Patel	b653de1ada	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Arnaud A. de Grandmaison	0dbcfba659	[AArch64] Address Chad's post commit review comments for r217504 (PBQP experimental support) llvm-svn: 217518	2014-09-10 17:03:25 +00:00
Arnaud A. de Grandmaison	cfb28f77a4	[AArch64] Pacify lld buildbot complaining about an unused static function in release build. llvm-svn: 217505	2014-09-10 14:24:02 +00:00
Arnaud A. de Grandmaison	c75dbbbdd6	[AArch64] Add experimental PBQP support This adds target specific support for using the PBQP register allocator on the AArch64, for the A57 cpu. By default, the PBQP allocator is not used, unless explicitly required on the command line with "-aarch64-pbqp". llvm-svn: 217504	2014-09-10 14:06:10 +00:00
Asiri Rathnayake	369c030633	[AArch64] Use a constant pool load for weak symbol references when using static relocation model and small code model. Summary: Currently we generate GOT based relocations for weak symbol references regardless of the underlying relocation model. This should be changed so that in static relocation model we use a constant pool load instead. Patch from: Keith Walker Reviewers: Renato Golin, Tim Northover llvm-svn: 217503	2014-09-10 13:54:38 +00:00
Sid Manning	e7b92f0e81	Add missing HWEncoding to base register class. This change gives tblgen the information needed to fill in the HexagonRegEncodingTable. llvm-svn: 217500	2014-09-10 13:09:25 +00:00
Tim Northover	ba1d704229	ARM: don't size-reduce STMs using the LR register. The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which translates to STMDB, so we shouldn't convert STMIAs. Patch by Sergey Dmitrouk. llvm-svn: 217498	2014-09-10 12:53:28 +00:00
Daniel Sanders	24b6572645	[mips] Remove inverted predicates from MipsSubtarget that were only used by MipsCallingConv.td Summary: No functional change Reviewers: echristo, vmedic Reviewed By: echristo, vmedic Subscribers: echristo, llvm-commits Differential Revision: http://reviews.llvm.org/D5266 llvm-svn: 217494	2014-09-10 12:02:27 +00:00
Daniel Sanders	75ee6b4302	[mips] Return an ArrayRef from MipsCC::intArgRegs() and remove MipsCC::numIntArgRegs() Summary: No functional change. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5265 llvm-svn: 217485	2014-09-10 10:37:03 +00:00
Yuri Gorshenin	3939dec1f7	[asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code. Summary: [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5189 llvm-svn: 217482	2014-09-10 09:45:49 +00:00
Job Noorman	eb19aea4f9	Drop the W postfix on the 16-bit registers. This ensures the inline assembly register constraints are properly recognised in TargetLowering::getRegForInlineAsmConstraint. llvm-svn: 217479	2014-09-10 06:58:14 +00:00
Kai Nacke	d287094566	[MIPS] Add aliases for sync instruction used by Octeon CPU This commit adds aliases for the sync instruction (synciobdma, syncs, syncw, syncws) which are used by the Octeon CPU. Reviewed by D. Sanders llvm-svn: 217477	2014-09-10 06:10:24 +00:00
Craig Topper	7ff1592960	Use cast to MVT instead of EVT on a couple calls to getSizeInBits. llvm-svn: 217473	2014-09-10 04:51:36 +00:00
Sanjay Patel	1191adf4df	Add a scheduling model for AMD 16H Jaguar (btver2). This is a first pass at a scheduling model for Jaguar. It's structured largely on the existing SandyBridge and SLM sched models. Using this model, in addition to turning on the PostRA scheduler, results in some perf wins on internal and 3rd party benchmarks. There's not much difference in LLVM's test-suite benchmarking subset of tests. Differential Revision: http://reviews.llvm.org/D5229 llvm-svn: 217457	2014-09-09 20:07:07 +00:00
Toma Tabacu	2664779b27	[mips] Add assembler support for .set mips0 directive. Summary: This directive is used to reset the assembler options to their initial values. Assembly programmers use it in conjunction with the ".set mipsX" directives. This patch depends on the .set push/pop directive (http://reviews.llvm.org/D4821). Contains work done by Matheus Almeida. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D4957 llvm-svn: 217438	2014-09-09 12:52:14 +00:00
Daniel Sanders	2b746bc4ae	[mips] Move MipsTargetLowering::MipsCC::regSize() to MipsSubtarget::getGPRSizeInBytes() Summary: The GPR size is more a property of the subtarget than that of the ABI so move this information to the MipsSubtarget. No functional change. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5009 llvm-svn: 217436	2014-09-09 12:11:16 +00:00
Pavel Chupin	e6617fc6d4	[x32] Emit callq for CALLpcrel32 Summary: In AT&T annotation for both x86_64 and x32 calls should be printed as callq in assembly. It's only a matter of correct mnemonic, object output is ok. Test Plan: trivial test added Reviewers: nadav, dschuff, craig.topper Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5213 llvm-svn: 217435	2014-09-09 11:54:12 +00:00
Daniel Sanders	4abcfe2cda	[mips] Don't cache IsO32 and IsFP64 in MipsTargetLowering::MipsCC Summary: Use a MipsSubtarget reference instead. No functional change. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5008 llvm-svn: 217434	2014-09-09 10:46:48 +00:00
Toma Tabacu	9db22db963	[mips] Add assembler support for .set push/pop directive. Summary: These directives are used to save the current assembler options (in the case of ".set push") and restore the previously saved options (in the case of ".set pop"). Contains work done by Matheus Almeida. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D4821 llvm-svn: 217432	2014-09-09 10:15:38 +00:00
Renato Golin	63e27980da	ARM: Negative offset support problem This patch is to permit a negative offset usage for a non frame access. Patch by Igor Oblakov. llvm-svn: 217431	2014-09-09 09:57:59 +00:00
Bob Wilson	b3482af341	Set trunc store action to Expand for all X86 targets. When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack. Patch by Luqman Aden! llvm-svn: 217410	2014-09-09 01:13:36 +00:00
Chad Rosier	bdbca15ccd	[AArch64] Enabled AA support for Cortex-A57. llvm-svn: 217381	2014-09-08 15:34:16 +00:00
Matt Arsenault	69bfb90419	R600/SI: Fix assertion from copying a TargetGlobalAddress Assert in scheduler from an inserted copy_to_regclass from a constant. This only seems to break sometimes when a constant initializer address is forced into VGPRs in a non-entry block. No test since the only case I've managed to hit only happens with a future patch, and that case will also not be a problem once scalar instructions are used in non-entry blocks. llvm-svn: 217380	2014-09-08 15:07:33 +00:00
Matt Arsenault	7ac9c4a074	R600/SI: Replace LDS atomics with no return versions llvm-svn: 217379	2014-09-08 15:07:31 +00:00
Matt Arsenault	9903ccf7ee	R600/SI: Add InstrMapping for noret atomics. Only handles LDS atomics for now, and will be used to replace atomics with no uses with the no return versions. llvm-svn: 217378	2014-09-08 15:07:27 +00:00
Chad Rosier	3528c1e4c6	[AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph. Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! Phabricator Review: http://reviews.llvm.org/D5103 llvm-svn: 217371	2014-09-08 14:43:48 +00:00
Chad Rosier	c9f947744d	[AArch64] Enabled AA support for Cortex-A53. Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! Phabricator Review: http://reviews.llvm.org/D5103 llvm-svn: 217370	2014-09-08 14:31:49 +00:00
Sid Manning	ac3e325d67	Spelling correction Another trivial spelling change. llvm-svn: 217364	2014-09-08 13:05:23 +00:00
Chandler Carruth	0a8151e69a	[x86] Revert my over-eager commit in r217332. I hadn't actually run all the tests yet and these combines have somewhat surprisingly far reaching effects. llvm-svn: 217333	2014-09-07 12:37:11 +00:00
Chandler Carruth	8405e8fff9	[x86] Tweak the rules surrounding 0,0 and 1,1 v2f64 shuffles and add support for MOVDDUP which is really important for matrix multiply style operations that do lots of non-vector-aligned load and splats. The original motivation was to add support for MOVDDUP as the lack of it regresses matmul_f64_4x4 by 5% or so. However, all of the rules here were somewhat suspicious. First, we should always be using the floating point domain shuffles, regardless of how many copies we have to make as a movapd is crazy faster than the domain switching cost on some chips. (Mostly because movapd is crazy cheap.) Because SHUFPD can't do the copy-for-free trick of the PSHUF instructions, there is no need to avoid canonicalizing on UNPCK variants, so do that canonicalizing. This also ensures we have the chance to form MOVDDUP. =] Second, we assume SSE2 support when doing any vector lowering, and given that we should just use UNPCKLPD and UNPCKHPD as they can operate on registers or memory. If vectors get spilled or come from memory at all this is going to allow the load to be folded into the operation. If we want to optimize for encoding size (the only difference, and only a 2 byte difference) it should be done much later, likely after RA. llvm-svn: 217332	2014-09-07 12:02:14 +00:00
Matt Arsenault	76803bd384	R600/SI: Fix register class for some 64-bit atomics llvm-svn: 217323	2014-09-07 00:46:20 +00:00
Chandler Carruth	373b2b1728	[x86] Fix a pretty horrible bug and inconsistency in the x86 asm parsing (and latent bug in the instruction definitions). This is effectively a revert of r136287 which tried to address a specific and narrow case of immediate operands failing to be accepted by x86 instructions with a pretty heavy hammer: it introduced a new kind of operand that behaved differently. All of that is removed with this commit, but the test cases are both preserved and enhanced. The core problem that r136287 and this commit are trying to handle is that gas accepts both of the following instructions: insertps $192, %xmm0, %xmm1 insertps $-64, %xmm0, %xmm1 These will encode to the same byte sequence, with the immediate occupying an 8-bit entry. The first form was fixed by r136287 but that broke the prior handling of the second form! =[ Ironically, we would still emit the second form in some cases and then be unable to re-assemble the output. The reason why the first instruction failed to be handled is because prior to r136287 the operands were marked 'i32i8imm' which forces them to be sign-extendable. Clearly, that won't work for 192 in a single byte. However, making them zero-extended or "unsigned" doesn't really address the core issue either because it breaks negative immediates. The correct fix is to make these operands 'i8imm' reflecting that they can be either signed or unsigned but must be 8-bit immediates. This patch backs out r136287 and then changes those places as well as some others to use 'i8imm' rather than one of the extended variants. Naturally, this broke something else. The custom DAG nodes had to be updated to have a much more accurate type constraint of an i8 node, and a bunch of Pat immediates needed to be specified as i8 values. The fallout didn't end there though. We also then ceased to be able to match the instruction-specific intrinsics to the instructions so modified. Digging, this is because they too used i32 rather than i8 in their signature. So I've also switched those intrinsics to i8 arguments in line with the instructions. In order to make the intrinsic adjustments of course, I also had to add auto upgrading for the intrinsics. I suspect that the intrinsic argument types may have led everything down this rabbit hole. Pretty happy with the result. llvm-svn: 217310	2014-09-06 10:00:01 +00:00
Chandler Carruth	21d27ee95b	[x86] Fix an embarrassing bug in the INSERTPS formation code. The mask computation was totally wrong, but somehow it didn't really show up with llc. I've added an assert that triggers on multiple existing test cases and updated one of them to show the correct value. There appear to still be more bugs lurking around insertps's mask. =/ However, note that this only really impacts the new vector shuffle lowering. llvm-svn: 217289	2014-09-05 23:19:45 +00:00
Toma Tabacu	901ba6ea2e	[mips] Change Feature-related types from unsigned to uint64_t in MipsAsmParser. No functional changes. Summary: Found a couple of cases where unsigned was still being used. These two should be the last ones in the (entire) Mips backend. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5028 llvm-svn: 217257	2014-09-05 16:32:09 +00:00
Matt Arsenault	8ae5961065	R600/SI: Use same complex patterns for DS atomics This fixes hitting the same negative base offset problem that was already fixed for regular loads and stores. llvm-svn: 217256	2014-09-05 16:24:58 +00:00
Daniel Sanders	1fcea42e67	[mips] Marked the Trap-on-Condition instructions as Mips II Patch by Vasileios Kalintiris. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5173 llvm-svn: 217255	2014-09-05 15:50:13 +00:00
Toma Tabacu	3c24b0483a	[mips] Rename data members and member functions in MipsAssemblerOptions. Summary: Use the naming convention from the LLVM Coding Standards. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D4972 llvm-svn: 217254	2014-09-05 15:43:21 +00:00
Jan Vesely	d1d1334064	R600: Fix FROUND round halfway cases away from zero Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 217250	2014-09-05 14:26:54 +00:00
Tom Stellard	0c93c9ecee	R600/SI: Fix bug in SIInstrInfo::legalizeOpWithMove() We must constrain the destination register class of legalized operands to a VGPR class or else the illegal operand may be folded back into the instruction by the register coalescer. This fixes a bug in add.ll that will be uncovered by future commits. llvm-svn: 217249	2014-09-05 14:08:01 +00:00
Tom Stellard	80942a1b50	R600/SI: Use S_ADD_U32 and S_SUB_U32 for low half of 64-bit operations https://bugs.freedesktop.org/show_bug.cgi?id=83416 llvm-svn: 217248	2014-09-05 14:07:59 +00:00
Chandler Carruth	19cbf0e2c4	[x86] Factor out the zero vector insertion logic in the new vector shuffle lowering for integer vectors and share it from v4i32, v8i16, and v16i8 code paths. Ironically, the SSE2 v16i8 code for this is now better than the SSSE3! =] Will have to fix the SSSE3 code next to just using a single pshufb. llvm-svn: 217240	2014-09-05 10:36:31 +00:00
Tim Northover	c879d06a85	ARM: cover all sub-architecture enumerators to keep compiler happy. No change in behaviour (hopefully). llvm-svn: 217233	2014-09-05 07:56:46 +00:00
Jiangning Liu	1a486da543	[AArch64] Add pass to enable additional comparison optimizations by CSE. Patched by Sergey Dmitrouk. This pass tries to make consecutive compares of values use same operands to allow CSE pass to remove duplicated instructions. For this it analyzes branches and adjusts comparisons with immediate values by converting: GE -> GT GT -> GE LT -> LE LE -> LT and adjusting immediate values appropriately. It basically corrects two immediate values towards each other to make them equal. llvm-svn: 217220	2014-09-05 02:55:24 +00:00
Reid Kleckner	aedf0d705f	X86: cpuid and xgetbv write to 32-bit registers, not 64-bit This fixes an issue where MS inline assembly containing xgetbv wouldn't be marked as clobbering EAX:EDX. Test for that forthcoming on the Clang side. llvm-svn: 217173	2014-09-04 16:58:25 +00:00
Tim Northover	f7423fd090	AArch64: fix vector-immediate BIC/ORR on big-endian devices. Follow up to r217138, extending the logic to other NEON-immediate instructions. As before, the instruction already performs the correct operation and we're just using a different type for convenience, so we want a true nop-cast. Patch by Asiri Rathnayake. llvm-svn: 217159	2014-09-04 15:05:24 +00:00
Toma Tabacu	139644570f	[mips] Rename MipsAsmParser functions to conform to the LLVM Coding Standards. No functional changes. Summary: There are still some functions which should be renamed, but they are inherited from the generic MC classes. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5068 llvm-svn: 217145	2014-09-04 13:23:44 +00:00
Aaron Ballman	169eeb913d	Silencing a usually-helpful-but-braindead-silly-in-this-case sign mismatch warning with MSVC. NFC. llvm-svn: 217143	2014-09-04 11:52:24 +00:00
Tim Northover	bb72e6c804	AArch64: fix big-endian immediate materialisation We were materialising big-endian constants using DAG nodes with types different from what was requested, followed by a bitcast. This is fine on little-endian machines where bitcasting is a nop, but we need a slightly different representation for big-endian. This adds a new set of NVCAST (natural-vector cast) operations which are always nops. Patch by Asiri Rathnayake. llvm-svn: 217138	2014-09-04 09:46:14 +00:00
Chandler Carruth	2e5134f8f4	[x86] Teach the new v4i32 shuffle lowering some more tricks to recognize vzext patterns and insert-element patterns that for SSE4 have dedicated instructions. With this we can enable the experimental mode in a regression test that happens to cover some of the past set of issues. You can see that the new logic does significantly better here on the floating point cases. A follow-up to this change and the previous ones will hoist the logic into helpers so it can be shared across element type sizes as in this particular case it generalizes cleanly. llvm-svn: 217136	2014-09-04 09:26:30 +00:00
Elena Demikhovsky	0f54a0b02a	Fixed compilation problem on Windows (initialization of non-aggregate type). After commit 217131. llvm-svn: 217134	2014-09-04 07:20:39 +00:00
Elena Demikhovsky	228ab3d7b3	X86 Intrinsics table - changed to a static table sorted by intrinsic id. Used binary search over the tables. llvm-svn: 217131	2014-09-04 06:34:34 +00:00
Juergen Ributzka	30c02e36cc	[FastISel][AArch64] Cleanup and simplify 'fastSelectInstruction'. NFC. llvm-svn: 217119	2014-09-04 01:29:21 +00:00
Juergen Ributzka	1dbc15f02d	[FastISel][AArch64] Add target-specific lowering for logical operations. This change adds support for immediate and shift-left folding into logical operations. This fixes rdar://problem/18223183. llvm-svn: 217118	2014-09-04 01:29:18 +00:00
Chandler Carruth	fc0db222b5	[x86] Teach the new vector shuffle lowering about the zero masking abilities of INSERTPS which are really powerful and come up in very important contexts such as forming diagonal matrices, etc. With this I ended up being able to remove the somewhat weird helper I added for INSERTPS because we can collapse the entire state to a no-op mask. Added a bunch of tests for inserting into a zero-ish vector. llvm-svn: 217117	2014-09-04 01:13:48 +00:00
Matt Arsenault	51b7e81d1b	R600/SI: Un-move pattern I forgot to remove in last commit llvm-svn: 217109	2014-09-03 23:28:57 +00:00
Matt Arsenault	869cd07158	R600/SI: Try to keep i32 mul on SALU Also fix bug this exposed where when legalizing an immediate operand, a v_mov_b32 would be created with a VSrc dest register. llvm-svn: 217108	2014-09-03 23:24:35 +00:00
Chandler Carruth	dad5400397	[x86] Teach the new vector shuffle lowering about the simplest of 'insertps' patterns. This replaces two shuffles with a single insertps in very common cases. My next patch will extend this to leverage the zeroing capabilities of insertps which will allow it to be used in a much wider set of cases. llvm-svn: 217100	2014-09-03 22:48:34 +00:00
Chandler Carruth	2317311825	[x86] Teach the asm comment printing to only print the clarification of an immediate operand when we don't have instruction-specific comments. This ensures that instruction-specific comments are attached to the same line as the instruction which is important for using them to write readable and maintainable tests. My next commit will add just such a test. llvm-svn: 217099	2014-09-03 22:46:44 +00:00
Robin Morisset	ed3d48f161	Refactor AtomicExpandPass and add a generic isAtomic() method to Instruction Summary: Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs. Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving the X86 backend to this pass. This requires a way of easily detecting which instructions are atomic. I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc.. in making isAtomic() a method of Instruction implemented by a switch on the opcodes. Test Plan: make check Reviewers: jfb Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D5035 llvm-svn: 217080	2014-09-03 21:29:59 +00:00
Benjamin Kramer	89854ebe8e	Make some helpers static or move into the llvm namespace. llvm-svn: 217077	2014-09-03 21:04:12 +00:00
Robin Morisset	a47cb411dc	Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass) Fixes two latent bugs: - There was no fence inserted before expanded seq_cst load (unsound on Power) - There was only a fence release before seq_cst stores (again unsound, in particular on Power) It is not even clear if this is correct on ARM swift processors (where release fences are DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift as it is not clear whether it is incorrect. I would love to get documentation stating whether it is correct or not. These two bugs were not triggered because Power is not (yet) using this pass, and these behaviours happen to be (mostly?) working on ARM (although they completely butchered the semantics of the llvm IR). See: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html for an example of the problems that can be caused by the second of these bugs. I couldn't see a way of fixing these in a completely target-independent way without adding lots of unnecessary fences on ARM, hence the target-dependent parts of this patch. This patch implements the new target-dependent parts only for ARM (the default of not doing anything is enough for AArch64), other architectures will use this infrastructure in later patches. llvm-svn: 217076	2014-09-03 21:01:03 +00:00
Juergen Ributzka	88e32517c4	[FastISel][tblgen] Rename tblgen generated FastISel functions. NFC. This is the final round of renaming. This changes tblgen to emit lower-case function names for FastEmitInst_* and FastEmit_*, and updates all its uses in the source code. Reviewed by Eric llvm-svn: 217075	2014-09-03 20:56:59 +00:00
Juergen Ributzka	5b8bb4d7dd	[FastISel] Rename public visible FastISel functions. NFC. This commit renames the following public FastISel functions: LowerArguments -> lowerArguments SelectInstruction -> selectInstruction TargetSelectInstruction -> fastSelectInstruction FastLowerArguments -> fastLowerArguments FastLowerCall -> fastLowerCall FastLowerIntrinsicCall -> fastLowerIntrinsicCall FastEmitZExtFromI1 -> fastEmitZExtFromI1 FastEmitBranch -> fastEmitBranch UpdateValueMap -> updateValueMap TargetMaterializeConstant -> fastMaterializeConstant TargetMaterializeAlloca -> fastMaterializeAlloca TargetMaterializeFloatZero -> fastMaterializeFloatZero LowerCallTo -> lowerCallTo Reviewed by Eric llvm-svn: 217074	2014-09-03 20:56:52 +00:00
Eric Christopher	b68e25330b	Remove resetSubtargetFeatures as it is unused. llvm-svn: 217071	2014-09-03 20:36:31 +00:00
Eric Christopher	e08189195b	Remove unnecessary getTarget call now that the subtarget is cached on the machine function. llvm-svn: 217070	2014-09-03 20:36:26 +00:00
Juergen Ributzka	7a76c2409e	[FastISel] Some long overdue spring cleaning of FastISel. Things got a little bit messy over the years and it is time for a little bit spring cleaning. This first commit is focused on the FastISel base class itself. It doxyfies all comments, C++11fies the code where it makes sense, renames internal methods to adhere to the coding standard, and clang-formats the files. Reviewed by Eric llvm-svn: 217060	2014-09-03 18:46:45 +00:00
Juergen Ributzka	31c8054594	[FastISel][AArch64] Move unconditional branch handling into 'SelectBranch'. NFC. llvm-svn: 217054	2014-09-03 17:58:10 +00:00
Tom Stellard	102c68786c	R600/SI: Add a pattern for i64 and in a branch llvm-svn: 217041	2014-09-03 15:22:41 +00:00
Tom Stellard	b8b841366a	R600/SI: Fix typos in SIInstrInfo::areLoadsFromSameBasePtr() This fixes a crash in the OpenCV test: ImgprocWarpResizeArea/Resize.Mat/16 There is no test case for this, because this failure depends on a specific ordering of the loads, which could easily change. llvm-svn: 217040	2014-09-03 15:22:39 +00:00
Benjamin Kramer	8c90fd71f7	Add override to overridden virtual methods, remove virtual keywords. No functionality change. Changes made by clang-tidy + some manual cleanup. llvm-svn: 217028	2014-09-03 11:41:21 +00:00
Alexander Potapenko	c578567b07	Follow-up for r217020: actually commit the fix for PR20800, revert the accidentally committed changes to LLVMSymbolize.cpp llvm-svn: 217021	2014-09-03 07:37:20 +00:00
Juergen Ributzka	31e5b7fb12	Reapply r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR." This reapplies r216805 with a fix to a copy-paste error, which resulted in an incorrect register class. Original commit message: Select the correct register class for the various instructions that are generated when combining instructions and constrain the registers to the appropriate register class. This fixes rdar://problem/18183707. llvm-svn: 217019	2014-09-03 07:07:10 +00:00
Juergen Ributzka	a1148b2173	[FastISel][AArch64] Add target-dependent instruction selection for Add/Sub. There is already target-dependent instruction selection support for Adds/Subs to support compares and the intrinsics with overflow check. This takes advantage of the existing infrastructure to also support Add/Sub, which allows the folding of immediates, sign-/zero-extends, and shifts. This fixes rdar://problem/18207316. llvm-svn: 217007	2014-09-03 01:38:36 +00:00
Renato Golin	e07a22ac14	Only emit movw on ARMv6T2+ Fix PR18364. Patch by Dimitry Andric. llvm-svn: 216989	2014-09-02 22:45:13 +00:00
Juergen Ributzka	53dbef6ef1	[FastISel][AArch64] Use the target-dependent selection code for shifts first. This uses the target-dependent selection code for shifts first, which allows us to create better code for shifts with immediates and sign-/zero-extend folding. Vector type are not handled yet and the code falls back to target-independent instruction selection for these cases. This fixes rdar://problem/17907920. llvm-svn: 216985	2014-09-02 22:33:57 +00:00
Juergen Ributzka	8a4b8bebdc	[FastISel][AArch64] Use a new helper function to determine if a value type is supported. NFCI. FastISel for AArch64 supports more value types than are actually legal. Use a dedicated helper function to reflect this. It is very similar to the isLoadStoreTypeLegal function, with the exception that vector types are not supported yet. llvm-svn: 216984	2014-09-02 22:33:53 +00:00
Eric Christopher	79cc1e3ae7	Reinstate "Nuke the old JIT." Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982	2014-09-02 22:28:02 +00:00
Robin Morisset	df20586a7a	[X86] Allow atomic operations using immediates to avoid using a register The only valid lowering of atomic stores in the X86 backend was mov from register to memory. As a result, storing an immediate required a useless copy of the immediate in a register. Now these can be compiled as a simple mov. Similarly, adding/and-ing/or-ing/xor-ing an immediate to an atomic location (but through an atomic_store/atomic_load, not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)' instead of using a register. And the same applies to inc/dec. This second point matches the first issue identified in http://llvm.org/bugs/show_bug.cgi?id=17281 llvm-svn: 216980	2014-09-02 22:16:29 +00:00
Juergen Ributzka	dbe9e174b6	[FastISel][AArch64] Move over to target-dependent instruction selection only. This change moves FastISel for AArch64 to target-dependent instruction selection only. This change replicates the existing target-independent behavior, therefore there are no changes to the unit tests or new tests. Future changes will take advantage of this change and update functionality and unit tests. llvm-svn: 216955	2014-09-02 21:32:54 +00:00
Sanjay Patel	3f7a24e400	Refactor LowerFABS and LowerFNEG into one function (x86) (NFC) We duplicate ~30 lines of code to lower FABS and FNEG for x86, so this patch combines them into one function. No functional change intended, so no additional test cases. Test-suite behavior is unchanged. Differential Revision: http://reviews.llvm.org/D5064 llvm-svn: 216942	2014-09-02 20:24:47 +00:00
Reid Kleckner	0b2bccc3cd	CodeGen: Handle va_start in the entry block Also fix a small copy-paste bug in X86ISelLowering where Chain should have been used in place of DAG.getEntryToken(). Fixes PR20828. llvm-svn: 216929	2014-09-02 18:42:44 +00:00
Alexey Samsonov	d37bab6197	Fix left shifts of negative values in MipsDisassembler. This bug was reported by UBSan. llvm-svn: 216920	2014-09-02 17:49:16 +00:00
Pete Cooper	1175945710	Change MCSchedModel to be a struct of statically initialized data. This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour Reviewed by Andy Trick and Chandler C llvm-svn: 216919	2014-09-02 17:43:54 +00:00
Alexey Samsonov	9ca4870b49	Fix signed integer overflow in PPCInstPrinter. This bug was reported by UBSan. llvm-svn: 216917	2014-09-02 17:38:34 +00:00
JF Bastien	12cc99eb13	Add missing override on ARMAsmBackend's dtor. Test Plan: ninja check && ninja clang-test Subscribers: aemerson Differential Revision: http://reviews.llvm.org/D5075 llvm-svn: 216912	2014-09-02 16:26:55 +00:00
Alexey Samsonov	729b12ede3	Fix left shifts of negative integers in AArch64 InstPrinter/Disassembler Summary: Left shift of negative integer is an undefined behavior, and is reported by UBSan. It's ok for imm values to be negative, so we can just replace left shifts with multiplications. Test Plan: check-llvm test suite Reviewers: t.p.northover Reviewed By: t.p.northover Subscribers: aemerson, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D5132 llvm-svn: 216910	2014-09-02 16:19:41 +00:00
Aaron Ballman	8ca53885fa	Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC. llvm-svn: 216902	2014-09-02 12:19:02 +00:00
David Xu	052b9d9282	Merge Extend and Shift into a UBFX llvm-svn: 216899	2014-09-02 09:33:56 +00:00
Hal Finkel	51b3fd1e28	[PowerPC] Guard against illegal selection of add for TargetConstant operands r208640 was reverted because it caused a self-hosting failure on ppc64. The underlying cause was the formation of ISD::ADD nodes with ISD::TargetConstant operands. Because we have no patterns for 'add' taking 'timm' nodes, these are selected as r+r add instructions (which is a miscompile). Guard against this kind of behavior in the future by making the backend crash should this occur (instead of silently generating invalid output). llvm-svn: 216897	2014-09-02 06:23:54 +00:00
Saleem Abdulrasool	d1a4ed6a7c	CodeGen: indicate Windows unwind data format The structures for Windows unwinding are shared across multiple platforms. Indicate the encoding to be used for the particular target. Use this to switch the unwind emitter instantiated by the AsmPrinter. llvm-svn: 216895	2014-09-01 23:48:39 +00:00
Sanjay Patel	601492a3e3	Use an integer constant for FABS / FNEG (x86). This change will ease refactoring LowerFABS() and LowerFNEG() since they have a lot of overlap. Remove the creation of a floating point constant from an integer because it's going to be used for a bitwise integer op anyway. No change to codegen expected, but the verbose comment string for asm output may change from float values to hex (integer), depending on whether the constant already exists or not. Differential Revision: http://reviews.llvm.org/D5052 llvm-svn: 216889	2014-09-01 19:01:47 +00:00
Yuri Gorshenin	c107d147dc	[asan-assembly-instrumentation] Prologue and epilogue are moved out from InstrumentMemOperand(). Reviewers: eugenis Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D4923 llvm-svn: 216879	2014-09-01 12:51:00 +00:00
Renato Golin	92c816c68f	Thumb2 M-class MSR instruction support changes This patch implements a few changes related to the Thumb2 M-class MSR instruction: * better handling of unpredictable encodings, * recognition of the _g and _nzcvqg variants by the asm parser only if the DSP extension is available, preferred output of MSR APSR moves with the _<bits> suffix for v7-M. Patch by Petr Pavlu. llvm-svn: 216874	2014-09-01 11:25:07 +00:00
Yuri Gorshenin	e2f01eb730	Revert "[asan-assembly-instrumentation] Prologue and epilogue are moved out from InstrumentMemOperand()." This reverts commit 895aa397038b8de86d83ac0997a70949a486e112. llvm-svn: 216872	2014-09-01 10:24:04 +00:00
Yuri Gorshenin	506a170d63	[asan-assembly-instrumentation] Prologue and epilogue are moved out from InstrumentMemOperand(). llvm-svn: 216869	2014-09-01 09:56:45 +00:00
Craig Topper	fd38cbebda	Remove 'virtual' keyword from methods marked with 'override' keyword. llvm-svn: 216823	2014-08-30 16:48:34 +00:00
Craig Topper	6dc4a8bc2c	Fix some cases where StringRef was being passed by const reference. Remove const from some other StringRefs since it's implicitly const already. llvm-svn: 216820	2014-08-30 16:48:02 +00:00
Brad Smith	e98cdf9b77	JIT support was added a while ago. llvm-svn: 216819	2014-08-30 14:52:34 +00:00
Juergen Ributzka	25816b0fdd	Revert r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR." I think this broke the build bot. Reverting it for now until I have time to take a closer look. llvm-svn: 216813	2014-08-30 06:16:26 +00:00
Juergen Ributzka	3e7f88c169	[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR. Select the correct register class for the various instructions that are generated when combining instructions and constrain the registers to the appropriate register class. This fixes rdar://problem/18183707. llvm-svn: 216805	2014-08-29 23:48:09 +00:00
Juergen Ributzka	c5c1c6090f	[FastISel][AArch64] Use the correct register class for branches. Also constrain the register class for branches. This fixes rdar://problem/18181496. llvm-svn: 216804	2014-08-29 23:48:06 +00:00
Alexey Samsonov	700964ea00	Make isValidMCLOHType take unsigned instead of enum to avoid loading invalid enum values llvm-svn: 216797	2014-08-29 22:34:28 +00:00
Reid Kleckner	39ad7c9812	AArch64: Silence -Wabsolute-value warning with std::abs llvm-svn: 216794	2014-08-29 22:14:26 +00:00
Reid Kleckner	d70ab41a4f	Speculative build fix for const, gcc, and ArrayRef overloads llvm-svn: 216793	2014-08-29 22:12:08 +00:00
Robin Morisset	039781ef26	Fix typos in comments, NFC Summary: Just fixing comments, no functional change. Test Plan: N/A Reviewers: jfb Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D5130 llvm-svn: 216784	2014-08-29 21:53:01 +00:00
Reid Kleckner	dccd0cbec3	Add a const and munge some comments llvm-svn: 216781	2014-08-29 21:42:21 +00:00
Reid Kleckner	16e5541211	musttail: Forward regparms of variadic functions on x86_64 Summary: If a variadic function body contains a musttail call, then we copy all of the remaining register parameters into virtual registers in the function prologue. We track the virtual registers through the function body, and add them as additional registers to pass to the call. Because this is all done in virtual registers, the register allocator usually gives us good code. If the function does a call, however, it will have to spill and reload all argument registers (ew). Forwarding regparms on x86_32 is not implemented because most compilers don't support varargs in 32-bit with regparms. Reviewers: majnemer Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5060 llvm-svn: 216780	2014-08-29 21:42:08 +00:00
Reid Kleckner	329d4a2b29	Verifier: Don't reject varargs callee cleanup functions We've rejected these kinds of functions since r28405 in 2006 because it's impossible to lower the return of a callee cleanup varargs function. However there are lots of legal ways to leave such a function without returning, such as aborting. Today we can leave a function with a musttail call to another function with the correct prototype, and everything works out. I'm removing the verifier check declaring that a normal return from such a function is UB. Reviewed By: nlewycky Differential Revision: http://reviews.llvm.org/D5059 llvm-svn: 216779	2014-08-29 21:25:28 +00:00
Louis Gerbarg	03c627e8a7	Remove spurious mask operations from AArch64 add->compares on 16 and 8 bit values This patch checks for DAG patterns that are an add or a sub followed by a compare on 16 and 8 bit inputs. Since AArch64 does not support those types natively they are legalized into 32 bit values, which means that mask operations are inserted into the DAG to emulate overflow behaviour. In many cases those masks do not change the result of the processing and just introduce a dependent operation, often in the middle of a hot loop. This patch detects the relevant DAG patterns and then tests to see if the transforms are equivalent with and without the mask, removing the mask if possible. The exact mechanism of this patch was discussed in http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074444.html There is a reasonably good chance there are missed opportunities due to similar (but not identical) DAG patterns that could be funneled into this test, adding them should be simple if we see test cases. Tests included. rdar://13754426 llvm-svn: 216776	2014-08-29 21:00:22 +00:00
Reid Kleckner	ab99e24e94	X86: Fix conflict over ESI between base register and rep;movsl The new solution is to not use this lowering if there are any dynamic allocas in the current function. We know up front if there are dynamic allocas, but we don't know if we'll need to create stack temporaries with large alignment during lowering. Conservatively assume that we will need such temporaries. Reviewed By: hans Differential Revision: http://reviews.llvm.org/D5128 llvm-svn: 216775	2014-08-29 20:50:31 +00:00
Robin Morisset	5ce0ce4430	[X86] Refactor X86ISelDAGToDAG::SelectAtomicLoadArith - NFC Summary: Mostly renaming the (not very explicit) variables Tmp0, .. Tmp4, and grouping related statements together, along with a few lines of comments for the surprising parts. No functional change intended. Test Plan: make check-all Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5088 llvm-svn: 216768	2014-08-29 20:19:23 +00:00
Juergen Ributzka	f6ee7a7cdd	[FastISel][AArch64] Fix an incorrect kill flag due to a bug in SelectTrunc. When we select a trunc instruction we don't emit any code if the type is already i32 or smaller. This is because the instruction that uses the truncated value will deal with it. This behavior can incorrectly transfer a kill flag, which was meant for the result of the truncate, onto the source register. %2 = trunc i32 %1 to i16 ... = ... %2 -> ... = ... vreg1 <kill> ... = ... %1 ... = ... vreg1 This commit fixes this by emitting a COPY instruction, so that the result and source register are distinct virtual registers. This fixes rdar://problem/18178188. llvm-svn: 216750	2014-08-29 17:58:16 +00:00
Matt Arsenault	8675db15da	R600/SI: Use mad for fsub + fmul We can use a negate source modifier to match this for fsub. llvm-svn: 216735	2014-08-29 16:01:14 +00:00
Tim Northover	3c0915e858	AArch64: only try to get operand of a known node. A bug in r216725 meant we tried to discover the type of a SETCC before confirming the node actually was a SETCC. llvm-svn: 216734	2014-08-29 15:34:58 +00:00
Sanjay Patel	a065eb44aa	typo llvm-svn: 216732	2014-08-29 15:32:09 +00:00
Jingyue Wu	cb83a155c1	[NVPTX] Make the alignment an explicit argument to ldu/ldg Summary: Instead of specifying the alignment as metadata which may be destroyed by transformation passes, make the alignment the second argument to ldu/ldg intrinsic calls. Test Plan: ldu-ldg.ll ldu-i8.ll ldu-reg-plus-offset.ll Reviewers: eliben, meheff, jholewinski Reviewed By: meheff, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D5093 llvm-svn: 216731	2014-08-29 15:30:20 +00:00
Tim Northover	c1c05aeb5d	AArch64: skip select/setcc combine in complex case. In an llvm-stress generated test, we were trying to create a v0iN type and asserting when that failed. This case could probably be handled by the function, but not without added complexity and the situation it arises in is sufficiently odd that there's probably no benefit anyway. Should fix PR20775. llvm-svn: 216725	2014-08-29 13:05:18 +00:00
Arnaud A. de Grandmaison	6afbf2aa5e	[AArch64] FPLoadBalancing: move ownership of the chain to its current accumulator register and forget about the previously used accumulator. Coming up with a simple testcase is not easy, as this highly depends on what the register allocator is doing: this issue showed up while working with the PBQP allocator, which produced a different allocation scheme. A testcase would need to come up with chain starting in D[0-7], then moving to D[8-15], followed by a call to a function whose regmask clobbers the starting accumulator in D[0-7], then another use of the chain. Fixed some formatting, added some invariant checks while there. llvm-svn: 216721	2014-08-29 09:54:11 +00:00
Robert Khasanov	a651a62340	[SKX] Enable lowering of integer CMP operations. Added new types to Legalizer. Fixed getSetCCResultType function Added lowering tests. Reviewed by Elena Demikhovsky. llvm-svn: 216717	2014-08-29 08:46:04 +00:00
Jiangning Liu	08f4cda2ec	[AArch64] Fix some failures exposed by value type v4f16 and v8f16. 1) Add some missing bitcast patterns for v8f16. 2) Add type promotion for operand of ld/st operations. llvm-svn: 216706	2014-08-29 01:31:42 +00:00
Juergen Ributzka	77bc09f5ab	[FastISel][AArch64] Don't fold instructions that are not in the same basic block. This fix checks first if the instruction to be folded (e.g. sign-/zero-extend, or shift) is in the same machine basic block as the instruction we are folding into. Not doing so can result in incorrect code, because the value might not be live-out of the basic block, where the value is defined. This fixes rdar://problem/18169495. llvm-svn: 216700	2014-08-29 00:19:21 +00:00
Jim Grosbach	ec2b0d0b11	AArch64: More correctly constrain target vector extend lowering. The AArch64 target lowering for [zs]ext of vectors is set up to handle input simple types and expects the generic SDag path to do something reasonable with anything that's not a simple type. The code, however, was only checking that the result type was a simple type and assuming that implied that the source type would also be a simple type. That's not a valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>" demonstrate. The fix is to simply explicitly validate the source type as well as the result type. PR20791 llvm-svn: 216689	2014-08-28 22:08:28 +00:00
Sanjay Patel	81ecbb0737	Fix a logic bug in x86 vector codegen: sext (zext (x) ) != sext (x) (PR20472). Remove a block of code from LowerSIGN_EXTEND_INREG() that was added with: http://llvm.org/viewvc/llvm-project?view=revision&revision=177421 And caused: http://llvm.org/bugs/show_bug.cgi?id=20472 (more analysis here) http://llvm.org/bugs/show_bug.cgi?id=18054 The testcases confirm that we (1) don't remove a zext op that is necessary and (2) generate a pmovz instead of punpck if SSE4.1 is available. Although pmovz is 1 byte longer, it allows folding of the load, and so saves 3 bytes overall. Differential Revision: http://reviews.llvm.org/D4909 llvm-svn: 216679	2014-08-28 18:59:22 +00:00
Sid Manning	67a8936a84	Minor spelling correction. Reviewers: adasgupt, jverma, sidneym Differential Revision: http://reviews.llvm.org/D5025 llvm-svn: 216667	2014-08-28 14:16:32 +00:00
David Xu	ee978203e6	Generate CMN when comparing a short int with minus llvm-svn: 216651	2014-08-28 04:59:53 +00:00
Justin Hibbits	3476db4220	Test commit. Fix whitespace from a previous patch of mine. llvm-svn: 216650	2014-08-28 04:40:55 +00:00
Chandler Carruth	c01ce6bc01	[x86] Fix whitespace and formatting around this function with clang-format, no functionality changed. llvm-svn: 216646	2014-08-28 04:00:24 +00:00
Chandler Carruth	cb07a4adf3	[x86] Hoist conditions from every single if in this routine to a single early exit. And factor the subsequent cast<> from all but one block into a single variable. No functionality changed. llvm-svn: 216645	2014-08-28 03:57:13 +00:00
Chandler Carruth	974aa336b1	[x86] Inline an SSE4 helper function for INSERT_VECTOR_ELT lowering, no functionality changed. Separating this into two functions wasn't helping. There was a decent amount of boilerplate duplicated, and some subsequent refactorings here will pull even more common code out. llvm-svn: 216644	2014-08-28 03:52:45 +00:00
Juergen Ributzka	843f14f411	Revert "[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation." Quentin pointed out that this is not the correct approach and there is a better and easier solution. llvm-svn: 216632	2014-08-27 23:09:40 +00:00
Alexey Samsonov	a8d2f819ad	Fix unaligned reads/writes in X86JIT and RuntimeDyldELF. Summary: Introduce support::ulittleX_t::ref type to Support/Endian.h and use it in x86 JIT to enforce correct endianness and fix unaligned accesses. Test Plan: regression test suite Reviewers: lhames Subscribers: ributzka, llvm-commits Differential Revision: http://reviews.llvm.org/D5011 llvm-svn: 216631	2014-08-27 23:06:08 +00:00
Juergen Ributzka	ad8beabe38	[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation. Currently instructions are folded very aggressively into the memory operation, which can lead to the use of killed operands: %vreg1<def> = ADDXri %vreg0<kill>, 2 %vreg2<def> = LDRBBui %vreg0, 2 ... = ... %vreg1 ... This usually happens when the result is also used by another non-memory instruction in the same basic block, or any instruction in another basic block. If the computed address is used by only memory operations in the same basic block, then it is safe to fold them. This is because all memory operations will fold the address computation and the original computation will never be emitted. This fixes rdar://problem/18142857. llvm-svn: 216629	2014-08-27 22:52:33 +00:00
Juergen Ributzka	56b4b33190	[FastISel][AArch64] Fix a comment in my previous commit (r216617). llvm-svn: 216622	2014-08-27 21:40:50 +00:00
Juergen Ributzka	3c1b286152	[FastISel][AArch64] Fix simplify address when the address comes from a shift. When the address comes directly from a shift instruction then the address computation cannot be folded into the memory instruction, because the zero register is not available as a base register. Simplify address needs to emit the shift instruction and use the result as base register. llvm-svn: 216621	2014-08-27 21:38:33 +00:00
Juergen Ributzka	100a9b7fda	[FastISel][AArch64] Use the zero register for stores. Use the zero register directly when possible to avoid an unnecessary register copy and a wasted register at -O0. This also uses integer stores to store a positive floating-point zero. This saves us from materializing the positive zero in a register and then storing it. llvm-svn: 216617	2014-08-27 21:04:52 +00:00
Sanjay Patel	1d23bac843	typo in comment llvm-svn: 216609	2014-08-27 20:27:05 +00:00
Reid Kleckner	7b7a599ac5	X86 MC: Handle instructions like fxsave that match multiple operand sizes Instructions like 'fxsave' and control flow instructions like 'jne' match any operand size. The loop I added to the Intel syntax matcher assumed that using a different size would give a different instruction. Now it handles the case where we get the same instruction for different memory operand sizes. This also allows us to remove the hack we had for unsized absolute memory operands, because we can successfully match things like 'jnz' without reporting ambiguity. Removing this hack uncovered test case involving 'fadd' that was ambiguous. The memory operand could have been single or double precision. llvm-svn: 216604	2014-08-27 20:10:38 +00:00
Alexey Samsonov	a253bf9678	Use BitVector instead of int in R600 SIISelLowering. int may not have enough bits in it, which was detected by UBSan bootstrap (it reported left shift by a too large constant). llvm-svn: 216579	2014-08-27 19:36:53 +00:00
Oliver Stannard	89d1542840	Teach the AArch64 backend about v4f16 and v8f16 This teaches the AArch64 backend to handle the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations. llvm-svn: 216555	2014-08-27 16:16:04 +00:00
Evgeniy Stepanov	5050553ab8	Clang-format over X86AsmInstrumentation.* with LLVM style. r216536 mistakenly used -style=Google instead of LLVM. llvm-svn: 216543	2014-08-27 13:11:55 +00:00
Chandler Carruth	a5a8a9adc8	[x86] Fix a regression introduced with r213897 for 32-bit targets where we stopped efficiently lowering sextload using the SSE41 instructions for that operation. This is a consequence of a bad predicate I used thinking of the memory access needs. The code actually handles the cases where the predicate doesn't apply, and handles them much better. =] Simple fix and a test case added. Fixes PR20767. llvm-svn: 216538	2014-08-27 11:39:47 +00:00
Chandler Carruth	74ec9e19ee	[SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine. This combine is essentially combining target-specific nodes back into target independent nodes that it "knows" will be combined yet again by a target independent DAG combine into a different set of target-independent nodes that are legal (not custom though!) and thus "ok". This seems... deeply flawed. The crux of the problem is that we don't combine un-legalized shuffles that are introduced by legalizing other operations, and thus we don't see a very profitable combine opportunity. So the backend just forces the input to that combine to re-appear. However, for this to work, the conditions detected to re-form the unlegalized nodes must be exactly right. Previously, failing this would have caused poor code (if you're lucky) or a crasher when we failed to select instructions. After r215611 we would fall back into the legalizer. In some cases, this just "fixed" the crasher by producing bad code. But in the test case added it caused the legalizer and the dag combiner to iterate forever. The fix is to make the alignment checking in the x86 side of things match the alignment checking in the generic DAG combine exactly. This isn't really a satisfying or principled fix, but it at least makes the code work as intended. It also highlights that it would be nice to detect the availability of under aligned loads for a given type rather than bailing on this optimization. I've left a FIXME to document this. Original commit message for r215611 which covers the rest of the change: [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it stops being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 216537	2014-08-27 11:22:16 +00:00
Evgeniy Stepanov	4d04f66627	Clang-format over X86AsmInstrumentation.*. llvm-svn: 216536	2014-08-27 11:10:54 +00:00
Robert Khasanov	29e3b96734	[SKX] Added new versions of cmp instructions in avx512_icmp_cc multiclass, added VL multiclass. Added encoding tests llvm-svn: 216532	2014-08-27 09:34:37 +00:00
Elena Demikhovsky	ff620edd3c	AVX-512: Added intrinsic for VMOVSS store form with mask. llvm-svn: 216530	2014-08-27 07:38:43 +00:00
Craig Topper	e1d1294853	Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created. llvm-svn: 216525	2014-08-27 05:25:25 +00:00
Juergen Ributzka	fb506a417d	[FastISel][AArch64] Fix address simplification. When a shift with extension or an add with shift and extension cannot be folded into the memory operation, then the address calculation has to be materialized separately. While doing so the code forgot to consider a possible sign-/zero- extension. This fix folds now also the sign-/zero-extension into the add or shift instruction which is used to materialize the address. This fixes rdar://problem/18141718. llvm-svn: 216511	2014-08-27 00:58:30 +00:00
Juergen Ributzka	99dd30f338	[FastISel][AArch64] Fold Sign-/Zero-Extend into the shift immediate instruction. llvm-svn: 216510	2014-08-27 00:58:26 +00:00
Reid Kleckner	f6fb780890	MC: Split the x86 asm matcher implementations by dialect The existing matcher has lots of AT&T assembly dialect assumptions baked into it. In particular, the hack for resolving the size of a memory operand by appending the four most common suffixes doesn't work at all. The Intel assembly dialect mnemonic table has ambiguous entries, so we need to try matching multiple times with different operand sizes, since that's the only way to choose different instruction variants. This makes us more compatible with gas's implementation of Intel assembly syntax. MSVC assumes you want byte-sized operations for the instructions that we reject as ambiguous. Reviewed By: grosbach Differential Revision: http://reviews.llvm.org/D4747 llvm-svn: 216481	2014-08-26 20:32:34 +00:00
James Molloy	36b8a88188	Change the return value of "getEnd()" from a MachineInstr* to a MachineBasicBlock::iterator. It seems on Darwin the illegal round-trip ::iterator -> MachineInstr* -> ::iterator breaks execution horribly when the iterator is not a real MachineInstr, like ::end(). llvm-svn: 216455	2014-08-26 13:41:31 +00:00
Yi Kong	ebaa150e23	ARM: Add patterns for dbg llvm-svn: 216451	2014-08-26 12:47:26 +00:00
Dylan Noblesmith	4af4d2c111	AArch64: use std::fill instead of memset Followup based on review. llvm-svn: 216436	2014-08-26 03:33:26 +00:00
Dylan Noblesmith	b06f77b608	Revert "AArch64: use std::vector for temp array" This reverts commit r216365. llvm-svn: 216433	2014-08-26 02:03:43 +00:00
Dylan Noblesmith	c9e2a2709e	Revert "NVPTX: remove another raw delete call" This reverts commit r216364. llvm-svn: 216430	2014-08-26 02:03:35 +00:00
Juergen Ributzka	1912e24898	[FastISel][AArch64] Refactor float zero materialization. NFCI. llvm-svn: 216403	2014-08-25 19:58:05 +00:00
Rafael Espindola	3fd1e9933f	Modernize raw_fd_ostream's constructor a bit. Take a StringRef instead of a "const char *". Take a "std::error_code &" instead of a "std::string &" for error. A create static method would be even better, but this patch is already a bit too big. llvm-svn: 216393	2014-08-25 18:16:47 +00:00
Chandler Carruth	70f81a98ca	[x86] Fix a bug in r216319 where I was missing a 'break'. This actually was caught by existing tests but those tests were disabled with an XFAIL because of PR20736. While working on fixing that, I noticed the test failure, and tracked it down to this. We even have a really nice Clang warning that would have caught this but it isn't enabled in LLVM! =[ I may look at enabling it. llvm-svn: 216391	2014-08-25 18:06:11 +00:00
Chad Rosier	e62f365458	[AArch32] Add patterns for VCVT{A,N,P,M}. Patterns for lowering libm calls to VCVT{A,N,P,M} are also included. Phabricator Revision: http://reviews.llvm.org/D5033 llvm-svn: 216388	2014-08-25 16:56:33 +00:00
Robert Khasanov	2ea081d4d1	[SKX] avx512_icmp_packed multiclass extension Extended avx512_icmp_packed multiclass by masking versions. Added avx512_icmp_packed_rmb multiclass for embedded broadcast versions. Added corresponding _vl multiclasses. Added encoding tests for CPCMP{EQ|GT}* instructions. Add more fields for X86VectorVTInfo. Added AVX512VLVectorVTInfo that include X86VectorVTInfo for 512/256/128-bit versions Differential Revision: http://reviews.llvm.org/D5024 llvm-svn: 216383	2014-08-25 14:49:34 +00:00
Karthik Bhat	7f33ff7dea	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer. The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Dylan Noblesmith	b899464f5b	AArch64: unique_ptr-ify map structures llvm-svn: 216366	2014-08-25 01:59:38 +00:00
Dylan Noblesmith	6076debd98	AArch64: use std::vector for temp array llvm-svn: 216365	2014-08-25 01:59:36 +00:00
Dylan Noblesmith	130589f804	NVPTX: remove another raw delete call llvm-svn: 216364	2014-08-25 01:59:32 +00:00
Dylan Noblesmith	802b6ce8de	NVPTX: remove raw delete call Also make members that are never accessed outside the class private. llvm-svn: 216363	2014-08-25 01:59:29 +00:00
Craig Topper	4627679cec	Use range based for loops to avoid needing to re-mention SmallPtrSet size. llvm-svn: 216351	2014-08-24 23:23:06 +00:00

30212 Commits