llvm-project

Commit Graph

Author	SHA1	Message	Date
Quentin Colombet	95e053119e	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Other instructions. <rdar://problem/15607571> llvm-svn: 215923	2014-08-18 17:55:59 +00:00
Quentin Colombet	81db56d931	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Logic instructions. <rdar://problem/15607571> llvm-svn: 215922	2014-08-18 17:55:56 +00:00
Quentin Colombet	c13c50e0f3	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Math instructions. <rdar://problem/15607571> llvm-svn: 215921	2014-08-18 17:55:53 +00:00
Quentin Colombet	45c469c0c3	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Arithmetic instructions. <rdar://problem/15607571> llvm-svn: 215920	2014-08-18 17:55:51 +00:00
Quentin Colombet	ca74f23df7	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Conversion instructions. <rdar://problem/15607571> llvm-svn: 215919	2014-08-18 17:55:49 +00:00
Quentin Colombet	71cdecd73c	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Move instructions. <rdar://problem/15607571> llvm-svn: 215918	2014-08-18 17:55:46 +00:00
Quentin Colombet	bd11563742	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Other instructions. <rdar://problem/15607571> llvm-svn: 215917	2014-08-18 17:55:43 +00:00
Quentin Colombet	91513d9522	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Logic instructions. <rdar://problem/15607571> llvm-svn: 215916	2014-08-18 17:55:41 +00:00
Quentin Colombet	e9f8b4b7ac	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Arithmetic instructions. <rdar://problem/15607571> llvm-svn: 215915	2014-08-18 17:55:39 +00:00
Quentin Colombet	f68e09418c	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Move instructions. <rdar://problem/15607571> llvm-svn: 215914	2014-08-18 17:55:36 +00:00
Quentin Colombet	33b0bf200d	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point x87 instructions. Sub-group: Math instructions. <rdar://problem/15607571> llvm-svn: 215913	2014-08-18 17:55:32 +00:00
Quentin Colombet	456c991fb4	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point x87 instructions. Sub-group: Arithmetic instructions. <rdar://problem/15607571> llvm-svn: 215912	2014-08-18 17:55:29 +00:00
Quentin Colombet	0bc907e5e8	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point x87 instructions. Sub-group: Move instructions. <rdar://problem/15607571> llvm-svn: 215911	2014-08-18 17:55:26 +00:00
Quentin Colombet	6e62be2f5a	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: Other instructions. <rdar://problem/15607571> llvm-svn: 215910	2014-08-18 17:55:23 +00:00
Quentin Colombet	a6c56f5072	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: Synchronization instructions. <rdar://problem/15607571> llvm-svn: 215909	2014-08-18 17:55:21 +00:00
Quentin Colombet	c58fc449fd	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: String instructions. <rdar://problem/15607571> llvm-svn: 215908	2014-08-18 17:55:19 +00:00
Quentin Colombet	e1b17768a0	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: Control transfer instructions. <rdar://problem/15607571> llvm-svn: 215907	2014-08-18 17:55:16 +00:00
Quentin Colombet	fb887b1c05	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: Logic instructions. <rdar://problem/15607571> llvm-svn: 215906	2014-08-18 17:55:13 +00:00
Quentin Colombet	df26059e13	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: Arithmetic instructions. <rdar://problem/15607571> llvm-svn: 215905	2014-08-18 17:55:11 +00:00
Quentin Colombet	35d37b7571	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: Move instructions. <rdar://problem/15607571> llvm-svn: 215904	2014-08-18 17:55:08 +00:00
Elena Demikhovsky	34d2d76d25	AVX-512: Fixed a bug in emitting compare for MVT:i1 type. Added a test. llvm-svn: 215889	2014-08-18 11:59:06 +00:00
Tim Northover	26bb14e6a7	TableGen: allow use of uint64_t for available features mask. ARM in particular is getting dangerously close to exceeding 32 bits worth of possible subtarget features. When this happens, various parts of MC start to fail inexplicably as masks get truncated to "unsigned". Mostly just refactoring at present, and there's probably no way to test. llvm-svn: 215887	2014-08-18 11:49:42 +00:00
Elena Demikhovsky	67d9500444	Reverted last commit llvm-svn: 215828	2014-08-17 09:39:48 +00:00
Elena Demikhovsky	c0b420fdf5	Reverted last commit llvm-svn: 215827	2014-08-17 09:36:07 +00:00
Elena Demikhovsky	2bb991a0c5	Added a table for intrinsics on X86. It should remove dozens of lines in handling intrinsics (in a huge switch) and give an easy way to add new intrinsics. I have not finished moving all intrinsics to the table; I'll do this in upcoming commits. llvm-svn: 215826	2014-08-17 09:00:20 +00:00
Chandler Carruth	5a37dd6ff4	[x86] Fix an indentation goof in a prior commit. Should have re-run clang-format. llvm-svn: 215824	2014-08-17 00:40:34 +00:00
Chandler Carruth	e020f117ce	[x86] Teach lots of the new vector shuffle lowering to use UNPCK instructions for blend operations at 128 bits. This was a serious hole in our prior blend lowering. llvm-svn: 215819	2014-08-16 09:42:15 +00:00
Robin Morisset	d539f866ac	Get rid of dead code: SelectAtomic64 in X86ISelDAGtoDAG.cpp llvm-svn: 215789	2014-08-15 23:36:00 +00:00
Reid Kleckner	a6b86bef4d	Fix the build with MSVC 2013 after new shuffle code MSVC gives this awesome diagnostic: ..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument ..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1' ..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl' Using an anonymous namespace makes the problem go away. llvm-svn: 215744	2014-08-15 18:03:58 +00:00
Chandler Carruth	03f456abbe	[x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions where applicable for blending. llvm-svn: 215737	2014-08-15 17:42:00 +00:00
Rafael Espindola	d610ba99cb	Remove HasLEB128. We already require CFI, so it should be safe to require .leb128 and .uleb128. llvm-svn: 215712	2014-08-15 14:01:07 +00:00
Chandler Carruth	f88612581e	[x86] Add the initial skeleton of type-based dispatch for AVX vectors in the new shuffle lowering and an implementation for v4 shuffles. This allows us to handle non-half-crossing shuffles directly for v4 shuffles, both integer and floating point. This currently misses places where we could perform the blend via UNPCK instructions, but otherwise generates equally good or better code for the test cases included to the existing vector shuffle lowering. There are a few cases that are entertainingly better. ;] llvm-svn: 215702	2014-08-15 11:01:40 +00:00
Chandler Carruth	0288620f67	[x86] Teach the instruction printer to decode immediate operands to BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments. These will be used in my next commit as part of test cases for AVX shuffles which can directly use blend in more places. llvm-svn: 215701	2014-08-15 11:01:37 +00:00
Chandler Carruth	6649f53207	[x86] Remove the duplicated code for testing whether we can widen the elements of a shuffle mask and simplify how it works. No functionality changed now that the bug that was here has been fixed. llvm-svn: 215696	2014-08-15 07:41:57 +00:00
Chandler Carruth	17fd848bfa	[x86] Fix the very broken formation of vpunpck instructions in the target-specific shuffle DAG combines. We were recognizing the paired shuffles backwards. This code needs to be replaced anyways as we have the same functionality elsewhere, but I'll do the refactoring in a follow-up, this is the minimal fix to the behavior. In addition to fixing miscompiles with the new vector shuffle lowering, it also causes the canonicalization to kick in much better, selecting the smaller encoding variants in lots of places in the new AVX path. This still isn't quite ideal as we don't need both the shufpd and the punpck instructions, but that'll get fixed in a follow-up patch. llvm-svn: 215690	2014-08-15 03:54:49 +00:00
Chandler Carruth	372c143c2f	[x86] Fix PR20540 where the x86 shuffle DAG combiner had completely broken logic for merging shuffle masks in the face of SM_SentinelZero mask operands. While these are '-1' they don't mean 'undef' the way '-1' means in the pre-legalized shuffle masks. Instead, they mean that the shuffle operation is forcibly zeroing that lane. Reflect this and explicitly handle it in a bunch of places. In one place the effect is equivalent but much more clear. In the rest it was really weirdly broken. Also, rewrite the entire merging thing to be a more direct operation with a single loop and just doing math to map the indices through the various masks. Also add a bunch of asserts to try to make it extremely clear what the different masks can possibly look like. Finally, add some comments to clarify that we're merging shuffle masks up here rather than down as we do everywhere else, and thus the logic is quite confusing. Thanks to several different people for sending test cases, and to Robert Khasanov for an initial attempt at fixing. llvm-svn: 215687	2014-08-15 02:43:18 +00:00
Juergen Ributzka	790bacf232	Revert several FastISel commits to track down a buildbot error. This reverts: r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants." r215594 "[FastISel][X86] Use XOR to materialize the "0" value." r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization." r215591 "[FastISel][AArch64] Make use of the zero register when possible." r215588 "[FastISel] Let the target decide first if it wants to materialize a constant." r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI." llvm-svn: 215673	2014-08-14 19:56:28 +00:00
Duncan P. N. Exon Smith	e85df16564	Fix whitespace error from r215279, NFC llvm-svn: 215667	2014-08-14 17:18:26 +00:00
Adam Nemet	6fa675754a	[AVX512] Switch FMA intrinsics to the masking version This does the renaming and updates the lowering logic. Part of <rdar://problem/17688758> llvm-svn: 215664	2014-08-14 17:13:30 +00:00
Adam Nemet	9b4f08c729	[X86] Break out logic to map FMA Intrinsic number to Opcode No functional change. Will be used to lower AVX512 masking FMA intrinsics. llvm-svn: 215663	2014-08-14 17:13:27 +00:00
Adam Nemet	50b83f0bb8	[AVX512] Add enum for the static rounding types No functional change. This will be used by the new FMA intrinsic lowering code. We can probably add NO_EXC here as well, I am just not too familiar with this part of AVX512 yet. We can add that later. llvm-svn: 215662	2014-08-14 17:13:26 +00:00
Adam Nemet	6f31063dcf	[AVX512] Break out the logic to lower masking intrinsics No functional change. This will be used by the FMA intrinsic lowering as well and hopefully many more. llvm-svn: 215661	2014-08-14 17:13:24 +00:00
Adam Nemet	2e91ee58fe	[AVX512] Add masking variant for the FMA instructions This change further evolves the base class AVX512_masking in order to make it suitable for the masking variants of the FMA instructions. Besides AVX512_masking there is now a new base class that instructions including FMAs can use: AVX512_masking_3src. With three-source (destructive) instructions one of the sources is already tied to the destination. This difference from AVX512_masking is captured by this new class. The common bits between _masking and _masking_3src are broken out into a new super class called AVX512_masking_common. As with valign, there is some corresponding restructuring of the underlying format classes. The idea is the same: we want to derive from two classes essentially: one providing the format bits and another format-independent multiclass supplying the various masking and non-masking instruction variants. Existing fma tests in avx512-fma.ll provide coverage here for the non-masking variants. For masking, the next patches in the series will add intrinsics and intrinsic tests. For AVX512_masking_3src to work, the (ins ...) dag has to be passed without the leading source operand that is tied to dst ($src1). This is necessary to properly construct the (ins ...) for the different variants. For the record, I did check that if $src is mistakenly included, you do get a fairly intuitive error message from the tablegen backend. Part of <rdar://problem/17688758> llvm-svn: 215660	2014-08-14 17:13:19 +00:00
Chandler Carruth	a8311b3681	[x86] Begin stubbing out the AVX support in the new vector shuffle lowering scheme. Currently, this just directly bails to the fallback path of splitting the 256-bit vector into two 128-bit vectors, operating there, and then joining the results back together. While the results are far from perfect, they are shockingly good for what we're doing here. I'll be layering the rest of the functionality on top of this piece by piece and updating tests as I go. Note that 256-bit vectors in this mode are still somewhat WIP. While I think the code paths that I'm adding here are clean and good-to-go, there are still a lot of 128-bit assumptions that I'll need to stomp out as I march through the functional spread here. llvm-svn: 215637	2014-08-14 12:13:59 +00:00
Quentin Colombet	57fb040beb	[X86] Fix the value of the low mask for the lowering of MUL_LOHI for v4i32. Found by code inspection. llvm-svn: 215604	2014-08-13 23:49:24 +00:00
Juergen Ributzka	0f8bc043c5	[FastISel][X86] Add large code model support for materializing floating-point constants. In the large code model for X86 floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load from there the floating-point value. Fixes <rdar://problem/17674628>. llvm-svn: 215595	2014-08-13 22:25:35 +00:00
Juergen Ributzka	ba8b79e932	[FastISel][X86] Use XOR to materialize the "0" value. llvm-svn: 215594	2014-08-13 22:22:17 +00:00
Juergen Ributzka	230494b399	[FastISel][X86] Emit more efficient instructions for integer constant materialization. This mostly affects the i64 value type, which always resulted in a 15-byte movabsq instruction to materialize any constant. The custom code checks the value of the immediate and tries to use a different and smaller mov instruction when possible. This fixes <rdar://problem/17420988>. llvm-svn: 215593	2014-08-13 22:18:11 +00:00
Juergen Ributzka	2b98e393f2	[FastISel][X86] Refactor constant materialization. NFCI. Split the constant materialization code into three separate helper functions for Integer-, Floating-Point-, and GlobalValue-Constants. llvm-svn: 215586	2014-08-13 22:01:55 +00:00
Benjamin Kramer	a7c40ef022	Canonicalize header guards into a common format. Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558	2014-08-13 16:26:38 +00:00
Aaron Ballman	1013b6b60c	Silence a -Wparenthesis warning with these asserts. NFC. llvm-svn: 215537	2014-08-13 10:49:07 +00:00
Robert Khasanov	ed8829703f	[SKX] Extended non-temporal load/store instructions for AVX512VL subsets. Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction. Added encoding and lowering tests. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 215536	2014-08-13 10:46:00 +00:00
Elena Demikhovsky	51bbd011c3	AVX-512: Fixed a bug in shufflevector lowering. PALIGNR instruction does not exist in AVX-512F set. Added a test. llvm-svn: 215526	2014-08-13 07:58:43 +00:00
Chandler Carruth	b7eda21bb0	[x86] Rewrite a core part of the new vector shuffle lowering to handle one pesky test case correctly. This test case caused the old code to infloop oscillating between solving the low-half and the high-half. The 'side balancing' part of single-input v8 shuffle lowering didn't handle the one pattern which can cause it to oscillate. Fortunately the fuzz testing found this case. Unfortunately it was terrible to handle. I'm really sorry for the amount and density of the code here, I'd love suggestions on how to simplify it. I feel like there must be a simpler form here, but after a lot of days I've not found it. This is the only one I've found that even works. I've added the one pesky test case along with some nice comments explaining the core problem that we have to solve here. So far this has survived approximately 32k test cases. More strenuous fuzzing commencing. llvm-svn: 215519	2014-08-13 01:25:45 +00:00
Adam Nemet	cee9d0a460	[AVX512] Handle valign masking intrinsic via C++ lowering I think that this will scale better in most cases than adding a Pat<> for each mapping from the intrinsic DAG to the instruction (i.e. rri, rrik, rrikz). We can just lower to the SDNode and have the resulting DAG be matched by the DAG patterns. Alternatively (long term), we could keep the Pat<>s but generate them via the new AVX512_masking multiclass. The difficulty is that in order to formulate that we would have to concatenate DAGs. Currently this is only supported if the operators of the input DAGs are identical. llvm-svn: 215473	2014-08-12 21:13:12 +00:00
Sanjay Patel	8687f320e0	fixed typos llvm-svn: 215451	2014-08-12 16:00:06 +00:00
Elena Demikhovsky	40a77144a4	AVX-512: added a missing bitcast from v16f32 to v16i32 llvm-svn: 215351	2014-08-11 09:59:08 +00:00
Hans Wennborg	21b20cb11a	Increase the size of these SmallVectors in X86ISelLowering.cpp. In a Clang bootstrap, their sizes were always 12, 16 and 16, respectively. llvm-svn: 215336	2014-08-11 02:21:22 +00:00
Sanjay Patel	b48d1cfa53	fixed typos llvm-svn: 215299	2014-08-09 22:23:02 +00:00
Eric Christopher	e950b6776b	Initialize X86 DataLayout based on the Triple only. llvm-svn: 215279	2014-08-09 04:38:53 +00:00
Eric Christopher	4629ed75e4	Move some X86 subtarget configuration onto the subtarget that's being created. llvm-svn: 215271	2014-08-09 01:07:25 +00:00
Rui Ueyama	4c956fe129	[FastISel][X86] Silence -Wenum-compare warning llvm-svn: 215253	2014-08-08 22:47:49 +00:00
Juergen Ributzka	793f28d274	[FastISel][X86] Fix INC/DEC optimization (r215230) I accidentally also used INC/DEC for unsigned arithmetic which doesn't work, because INC/DEC don't set the required flag which is used for the overflow check. llvm-svn: 215237	2014-08-08 18:47:04 +00:00
Juergen Ributzka	4022614899	[FastISel][X86] Use INC/DEC when possible for {sadd|ssub}.with.overflow intrinsics. This is a small peephole optimization to emit INC/DEC when possible. Fixes <rdar://problem/17952308>. llvm-svn: 215230	2014-08-08 17:21:37 +00:00
Patrik Hagglund	b0e86ec814	[pr19635] Revert most of r170537, and add new testcase. Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong. llvm-svn: 215190	2014-08-08 08:21:19 +00:00
Adam Nemet	7d498629f1	[AVX512] Add zero-masking variant to AVX512_masking multiclass This completes one item from the todo-list of r215125 "Generate masking instruction variants with tablegen". The AddedComplexity is needed just like for the k variant. Added a codegen test based on valignq. llvm-svn: 215173	2014-08-07 23:53:38 +00:00
Adam Nemet	fa1f7201fc	[AVX512] Add codegen test for the masking variant of valign The AddedComplexity is needed just like in avx512_perm_3src. There may be a bug in the complexity computation... llvm-svn: 215168	2014-08-07 23:18:18 +00:00
Eric Christopher	b9fd9ed37e	Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154	2014-08-07 22:02:54 +00:00
Adam Nemet	2e2537f665	[AVX512] Generate masking instruction variants with tablegen After adding the masking variants to several instructions, I have decided to experiment with generating these from the non-masking/unconditional variant. This will hopefully reduce the amount of repetition that we currently have in order to define an instruction with all its variants (for a reg/mem instruction this would be 6 instruction defs and 2 Pat<> for the intrinsic). The patch is the first cut that is currently only applied to valignd/q to make the patch small. A few notes on the approach: * In order to stitch together the dag for both the conditional and the unconditional patterns I pass the RHS of the set rather than the full pattern (set dest, RHS). * Rather than subclassing each instruction base class (e.g. AVX512AIi8), with a masking variant which wouldn't scale, I derived the masking instructions from a new base class AVX512 (this is just I<> with Requires<HasAVX512>). The instructions derive from this now, plus a new set of classes that add the format bits and everything else that instruction base class provided (i.e. AVX512AIi8 vs. AVX512AIi8Base). I hope we can go incrementally from here. I expect that: * We will need different variants of the masking class. One example is instructions requiring three vector sources. In this case we tie one of the source operands to dest rather than a new implicit source operand ($src0) * Add the zero-masking variant * Add more AVX512*Base classes as new uses are added I've looked at X86.td.expanded before and after to make sure that nothing got lost for valignd/q. llvm-svn: 215125	2014-08-07 17:53:55 +00:00
Rafael Espindola	f8b27c41e8	Nuke the old JIT. I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111	2014-08-07 14:21:18 +00:00
Alexander Kornienko	7151ad7762	Insert parens to avoid a warning: suggest parentheses around arithmetic in operand of '^' [-Wparentheses] llvm-svn: 215101	2014-08-07 12:09:34 +00:00
Pavel Chupin	124889243a	Fix lld-x86_64-win7 Build #11969 llvm-svn: 215097	2014-08-07 11:09:59 +00:00
Chandler Carruth	4e8fcbd3fd	[x86] Fix another miscompile found through fuzz testing the new vector shuffle lowering. This is closely related to the previous one. Here we failed to use the source offset when swapping in the other case -- where we end up swapping the final shuffle. The cause of this bug is a bit different: I simply wasn't thinking about the fact that this mask is actually a slice of a wide mask and thus has numbers that need SourceOffset applied. Simple fix. Would be even more simple with an algorithm-y thing to use here, but correctness first. =] llvm-svn: 215095	2014-08-07 10:37:35 +00:00
Chandler Carruth	e206385e99	[x86] Fix another miscompile in the new vector shuffle lowering found via the fuzz tester. Here I missed an offset when round-tripping a value through a shuffle mask. I got it right 2 lines below. See a problem? I do. ;] I'll probably be adding a little "swap" algorithm which accepts a range and two values and swaps those values where they occur in the range. Don't really have a name for it, let me know if you do. llvm-svn: 215094	2014-08-07 10:14:27 +00:00
Chandler Carruth	78494364d1	[x86] Fix another miscompile in the new vector shuffle lowering found through the new fuzzer. This one is great: bad operator precedence led the modulus to happen at the wrong point. All the asserts didn't fire because there were usually the right values past the end of the 4 element region we were looking at. Probably could have gotten a crash here with ASan + fuzzing, but the correctness tests pinpointed this really nicely. llvm-svn: 215092	2014-08-07 09:45:02 +00:00
Pavel Chupin	f55eb450e5	[x32] Use ebp/esp as frame and stack pointer Summary: Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still require 64-bit register, so using 64-bit MachineFramePtr where required. X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing this issue here as well by making isTarget64BitLP64 false. Also mark hasReservedSpillSlot unreachable on X86. See inlined comments. Test Plan: Add one new simple test and upgrade 2 existing with x32 target case. Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D4617 llvm-svn: 215091	2014-08-07 09:41:19 +00:00
Chandler Carruth	27046758de	[x86] Fix a miscompile in the new shuffle lowering found through the new fuzz testing. The function which tested for adjacency did what it said on the tin, but when I called it, I wanted it to do something more thorough: I wanted to know if the pairs of shuffle elements were adjacent and started at 0 mod 2. In one place I had the decency to try to test for this, but in the other it was completely skipped, miscompiling this test case. Fix this by making the helper actually do what I wanted it to do everywhere I called it (and removing the now redundant code in one place). I really dislike the name "canWidenShuffleElements" for this predicate. If anyone can come up with a better name, please let me know. The other name I thought about was "canWidenShuffleMask" but is it really widening the mask to reduce the number of lanes shuffled? I don't know. Naming things is hard. llvm-svn: 215089	2014-08-07 08:11:31 +00:00
Saleem Abdulrasool	64a8cc7d0d	MC: split Win64EHUnwindEmitter into a shared streamer This changes Win64EHEmitter into a utility WinEH UnwindEmitter that can be shared across multiple architectures and a target specific bit which is overridden (Win64::UnwindEmitter). This enables sharing the section selection code across X86 and the intended use in ARM for emitting unwind information for Windows on ARM. llvm-svn: 215050	2014-08-07 02:59:41 +00:00
Quentin Colombet	0233d49574	[X86][SchedModel] Fixed missing/wrong scheduling model found by code inspection. Source: Agner Fog's Instruction tables. Related to <rdar://problem/15607571> llvm-svn: 215045	2014-08-07 00:20:44 +00:00
Reid Kleckner	ce63b791fe	MC X86: Accept ".att_syntax prefix" and diagnose noprefix Fixes PR18916. I don't think we need to implement support for either hybrid syntax. Nobody should write Intel assembly with '%' prefixes on their registers or AT&T assembly without them. llvm-svn: 215031	2014-08-06 23:21:13 +00:00
Sanjay Patel	b63e43c931	fix typo llvm-svn: 214995	2014-08-06 21:08:38 +00:00
Eric Christopher	b5217507c7	Remove the target machine from CCState. Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where getting a subtarget will require passing in a MachineFunction/Function as well. llvm-svn: 214988	2014-08-06 18:45:26 +00:00
Robert Khasanov	3c30c4bdec	[AVX512] Added load/store instructions to Register2Memory opcode tables. Added lowering tests for load/store. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214972	2014-08-06 15:40:34 +00:00
Chandler Carruth	c3927cd8c9	[x86] Fix two independent miscompiles in the process of getting the same test case to actually generate correct code. The primary miscompile fixed here is that we weren't correctly handling in-place elements in one half of a single-input v8i16 shuffle when moving a dword of elements from that half to the other half. Some times, we would clobber the in-place elements in forming the dword to move across halves. The fix to this involves forcibly marking the in-place inputs even when there is no need to gather them into a dword, and to much more carefully re-arrange the elements when grouping them into a dword to move across halves. With these two changes we would generate correct shuffles for the test case, but found another miscompile. There are also some random perturbations of the generated shuffle pattern in SSE2. It looks like a wash; more instructions in some cases fewer in others. The second miscompile would corrupt the results into nonsense. This is a buggy pattern in one of the added DAG combines. Mapping elements through a PSHUFD when pairing redundant half-shuffles is much harder than this code makes it out to be -- it requires reasoning about all of where the input is used in the PSHUFD, not just one part of where it is used. Plus, we can't combine a half shuffle into a PSHUFD but the code didn't guard against it. I think this was just a bad idea and I've just removed that aspect of the combine. No tests regress as a consequence so seems OK. llvm-svn: 214954	2014-08-06 10:16:36 +00:00
Chandler Carruth	8f23ba26d2	[x86] Switch to a formulation of a for loop that is much more obviously not corrupting the mask by mutating it more times than intended. No functionality changed (the results were non-overlapping so the old version "worked" but was non-obvious). llvm-svn: 214953	2014-08-06 10:16:33 +00:00
Adam Nemet	5ec912881f	[X86] Fixes commit r214890 to match the posted patch This was another fallout from my local rebase where something went wrong :( llvm-svn: 214951	2014-08-06 07:13:12 +00:00
Quentin Colombet	33ea1681ce	[X86][SchedModel] Fixed some wrong scheduling model found by code inspection. Source: Agner Fog's Instruction tables. Related to <rdar://problem/15607571> llvm-svn: 214940	2014-08-06 00:22:39 +00:00
JF Bastien	ac8b66b32c	Fix typos in comments and doc Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com) llvm-svn: 214934	2014-08-05 23:27:34 +00:00
Rafael Espindola	b8141d55b9	Remove a virtual function from TargetMachine. NFC. llvm-svn: 214929	2014-08-05 22:10:21 +00:00
Chandler Carruth	a746239be3	[x86] Fix a crasher due to shuffles which cancel each other out and add a test case. We also miscompile this test case which is showing a serious flaw in the single-input v8i16 shuffle code. I've left the specific instruction checks FIXME-ed out until I can address the bug in the single-input code, but I wanted to separate out a significant functionality change to produce correct code from a very simple and targeted crasher fix. The miscompile problem stems from keeping track of inputs by value rather than by index. As a consequence of doing this, we can't reliably update those inputs because they might swap and we can't detect this without copying the mask. The blend code now uses indices for the input lists and this seems strictly better. It also should make it easier to sort things and do other cleanups. I think the time has come to simplify The Great Lambda here. llvm-svn: 214914	2014-08-05 18:45:49 +00:00
NAKAMURA Takumi	ca562297d9	X86CodeEmitter.cpp: Add SEH_Epilogue to ignored list for legacy JIT, corresponding to r214775. llvm-svn: 214905	2014-08-05 18:04:15 +00:00
Adam Nemet	c04f3f9f73	[X86] Improve comments for r214888 A rebase somehow ate my comments. This restores them. llvm-svn: 214903	2014-08-05 17:58:49 +00:00
Adam Nemet	fd2161b710	[AVX512] Add masking variant and intrinsics for valignd/q This is similar to what I did with the two-source permutation recently. (It's almost too similar so that we should consider generating the masking variants with some tablegen help.) Both encoding and intrinsic tests are added as well. For the latter, this is the IR that the intrinsic test on the clang side generates. Part of <rdar://problem/17688758> llvm-svn: 214890	2014-08-05 17:23:04 +00:00
Adam Nemet	4688a2e5cb	[X86] Increase X86_MAX_OPERANDS from 5 to 6 This controls the number of operands in the disassembler's x86OperandSets table. The entries describe how the operand is encoded and its type. Not too surprisingly, 5 operands is insufficient for AVX512. Consider VALIGNDrrik in the next patch. These are its operand specifiers: { /* 328 */ { ENCODING_DUP, TYPE_DUP1 }, { ENCODING_REG, TYPE_XMM512 }, { ENCODING_WRITEMASK, TYPE_VK8 }, { ENCODING_VVVV, TYPE_XMM512 }, { ENCODING_RM_CD64, TYPE_XMM512 }, { ENCODING_IB, TYPE_IMM8 }, }, llvm-svn: 214889	2014-08-05 17:23:01 +00:00
Adam Nemet	164b07fbfe	[X86] Add lowering to VALIGN This was currently part of lowering to PALIGNR with some special-casing to make interlane shifting work. Since AVX512F has interlane alignr (valignd/q) and AVX512BW has vpalignr we need to support both of these at the same time, e.g. for SKX. This patch breaks out the common code and then adds support to check both of these lowering options from LowerVECTOR_SHUFFLE. I also added some FIXMEs where I think the AVX512BW and AVX512VL additions should probably go. llvm-svn: 214888	2014-08-05 17:22:59 +00:00
Adam Nemet	2f10cc699d	[X86] Separate DAG node for valign and palignr They have different semantics (valign is interlane while palignr is intralane) and palignr is still needed even in the AVX512 context. According to the latest spec AVX512BW provides these. llvm-svn: 214887	2014-08-05 17:22:55 +00:00
Adam Nemet	d00a05e3e2	[AVX512] alignr: Use suffix rather than name argument to multiclass Again no functional change. This prepares for the suffix to be used with the intrinsic matching. llvm-svn: 214886	2014-08-05 17:22:52 +00:00
Adam Nemet	f92139dd61	[AVX512] Pull everything alignr-related into the multiclass The packed integer pattern becomes the DAG pattern for rri and the packed float, another Pat<> inside the multiclass. No functional change. llvm-svn: 214885	2014-08-05 17:22:50 +00:00
Adam Nemet	1c752d8f5e	Wrap long lines llvm-svn: 214884	2014-08-05 17:22:47 +00:00
Chandler Carruth	183771bd8e	[x86] Reformat some code I moved around in a prior commit but left poorly formatted. Sorry about that. llvm-svn: 214853	2014-08-05 10:35:30 +00:00
Chandler Carruth	947cef191d	[x86] Fix a crash and wrong-code bug in the new vector lowering all found by a single test reduced out of a failure on llvm-stress. The start of the problem (and the crash) came when we tried to use a find of a non-used slot in the move-to half of the move-mask as the target for two bad-half inputs. While if lucky this will be the first of a pair of slots which we can place the bad-half inputs into, it isn't actually guaranteed. This really isn't surprising, not sure what I was thinking. The correct way to find the two unused slots is to look for one of the used slots. We know it isn't that pair, and we can use some modular arithmetic to find the other pair by masking off the odd bit and adding 2 modulo 4. With this, we reliably found a viable pair of slots for the bad-half inputs. Sadly, that wasn't enough. We also had a wrong code bug that surfaced when I reduced the test case for this where we would use the same slot twice for the two bad inputs. This is because both of the bad inputs could be in odd slots originally and thus the mod-2 mapping would actually be the same. The whole point of the weird indexing into the pair of empty slots was to try to leverage when the end result needed the two bad-half inputs to be paired in a dword and pre-pair them in the correct orientation. This is less important with the powerful combining we're now doing, and also easier and more reliable to achieve by noting that we add the bad-half inputs in order. Thus, if they are in a dword pair, the low part of that will be the first input in the sequence. Always putting that in the low element will just do the right thing in addition to computing the correct result. Test case added. =] llvm-svn: 214849	2014-08-05 08:19:21 +00:00
Eric Christopher	fc6de428c8	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838	2014-08-05 02:39:49 +00:00
Eric Christopher	d913448b38	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Reid Kleckner	e704010450	Fix failure to invoke exception handler on Win64 When the last instruction prior to a function epilogue is a call, we need to emit a nop so that the return address is not in the epilogue IP range. This is consistent with MSVC's behavior, and may be a workaround for a bug in the Win64 unwinder. Differential Revision: http://reviews.llvm.org/D4751 Patch by Vadim Chugunov! llvm-svn: 214775	2014-08-04 21:05:27 +00:00
Akira Hatanaka	e457f3e17a	[X86] Place parentheses around "isMask_32(STReturns) && N <= 2". This corrects r214672, which was committed to silence a gcc warning. llvm-svn: 214732	2014-08-04 17:23:38 +00:00
Robert Khasanov	7ca7df0bf9	[SKX] Enabling load/store instructions: encoding Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32,VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32,VMOVDQU64, VMOVUPD, VMOVUPS, Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214719	2014-08-04 14:35:15 +00:00
Aaron Ballman	c36c6abc45	Improving the name of the function parameter, which happens to solve two likely-less-than-useful MSVC warnings: warning C4258: 'I' : definition from the for loop is ignored; the definition from the enclosing scope is used. llvm-svn: 214717	2014-08-04 13:51:27 +00:00
Chandler Carruth	0e2ddb2790	[x86] Just unilaterally prefer SSSE3-style PSHUFB lowerings over clever use of PACKUS. It's cleaner that way. I looked at implementing clever combine-based folding of PACKUS chains into PSHUFB but it is quite hard and doesn't seem likely to be worth it. The most annoying part would be detecting that the correct masking had been done to use PACKUS-style instructions as a blend operation rather than there being any saturating as is indicated by its name. We generate really nice code for what few test cases I've come up with that aren't completely contrived for this by just directly preferring PSHUFB and so let's go with that strategy for now. =] llvm-svn: 214707	2014-08-04 10:17:35 +00:00
Chandler Carruth	06e6f1cae2	[x86] Implement more aggressive use of PACKUS chains for lowering common patterns of v16i8 shuffles. This implements one of the more important FIXMEs for the SSE2 support in the new shuffle lowering. We now generate the optimal shuffle sequence for truncate-derived shuffles which show up essentially everywhere. Unfortunately, this exposes a weakness in other parts of the shuffle logic -- we can no longer form PSHUFB here. I'll add the necessary support for that and other things in a subsequent commit. llvm-svn: 214702	2014-08-04 09:40:02 +00:00
Chandler Carruth	37a18821cd	[x86] Handle single input shuffles in the SSSE3 case more intelligently. I spent some time looking into a better or more principled way to handle this. For example, by detecting arbitrary "unneeded" ORs... But really, there wasn't any point. We just shouldn't build blatantly wrong code so late in the pipeline rather than adding more stages and logic later on to fix it. Avoiding this is just too simple. llvm-svn: 214680	2014-08-04 01:14:24 +00:00
Saleem Abdulrasool	557023e349	X86: silence warning (-Wparentheses) GCC 4.8.2 points out the ambiguity in evaluation of the assertion condition: lib/Target/X86/X86FloatingPoint.cpp:949:49: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] assert(STReturns == 0 || isMask_32(STReturns) && N <= 2); llvm-svn: 214672	2014-08-03 23:00:39 +00:00
Saleem Abdulrasool	4544c16eab	MC: virtualise EmitWindowsUnwindTables This makes EmitWindowsUnwindTables a virtual function and lowers the implementation of the function to the X86WinCOFFStreamer. This method is a target specific operation. This enables making the behaviour target dependent by isolating it entirely to the target specific streamer. llvm-svn: 214664	2014-08-03 18:51:26 +00:00
Chandler Carruth	16c13cad35	[x86] Remove the FIXME that was implemented in r214628. Managed to forget to update the comment here... =/ llvm-svn: 214630	2014-08-02 11:34:23 +00:00
Chandler Carruth	4c57955fe3	[x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly: when without them we will unpack a vector of i8s across two different registers and do a potentially 4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the unpack and pack are two instructions so even a single shuffle between them hits our three instruction limit for forming PSHUFB). However, this doesn't generate very good code in many cases, and it leaves a bunch of common patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This allows us to combine through them, combine multiple ones together, and generally produce sufficiently high quality code. Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added decode routines to handle the different patterns that show up at this level and we dispatch through them as appropriate. The two primary test cases are updated. For the v16 test case there is still a lot of room for improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that I'm hoping to turn into ALL lines by the end of this. llvm-svn: 214628	2014-08-02 10:39:15 +00:00
Chandler Carruth	d10b29240c	[x86] Switch to using the variable we extracted this operand into. Spotted this missed refactoring by inspection when reading code, and it doesn't change the functionality at all. llvm-svn: 214627	2014-08-02 10:29:36 +00:00
Chandler Carruth	5219d4eff6	[x86] Fix a few typos in my comments spotted in passing. llvm-svn: 214626	2014-08-02 10:29:34 +00:00
Chandler Carruth	34f9a987e9	[x86] Teach the target shuffle mask extraction to recognize unary forms of normally binary shuffle instructions like PUNPCKL and MOVLHPS. This detects cases where a single register is used for both operands making the shuffle behave in a unary way. We detect this and adjust the mask to use the unary form which allows the existing DAG combine for shuffle instructions to actually work at all. As a consequence, this uncovered a number of obvious bugs in the existing DAG combine which are fixed. It also now canonicalizes several shuffles even with the existing lowering. These typically are trying to match the shuffle to the domain of the input where before we only really modeled them with the floating point variants. All of the cases which change to an integer shuffle here have something in the integer domain, so there are no more or fewer domain crosses here AFAICT. Technically, it might be better to go from a GPR directly to the floating point domain, but detecting floating point outputs despite integer inputs is a lot more code and seems unlikely to be worthwhile in practice. If folks are seeing domain-crossing regressions here though, let me know and I can hack something up to fix it. Also as a consequence, a bunch of missed opportunities to form pshufb now can be formed. Notably, splats of i8s now form pshufb. Interestingly, this improves the existing splat lowering too. We go from 3 instructions to 1. Yes, we may tie up a register, but it seems very likely to be worth it, especially if splatting the 0th byte (the common case) as then we can use a zeroed register as the mask. llvm-svn: 214625	2014-08-02 10:27:38 +00:00
Chandler Carruth	2ad69eea8d	[x86] Teach my pshufb comment printer to handle VPSHUFB forms as well as PSHUFB forms. This will be important to update some AVX tests when I add PSHUFB combining. llvm-svn: 214624	2014-08-02 10:08:17 +00:00
Akira Hatanaka	3516669a50	[X86] Simplify X87 stackifier pass. Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions: 1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5> 2. INLINEASM <es:fldcw $0> 3. %FP0<def> = COPY %ST0 <rdar://problem/16952634> llvm-svn: 214580	2014-08-01 22:19:41 +00:00
Eric Christopher	6c05d9135f	Add a non-const subtarget returning function to the target machine so that we can use it to get the old-style JIT out of the subtarget. This code should be removed when the old-style JIT is removed (imminently). llvm-svn: 214560	2014-08-01 21:18:01 +00:00
Reid Kleckner	5b37c18129	MS inline asm: Use memory constraints for functions instead of registers This is consistent with how we parse them in a standalone .s file, and inline assembly shouldn't differ. This fixes errors about requiring more registers than available in cases like this: void f(); void __declspec(naked) g() { __asm pusha __asm call f __asm popa __asm ret } There are no registers available to pass the address of 'f' into the asm blob. The asm should now directly call 'f'. Tests will land in Clang shortly. llvm-svn: 214550	2014-08-01 20:21:24 +00:00
Philip Reames	7684618401	Add support for StackMap section for ELF/Linux systems This patch adds code to emit the StackMap section on ELF systems. This section is required to support llvm.experimental.stackmap and llvm.experimental.patchpoint intrinsics. Reviewers: ributzka, echristo Differential Revision: http://reviews.llvm.org/D4574 llvm-svn: 214538	2014-08-01 18:47:09 +00:00
Reid Kleckner	71ff3f223f	MS inline asm: Fix null SMLoc when 'ptr' is missing after dword & co This improves the diagnostics from the regular assembler, but more importantly it fixes an assertion when parsing inline assembly. Test landing in Clang. llvm-svn: 214468	2014-08-01 00:59:22 +00:00
Kevin Enderby	0d928a142b	Add support for the X86 secure guard extensions instructions in assembler (SGX). This allows assembling the two new instructions, encls and enclu for the SKX processor model. Note the diffs are a bit bigger than one might think, but to fit the new MRM_CF and MRM_D7 in the right places, things had to be renumbered and shuffled down, causing a bit more diffs. rdar://16228228 llvm-svn: 214460	2014-07-31 23:57:38 +00:00
Reid Kleckner	b7e2f6015a	X86 MC: Don't crash on empty memory operand parens Instead, create an absolute memory operand. Fixes PR20504. llvm-svn: 214457	2014-07-31 23:26:35 +00:00
Reid Kleckner	0c5da97dd0	X86 MC: Reject invalid segment registers before a memory operand colon Previously we would execute unreachable during object emission. llvm-svn: 214456	2014-07-31 23:03:22 +00:00
Louis Gerbarg	67474e3755	Make sure no loads resulting from load->switch DAGCombine are marked invariant Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reorder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449	2014-07-31 21:45:05 +00:00
Evgeniy Stepanov	77ad86681f	[asan] Support x86 REP MOVS asm instrumentation. Patch by Yuri Gorshenin. llvm-svn: 214395	2014-07-31 09:11:04 +00:00
Juergen Ributzka	39032673da	[FastISel][AArch64 and X86] Don't emit stores for UNDEF arguments during function call lowering. UNDEF arguments are not meant to be touched - especially for the webkit_js calling convention. This fix reproduces the already existing behavior of SelectionDAG in FastISel. llvm-svn: 214366	2014-07-31 00:11:11 +00:00
Reid Kleckner	b1f2d2f4ef	X86 asm parser: Avoid duplicating the list of aliased instructions No functional change. llvm-svn: 214364	2014-07-31 00:07:33 +00:00
Reid Kleckner	7b1e1a0d8e	X86 asm parser: Use a loop to disambiguate suffixes instead of copy paste This works towards making the Intel syntax asm matcher use a completely different disambiguation strategy. No functional change. llvm-svn: 214352	2014-07-30 22:23:11 +00:00
Juergen Ributzka	fa1d61e6c3	[FastISel] Move the helper function isCommutativeIntrinsic into FastISel base class. Move the helper function isCommutativeIntrinsic into the FastISel base class, so it can be used by more than just one backend. llvm-svn: 214347	2014-07-30 22:04:28 +00:00
Robert Khasanov	595683da00	[SKX] Enabling mask logic instructions: encoding, lowering Instructions: KAND{BWDQ}, KANDN{BWDQ}, KOR{BWDQ}, KXOR{BWDQ}, KXNOR{BWDQ} Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214081	2014-07-28 13:46:45 +00:00
Matt Arsenault	6f2a526101	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Chandler Carruth	64a7c828cb	[x86] Sink a variable only used by asserts into the asserts. Should fix some -Werror bots, sorry for the noise. llvm-svn: 214043	2014-07-27 01:45:49 +00:00
Chandler Carruth	80c5bfd843	[x86] Add a much more powerful framework for combining x86 shuffle instructions in the legalized DAG, and leverage it to combine long sequences of instructions to PSHUFB. Eventually, the other x86-instruction-specific shuffle combines will probably all be driven out of this routine. But the real motivation is to detect after we have fully legalized and optimized a shuffle to the minimal number of x86 instructions whether it is profitable to replace the chain with a fully generic PSHUFB instruction even though doing so requires either a load from a constant pool or tying up a register with the mask. While the Intel manuals claim it should be used when it replaces 5 or more instructions (!!!!) my experience is that it is actually very fast on modern chips, and so I've gone with a much more aggressive model of replacing any sequence of 3 or more instructions. I've also taught it to do some basic canonicalization to special-purpose instructions which have smaller encodings than their generic counterparts. There are still quite a few FIXMEs here, and I've not yet implemented support for lowering blends with PSHUFB (where its power really shines due to being able to zero out lanes), but this starts implementing real PSHUFB support even when using the new, fancy shuffle lowering. =] llvm-svn: 214042	2014-07-27 01:15:58 +00:00
Nick Lewycky	d7c726c5e9	Fix broken assert. llvm-svn: 214019	2014-07-26 05:44:15 +00:00
NAKAMURA Takumi	1fa7769ba9	X86ShuffleDecode.cpp: Silence a warning. [-Wunused-variable] llvm-svn: 214016	2014-07-26 04:53:05 +00:00
Chandler Carruth	5896698e2e	[x86] Fix PR20355 (for real). There are many layers to this bug. The tale starts with r212808 which attempted to fix inversion of the low and high bits when lowering MUL_LOHI. Sadly, that commit did not include any positive test cases, and just removed some operations from a test case where the actual logic being changed isn't fully visible from the test. What this commit did was two things. First, it reversed the low and high results in the formation of the MERGE_VALUES node for the multiple results. This is entirely correct. Second it changed the shuffles for extracting the low and high components from the i64 results of the multiplies to extract them assuming a big-endian-style encoding of the multiply results. This second change is wrong. There is no big-endian encoding in x86, the results of the multiplies are normal v2i64s: when cast to v4i32, the low i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and 3. However, the first change wasn't enough to actually fix the bug, which is (I assume) why the second change was also made. There was another bug in the MERGE_VALUES formation: we weren't using a VTList, and so were getting a single result node! When grabbing the second result from the node, we got... well.. could be anything. I think this appeared to invert things, but had to be causing other problems as well. Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should have been fine, right? NOOOPE! Because the core bug was never addressed, the test in vector-idiv failed when I fixed the MERGE_VALUES node. Because there are essentially no docs for this node, I had to guess at how to fix it and tried swapping the operands, restoring the order of the original code before r212808. While this "fixed" the test case (in that we produced the right instructions) we were still extracting the wrong elements of the i64s, and thus PR20355 was still broken. 
This commit essentially reverts the big-endian-style extraction part of r212808 and goes back to the original masks which were correct. Now that the MERGE_VALUES node formation is also correct, everything works. I've also included a more detailed test from PR20355 to make sure this stays fixed. llvm-svn: 214011	2014-07-26 03:46:57 +00:00
Chandler Carruth	f6406ac5d6	[x86] Revert r214007: Fix PR20355 ... The clever way to implement signed multiplication with unsigned is already implemented and tested and working correctly. The bug is somewhere else. Re-investigating. This will teach me to not scroll far enough to read the code that did what I thought needed to be done. llvm-svn: 214009	2014-07-26 02:14:54 +00:00
Chandler Carruth	1bf4d19172	[x86] Fix PR20355 (and dups) by not using unsigned multiplication when signed multiplication is requested. While there is not a difference in the low half of the result, the high half (used specifically to implement the signed division by these constants) certainly is used. The test case I've nuked was actively asserting wrong code. There is a delightful solution to doing signed multiplication even when we don't have it that Richard Smith has crafted, but I'll add the machinery back and implement that in a follow-up patch. This at least restores correctness. llvm-svn: 214007	2014-07-26 01:52:13 +00:00
NAKAMURA Takumi	8b2e7bfac1	Update X86/Utils/LLVMBuild.txt corresponding to r213986. "Core" has been introduced. llvm-svn: 213995	2014-07-26 00:45:43 +00:00
Chandler Carruth	0e469609f3	[x86] Fix unused variable warning in no-asserts build. llvm-svn: 213989	2014-07-26 00:04:41 +00:00
Chandler Carruth	185cc18d42	[x86] Teach the X86 backend to print shuffle comments for PSHUFB instructions which happen to have a constant mask. Currently, this only handles a very narrow set of cases, but those happen to be the cases that I care about for testing shuffles sanely. This is a bit trickier than other shuffle instructions because we're decoding constants out of the constant pool. The current MC layer makes it completely impossible to inspect a constant pool entry, so we have to do it at the MI level and attach the comment to the streamer on its way out. So no joy for disassembling, but it does make test cases and asm dumps much nicer. Sorry for no test cases, but it didn't really seem that valuable to go trolling through existing old test cases and updating them. I'll have lots of testing of this in the upcoming patch for SSSE3 emission in the new vector shuffle lowering code paths. llvm-svn: 213986	2014-07-25 23:47:11 +00:00
Akira Hatanaka	e5b6e0d231	[stack protector] Fix a potential security bug in stack protector where the address of the stack guard was being spilled to the stack. Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. Register allocator can now just remat the value when it can't keep it in a register. <rdar://problem/12475629> llvm-svn: 213967	2014-07-25 19:31:34 +00:00
Chandler Carruth	3de980d2ff	[SDAG] Enable the new assert for out-of-range result numbers in SDValues, fixing the two bugs left in the regression suite. The key for both of these was the use of a single value type rather than a VTList which caused an unintentionally single-result merge-value node. Fix this by getting the appropriate VTList in place. Doing this exposed that the comments in x86's code about how MUL_LOHI operands are handled are wrong. The bug with the use of out-of-range result numbers was hiding the bug about the order of operands here (as best I can tell). There are more places where the code appears to get this backwards still... llvm-svn: 213931	2014-07-25 09:19:23 +00:00
Lang Hames	5432649be7	[X86] Clarify some stackmap shadow optimization code as based on review feedback from Eric Christopher. No functional change. llvm-svn: 213917	2014-07-25 02:29:19 +00:00
Chandler Carruth	80b869461e	[x86] Make vector legalization of extloads work more like the "normal" vector operation legalization with support for custom target lowering and fallback to expand when it fails, and use this to implement sext and anyext load lowering for x86 in a more principled way. Previously, the x86 backend relied on a target DAG combine to "combine away" sextload and extload nodes prior to legalization, or would expand them during legalization with terrible code. This is particularly problematic because the DAG combine relies on running over non-canonical DAG nodes at just the right time to match several common and important patterns. It used a combine rather than lowering because we didn't have good lowering support, and to expose some tricks being employed to more combine phases. With this change it becomes a proper lowering operation, the backend marks that it can lower these nodes, and I've added support for handling the canonical forms that don't have direct legal representations such as sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases for this behavior continue to pass even after the DAG combiner begins running more systematically over every node. There is some noise caused by this in the test suite where we actually use vector extends instead of subregister extraction. This doesn't really seem like the right thing to do, but is unlikely to be a critical regression. We do regress in one case where by lowering to the target-specific patterns early we were able to combine away extraneous legal math nodes. However, this regression is completely addressed by switching to a widening based legalization which is what I'm working toward anyways, so I've just switched the test to that mode. Differential Revision: http://reviews.llvm.org/D4654 llvm-svn: 213897	2014-07-24 22:09:56 +00:00
Lang Hames	f49bc3f1b1	[X86] Optimize stackmap shadows on X86. This patch minimizes the number of nops that must be emitted on X86 to satisfy stackmap shadow constraints. To minimize the number of nops inserted, the X86AsmPrinter now records the size of the most recent stackmap's shadow in the StackMapShadowTracker class, and tracks the number of instruction bytes emitted since that stackmap instruction was encountered. Padding is emitted (if it is required at all) immediately before the next stackmap/patchpoint instruction, or at the end of the basic block. This optimization should reduce code-size and improve performance for people using the llvm stackmap intrinsic on X86. <rdar://problem/14959522> llvm-svn: 213892	2014-07-24 20:40:55 +00:00
Reid Kleckner	9a412d13c1	Replace an assertion with a fatal error Frontends are responsible for putting inalloca on parameters that would be passed in memory and not registers. llvm-svn: 213891	2014-07-24 19:53:33 +00:00
Saleem Abdulrasool	34610e33ae	X86: silence sign comparison warning GCC 4.8 detected a signed compare [-Wsign-compare]. Add a cast for the destination index. Add an assert to catch a potential overflow however unlikely it may be. llvm-svn: 213878	2014-07-24 17:12:06 +00:00
NAKAMURA Takumi	9c3bd7618a	Update library dependencies. llvm-svn: 213832	2014-07-24 02:10:42 +00:00
Filipe Cabecinhas	933cccf3fa	Fixed PR20411 - bug in getINSERTPS() When we had a vector_shuffle where we had an input from each vector, we could miscompile it because we were assuming the input from V2 wouldn't be moved from where it was on the vector. Added a test case. llvm-svn: 213826	2014-07-24 01:28:21 +00:00
Rafael Espindola	5addace56d	Finish inverting the MC -> Object dependency. There were still some disassembler bits in lib/MC, but their use of Object was only visible in the includes they used, not in the symbols. llvm-svn: 213808	2014-07-23 22:26:07 +00:00
Jim Grosbach	724e438c62	[X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants. The transform to constant fold unary operations with an AND across a vector comparison applies when the constant is not a splat of a scalar as well. llvm-svn: 213800	2014-07-23 20:41:43 +00:00
Jim Grosbach	8f6f0858ec	X86: restrict combine to when type sizes are safe. The folding of unary operations through a vector compare and mask operation is only safe if the unary operation result is of the same size as its input. For example, it's not safe for [su]itofp from v4i32 to v4f64. llvm-svn: 213799	2014-07-23 20:41:38 +00:00
Robert Khasanov	74acbb7767	[SKX] Enabling mask instructions: encoding, lowering KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 213757	2014-07-23 14:49:42 +00:00
Andrea Di Biagio	842355e900	Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions". This change fully reverts r211771. That revision added a canonicalization rule which has the potential to cause a combine-cycle in the target-independent canonicalizing DAG combine. The plan is to move the logic that forms target specific addsub nodes as part of the lowering of shuffles. llvm-svn: 213736	2014-07-23 11:20:24 +00:00
Tim Northover	0942e39061	X86: drop relocations on __eh_frame sections globally. Without this, we produce non-extern relocations when targeting older OS X versions that ld64 can't cope with in the particular context of __eh_frame sections (who'd want generic relocation-processing anyway?). This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be needed when targeting such platforms with a modern version of LLVM, but this is probably the case anyway and a reasonable requirement. PR20212, rdar://problem/17544795 llvm-svn: 213665	2014-07-22 15:47:09 +00:00
Elena Demikhovsky	f164859efc	AVX-512: Fixed intrinsic of VSQRTPS/PD instructions. I set number and types of parameters according to GCC intrinsics. llvm-svn: 213640	2014-07-22 11:07:31 +00:00
Robert Khasanov	bfa0131365	[SKX] Enabling SKX target and AVX512BW, AVX512DQ, AVX512VL features. Enabling HasAVX512{DQ,BW,VL} predicates. Adding VK2, VK4, VK32, VK64 masked register classes. Adding new types (v64i8, v32i16) to VR512. Extending calling conventions for new types (v64i8, v32i16) Patch by Zinovy Nis <zinovy.y.nis@intel.com> Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 213545	2014-07-21 14:54:21 +00:00
Andrea Di Biagio	4d8bd41600	[DAG] Refactor some logic. No functional change. This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'. This refactoring is in preparation of an upcoming change to the DAGCombiner. llvm-svn: 213503	2014-07-21 07:28:51 +00:00
Tim Northover	871de902af	X86: support fpext/fptrunc operations to and from 16-bit floats. llvm-svn: 213374	2014-07-18 13:01:25 +00:00
Jim Grosbach	b6535c32f5	X86: Constant fold converting vector setcc results to float. Since the result of a SETCC for X86 is 0 or -1 in each lane, we can move unary operations, in this case [su]int_to_fp through the mask operation and constant fold the operation away. Generally speaking: UNARYOP(AND(VECTOR_CMP(x,y), constant)) --> AND(VECTOR_CMP(x,y), constant2) where constant2 is UNARYOP(constant). This implements the transform where UNARYOP is [su]int_to_fp. For example, consider the simple function: define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind { %cmp = fcmp oeq <4 x float> %val, %test %ext = zext <4 x i1> %cmp to <4 x i32> %result = sitofp <4 x i32> %ext to <4 x float> ret <4 x float> %result } Before this change, the SSE code is generated as: LCPI0_0: .long 1 ## 0x1 .long 1 ## 0x1 .long 1 ## 0x1 .long 1 ## 0x1 .section __TEXT,__text,regular,pure_instructions .globl _foo .align 4, 0x90 _foo: ## @foo cmpeqps %xmm1, %xmm0 andps LCPI0_0(%rip), %xmm0 cvtdq2ps %xmm0, %xmm0 retq After, the code is improved to: LCPI0_0: .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .section __TEXT,__text,regular,pure_instructions .globl _foo .align 4, 0x90 _foo: ## @foo cmpeqps %xmm1, %xmm0 andps LCPI0_0(%rip), %xmm0 retq The cvtdq2ps has been constant folded away and the floating point 1.0f vector lanes are materialized directly via the ModRM operand of andps. llvm-svn: 213342	2014-07-18 00:40:56 +00:00
Nico Weber	42f79dbf02	ms inline asm: Don't add x86 segment registers to the clobber list. Clang tries to check the clobber list but doesn't list segment registers in its x86 register list. This fixes PR20343. llvm-svn: 213303	2014-07-17 20:24:55 +00:00
Adam Nemet	5933c2f824	[X86] AVX512: Add disassembler support for compressed displacement There are two parts here. First is to modify tablegen to adjust the encoding type ENCODING_RM with the scaling factor. The second is to use the new encoding types to compute the correct displacement in the decoder. Fixes <rdar://problem/17608489> llvm-svn: 213281	2014-07-17 17:04:56 +00:00
Adam Nemet	4c339abab3	[X86] AVX512: Rename EVEX_CD8V to CD8_Form This is to match the naming of CD8_EltSize, CD8_Scale, etc. No functional change. llvm-svn: 213280	2014-07-17 17:04:52 +00:00
Adam Nemet	54adb0fcbc	[X86] AVX512: Use the TD version of CD8_Scale in the assembler Passes the computed scaling factor in TSFlags rather than the old attributes. Also removes the C++ version of computing the scaling factor (MemObjSize) along with the asserts added by the previous patch. No functional change. llvm-svn: 213279	2014-07-17 17:04:50 +00:00
Adam Nemet	4dc92b9a84	[X86] AVX512: Move compressed displacement logic to TD This does not actually move the logic yet but reimplements it in the Tablegen language. Then asserts that the new implementation results in the same value. The next patch will remove the assert and the temporary use of the TSFlags and remove the C++ implementation. The formula requires a limited form of the logical left and right operators. I implemented these with the bit-extract/insert operator (i.e. blah{bits}). No functional change. llvm-svn: 213278	2014-07-17 17:04:34 +00:00
Saleem Abdulrasool	862e60c75c	MC: fix MCAsmInfo usage for windows-itanium Windows itanium uses the GNUCOFF assembly format, not ELF. llvm-svn: 213274	2014-07-17 16:27:40 +00:00
Tim Northover	84ce0a642e	CodeGen: generate single libcall for fptrunc -> f16 operations. Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM. I've followed the much more generic "__truncST2" naming, as opposed to the odd name for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to. llvm-svn: 213252	2014-07-17 11:12:12 +00:00
Tim Northover	2131044814	X86: support double extension of f16 type. x86 has no native ability to extend an f16 to f64, but the same result is obtained if we expand it into two separate extensions: f16 -> f32 -> f64. Unfortunately the same is not true for truncate, so that still results in a compilation failure. llvm-svn: 213251	2014-07-17 11:04:04 +00:00
Tim Northover	fd7e424935	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
Sanjay Patel	6360441f99	Remove Atom references in description. Any CPU can run this pass. llvm-svn: 213190	2014-07-16 20:18:49 +00:00
Tim Northover	7f3e11e7c0	CodeGen: don't form illegal EXTLOAD operations. It turns out that in most cases (the main exception being i1-related types) once these operations are formed we cannot separate them and the targets end up having to deal with them whether they want to or not. This is not a good situation, and a more reasonable default can be formed by acknowledging this and having targets leave them as Legal. Only x86 seems to be affected (other targets don't even try marking the operation Expand). Mostly there's no visible change here yet, but it will be useful to have truly expanded EXTLOADS for MVT::f16 softening support. llvm-svn: 213162	2014-07-16 15:37:24 +00:00
Andrea Di Biagio	a03624d8ab	[X86] Add a check for 'isMOVHLPSMask' within method 'isShuffleMaskLegal'. Before this change, method 'isShuffleMaskLegal' didn't know that shuffles implementing a 'movhlps' operation were perfectly legal for SSE targets. This patch adds the missing check for 'isMOVHLPSMask' inside method 'isShuffleMaskLegal' to fix the problem. The reason why it is important to do this is because the DAGCombiner conservatively avoids combining a pair of shuffles if the resulting shuffle node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were wrongly considered not to be legal. This was the root cause of some poor-code generation bugs. llvm-svn: 213137	2014-07-16 11:29:39 +00:00
David Majnemer	3821ff03cd	X86: Simplify X86WindowsTargetObjectFile::getSectionForConstant There exists a helper function to abstract away the various differences between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc. Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant. llvm-svn: 213104	2014-07-15 23:01:10 +00:00
Sanjay Patel	a2f658d69d	Move Post RA Scheduling flag bit into SchedMachineModel Refactoring; no functional changes intended Removed PostRAScheduler bits from subtargets (X86, ARM). Added PostRAScheduler bit to MCSchedModel class. This bit is set by a CPU's scheduling model (if it exists). Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses. Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!). Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling. Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values. Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. c. PPC overrides the CPU's postRA settings by enabling postRA for everything. d. X86 is the only target that actually has postRA specified via sched model info. Differential Revision: http://reviews.llvm.org/D4217 llvm-svn: 213101	2014-07-15 22:39:58 +00:00
Cameron McInally	44f3e30cf2	Revert r213070. It's breaking the build in MCELFStreamer::EmitInstToData(...). llvm-svn: 213073	2014-07-15 16:24:24 +00:00
Cameron McInally	53bc7a3330	Add x86 patterns to match a specific add-with-carry. llvm-svn: 213070	2014-07-15 15:03:32 +00:00
Andrea Di Biagio	04d5a7b337	Silence a warning in conditional expression. Fixes a gcc warning caused by a typo. A redundant assignment operation was accidentally used as the third operand of a conditional expression. No functional change intended. llvm-svn: 213061	2014-07-15 10:53:44 +00:00
David Majnemer	d4d9944416	Fix typo in comment No functionality changed. llvm-svn: 213052	2014-07-15 07:11:32 +00:00
Juergen Ributzka	8f073c8d60	[FastISel][X86] Remove no longer needed functions. llvm-svn: 213051	2014-07-15 06:35:53 +00:00
Juergen Ributzka	3566c08dd9	[FastISel][X86] Implement the FastLowerIntrinsicCall hook. Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook. llvm-svn: 213050	2014-07-15 06:35:50 +00:00
Juergen Ributzka	23d43318c7	[FastISel][X86] Implement the FastLowerCall hook. This implements the FastLowerCall hook, which is based on the DoSelectCall function. The implementation is very similar, but the target-independent call lowering part has been factored out. This should also enable patchpoint intrinsic lowering for FastISel on X86. Related to <rdar://problem/17427052>. llvm-svn: 213049	2014-07-15 06:35:47 +00:00
Juergen Ributzka	5ee9d90248	Revert "[FastISel][X86] Remove no longer needed functions." Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook." Revert "[FastISel][X86] Implement the FastLowerCall hook." This reverts commit r213035, r213036, and r213037 to make the buildbots happy again. llvm-svn: 213048	2014-07-15 05:23:40 +00:00
David Majnemer	4e3ccc0505	CodeGen: Handle ConstantVector and undef in WinCOFF constant pools The constant pool entry code for WinCOFF assumed that vector constants would be formed using ConstantDataVector, it did not expect to see a ConstantVector. Furthermore, it did not expect undef as one of the elements of the vector. ConstantVectors should be handled like ConstantDataVectors, treat Undef as zero. llvm-svn: 213038	2014-07-15 02:34:12 +00:00
Juergen Ributzka	9fbf33d70f	[FastISel][X86] Remove no longer needed functions. llvm-svn: 213037	2014-07-15 02:22:56 +00:00
Juergen Ributzka	170f9354bb	[FastISel][X86] Implement the FastLowerIntrinsicCall hook. Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook. llvm-svn: 213036	2014-07-15 02:22:53 +00:00
Juergen Ributzka	a9cced8a94	[FastISel][X86] Implement the FastLowerCall hook. This implements the FastLowerCall hook, which is based on the DoSelectCall function. The implementation is very similar, but the target-independent call lowering part has been factored out. This should also enable patchpoint intrinsic lowering for FastISel on X86. Related to <rdar://problem/17427052>. llvm-svn: 213035	2014-07-15 02:22:49 +00:00
Adam Nemet	cf7c905cfb	[X86] Specify all TSFlags bit-offsets symbolically No functional change. The offsets for the other bitfields are specified symbolically. I need to increase the size for one of the earlier fields which is easier after this cleanup. Why these bits are relative to VEXShift is a bit strange but that is for another cleanup. I made sure that the values for the enums are unchanged after this change. llvm-svn: 213011	2014-07-14 23:18:39 +00:00
David Majnemer	8bce66b093	CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF COFF lacks a feature that other object file formats support: mergeable sections. To work around this, MSVC sticks constant pool entries in special COMDAT sections so that each constant is in its own section. This permits unused constants to be dropped and it also allows duplicate constants in different translation units to get merged together. This fixes PR20262. Differential Revision: http://reviews.llvm.org/D4482 llvm-svn: 213006	2014-07-14 22:57:27 +00:00
Saleem Abdulrasool	b51d464f1e	X86: correct 64-bit atomics on 32-bit We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a 32-bit target. They were added at different times and would result in the border condition being mishandled. This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic operations on x86 at the cost of restoring a long-standing bug in the codegen. We emit a cmpxchg8b on all x86 targets even where the CPU does not support this instruction (pre-Pentium CPUs). Although this bug should be fixed, this was present prior to SVN r212119 and this change, so this is not really introducing a regression. llvm-svn: 212956	2014-07-14 16:28:13 +00:00
Tim Northover	6c647eae8b	X86: remove temporary atomicrmw used during lowering. We construct a temporary "atomicrmw xchg" instruction when lowering atomic stores for widths that aren't supported natively. This isn't on the top-level worklist though, so it won't be removed automatically and we have to do it ourselves once that itself has been lowered. Thanks Saleem for pointing this out! llvm-svn: 212948	2014-07-14 15:31:13 +00:00
Saleem Abdulrasool	3f3cefd392	MC: make DWARF and Windows unwinding handling more similar Rename member variables and functions for the MCStreamer for DWARF-like unwinding management. Rename the Windows ones as well and make the naming and handling similar across the two. No functional change intended. llvm-svn: 212912	2014-07-13 19:03:36 +00:00
Juergen Ributzka	d755e9f730	Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook." This reverts commit r212851, because it broke the memset lowering. llvm-svn: 212855	2014-07-11 23:10:08 +00:00
Juergen Ributzka	04b444913b	[FastISel][X86] Implement the FastLowerIntrinsicCall hook. Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook. llvm-svn: 212851	2014-07-11 22:37:43 +00:00
Quentin Colombet	0f179c4d8a	[X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI. Also add a few comments. <rdar://problem/17581756> llvm-svn: 212808	2014-07-11 12:08:23 +00:00
Adam Nemet	26f817497c	[X86] AVX512: Improve readability of isCDisp8 No functional change. As I was trying to understand this function, I found that variables were reused with confusing names and the broadcast case was a bit too implicit. Hopefully, this is an improvement. llvm-svn: 212795	2014-07-11 05:23:25 +00:00
Adam Nemet	e311c3c836	[X86] AVX512: Simplify logic in isCDisp8 It was computing the VL/n case as: MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize ElemByteSize not only falls out but VectorByteSize/Divider now actually matches the definition of VL/n. Also some formatting fixes. llvm-svn: 212794	2014-07-11 05:23:12 +00:00
Akira Hatanaka	7cc27649a6	[X86] Mark pseudo instruction TEST8ri_NOEREX as hasSideEffects=0. Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable macro-fusion. <rdar://problem/15680770> llvm-svn: 212747	2014-07-10 18:00:53 +00:00
Zinovy Nis	cad431c122	[x32] Add AsmBackend for X32 which uses ELF32 with x86_64 (the author is Pavel Chupin). This is minimal change for backend required to have "hello world" compiled and working on x32 target (x86_64-linux-gnux32). More patches for x32 will follow. Differential Revision: http://reviews.llvm.org/D4181 llvm-svn: 212716	2014-07-10 13:03:26 +00:00
Chandler Carruth	df8d0caab7	[x86] Add another combine that is particularly useful for the new vector shuffle lowering: match shuffle patterns equivalent to an unpcklwd or unpckhwd instruction. This allows us to use generic lowering code for v8i16 shuffles and match the unpack pattern late. llvm-svn: 212705	2014-07-10 11:09:29 +00:00
Chandler Carruth	853fa0ac8d	[x86] Expand the target DAG combining for PSHUFD nodes to be able to combine into half-shuffles through unpack instructions that expand the half to a whole vector without messing with the dword lanes. This fixes some redundant instructions in splat-like lowerings for v16i8, which are now getting to be really nice. llvm-svn: 212695	2014-07-10 09:57:36 +00:00
Chandler Carruth	a34a8e230d	[x86] Tweak the v16i8 single input special case lowering for shuffles that splat i8s into i16s. Previously, we would try much too hard to arrange a sequence of i8s in one half of the input such that we could unpack them into i16s and shuffle those into place. This isn't always going to be a cheaper i8 shuffle than our other strategies. The case where it is always going to be cheaper is when we can arrange all the necessary inputs into one half using just i16 shuffles. It happens that viewing the problem this way also makes it much easier to produce an efficient set of shuffles to move the inputs into one half and then unpack them. With this, our splat code gets one step closer to being not terrible with the new experimental lowering strategy. It also exposes two combines missing which I will add next. llvm-svn: 212692	2014-07-10 09:16:40 +00:00
Chandler Carruth	7d2ffb5492	[x86] Initial improvements to the new shuffle lowering for v16i8 shuffles specifically for cases where a small subset of the elements in the input vector are actually used. This is specifically targeted at improving the shuffles generated for trunc operations, but also helps out splat-like operations. There is still some really low-hanging fruit here that I want to address but this is a huge step in the right direction. llvm-svn: 212680	2014-07-10 04:34:06 +00:00
Chandler Carruth	b3840a55ae	[x86] Refactor some of the new code for lowering v16i8 shuffles to remove duplication and make it easier to select different strategies. No functionality changed. llvm-svn: 212674	2014-07-10 02:24:26 +00:00
Chandler Carruth	d3561f6fec	[SDAG] Make the new zext-vector-inreg node default to expand so targets don't need to set it manually. This is based on feedback from Tom who pointed out that if every target needs to handle this we need to reach out to those maintainers. In fact, it doesn't make sense to duplicate everything when anything other than expand seems unlikely at this stage. llvm-svn: 212661	2014-07-09 22:53:04 +00:00
Benjamin Kramer	c560a6cadc	TargetRegisterInfo: Remove function that fell out of use years ago. llvm-svn: 212636	2014-07-09 18:53:57 +00:00
Adam Nemet	2820a5b9e9	[X86] AVX512: Enable it in the Loop Vectorizer This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(. llvm-svn: 212634	2014-07-09 18:22:33 +00:00
Benjamin Kramer	d6f1733add	X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2. Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out as we have to blend two vectors. While there, remove unnecessary cross-lane moves from the shuffles so the backend can lower it to palignr instead of vperm. Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2. llvm-svn: 212611	2014-07-09 11:12:39 +00:00
Chandler Carruth	afe4b2507e	[x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610	2014-07-09 10:58:18 +00:00
Chandler Carruth	ef5dcf571e	[x86] Initialize a pointer to null to fix a bug in r212602. This should restore GCC hosts (which happen to put the bad stuff into the pointer) and MSan, etc. llvm-svn: 212606	2014-07-09 10:36:42 +00:00
Chandler Carruth	2ebc942683	[x86] Re-apply a variant of the x86 side of r212324 now that the rest has settled without incident, removing the x86-specific and overly strict 'isVectorSplat' routine in favor of generic and more powerful splat detection. The primary motivation and result of this is that the x86 backend can now see through splats which contain undef elements. This is essential if we are using a widening form of legalization and I've updated a test case to also run in that mode as before this change the generated code for the test case was completely scalarized. This version of the patch much more carefully handles the undef lanes. - We aren't overly conservative about them in the shift lowering (where we will never use the splat itself). - One place where the splat would have been re-used by the existing code now explicitly constructs a new constant splat that will be safe. - The broadcast lowering is much more reasonable with undefs by doing a correct check of whether the splat is the only user of a loaded value, checking that the splat actually crosses multiple lanes before using a broadcast, and handling broadcasts of non-constant splats. As a consequence of the last bullet, the weird usage of vpshufd instead of vbroadcast is gone, and we actually can lower an AVX splat with vbroadcastss where before we emitted a really strange pattern of a vector load and a manual splat across the vector. llvm-svn: 212602	2014-07-09 10:06:58 +00:00
Chandler Carruth	142e966261	[x86,SDAG] Sink the logic for folding shuffles of splats more aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code. This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed. First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could make N2 undef, then failed to ever re-consider the precomputed state. Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements. Third, it didn't handle all-zero bit casts nicely the way my code in the x86 side of things did, which is essential to getting good zext-shuffle lowerings. But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise but it's still pretty unclear what the value of the test is in this day and age. llvm-svn: 212517	2014-07-08 08:45:38 +00:00
Adam Nemet	79580db918	[X86] AVX512: Only allow k1-k7 as predicates to vpcmp* As destination k0 is allowed but not as predicate/writemask. I also modified the test to allow checking of error messages by the assembler. I applied a similar approach to the test ret.s in the same directory. llvm-svn: 212504	2014-07-08 00:22:32 +00:00
Andrea Di Biagio	2620b877b6	[x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types. When combining a sequence of two PSHUFD dag nodes into a single PSHUFD, make sure that we assign the correct type to the resulting PSHUFD. X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32. Before this change, an assertion failure was triggered in method 'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test below into a single PSHUFD. define <4 x float> @test1(<4 x float> %V) { %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1> %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1> ret <4 x float> %2 } llvm-svn: 212498	2014-07-07 23:25:23 +00:00
Juergen Ributzka	665ea71fcd	[FastISel][X86] Fix smul.with.overflow.i8 lowering. Add custom lowering code for signed multiply instruction selection, because the default FastISel instruction selection for ISD::MUL will use unsigned multiply for the i8 type and signed multiply for all other types. This would set the incorrect flags for the overflow check. This fixes <rdar://problem/17549300> llvm-svn: 212493	2014-07-07 21:52:21 +00:00
Chandler Carruth	beeacac0b3	[x86] Revert r212324 which was too aggressive w.r.t. allowing undef lanes in vector splats. The core problem here is that undef lanes can't unilaterally be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting. Original commit message for r212324: [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is dramatically improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212475	2014-07-07 19:03:32 +00:00
Tim Northover	3705283b24	X86: revert unintentional change to X86FastISel. This crept in with r212443. llvm-svn: 212459	2014-07-07 14:06:42 +00:00
Evgeniy Stepanov	6fa6c677cc	[asan] Generate asm instrumentation in MC. Generate entire ASan asm instrumentation in MC without relying on runtime helper functions. Patch by Yuri Gorshenin. llvm-svn: 212455	2014-07-07 13:57:37 +00:00
Chandler Carruth	0dcb366268	[x86] Teach the new vector shuffle lowering code to handle what is essentially a DAG combine that never gets a chance to run. We might typically expect DAG combining to remove shuffles-of-splats and other similar patterns, but we don't get a chance to run the DAG combiner when we recursively form sub-shuffles during the lowering of a shuffle. So instead hand-roll a really important combine directly into the lowering code to detect shuffles-of-splats, especially shuffles of an all-zero splat which needn't even have the same element width, etc. This lets the new vector shuffle lowering handle shuffles which implement things like zero-extension really nicely. This will become even more important when I wire the legalization of zero-extension to vector shuffles with the new widening legalization strategy. llvm-svn: 212444	2014-07-07 09:06:58 +00:00
Tim Northover	55beb64bd0	CodeGen: it turns out that NAND is not the same thing as BIC. At all. We've been performing the wrong operation on ARM for "atomicrmw nand" for years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b". This bled over into the generic expansion pass. So I assume no-one has ever actually tried to do an atomic nand in the real world. Oh well. llvm-svn: 212443	2014-07-07 09:06:35 +00:00
Ehsan Akhgari	4103da6bfb	Add support for parsing the not operator in Microsoft inline assembly This fixes http://llvm.org/PR20202 llvm-svn: 212352	2014-07-04 19:13:05 +00:00
Chandler Carruth	5d79bb5d32	[x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is dramatically improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212324	2014-07-04 08:11:49 +00:00
Alexey Volkov	302309f39f	[X86] Limit maximum nop length on Silvermont Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes. Also in Silvermont instructions with more than 3 prefixes will cause 3 cycle penalty. Maximum nop length is limited to 7 bytes when used for padding on Silvermont. For other x86 processors max nop length remains unchanged 15 bytes. Differential Revision: http://reviews.llvm.org/D4374 llvm-svn: 212321	2014-07-04 07:14:56 +00:00
Chandler Carruth	19cff8205e	[x86] Clarify that this lowering only applies to vectors and is only used when we have SSE2. llvm-svn: 212300	2014-07-03 22:57:44 +00:00
Andrea Di Biagio	c8e8bda58f	[CostModel][x86] Improved cost model for alternate shuffles. This patch: 1) Improves the cost model for x86 alternate shuffles (originally added at revision 211339); 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles. Alternate shuffles are a special kind of blend; on x86, we can often easily lower alternate shuffles into a single blend instruction (depending on the subtarget features). The existing cost model didn't take into account subtarget features. Also, it had a couple of "dead" entries for vector types that are never legal (example: on x86 types v2i32 and v2f32 are not legal; those are always either promoted or widened to 128-bit vector types). The new x86 cost model takes into account what target features we have before returning the shuffle cost (i.e. the number of instructions after the blend is lowered/expanded). This patch also teaches the Cost Model Analysis how to identify and analyze alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions): - added function 'isAlternateVectorMask'; - added some logic to check if an instruction is an alternate shuffle and, if so, call the target specific TTI to get the corresponding shuffle cost; - added a test to verify the cost model analysis on alternate shuffles. llvm-svn: 212296	2014-07-03 22:24:18 +00:00
Andrea Di Biagio	a37a2fc81f	[X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes. This patch adds tablegen patterns to select F16C float-to-half-float conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes. If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32' are expanded into library calls. llvm-svn: 212293	2014-07-03 21:51:06 +00:00
Chandler Carruth	739b6ada99	[x86] Fix crashes in lowering bitcast instructions with the widening mode. This also runs the test in that mode which would reproduce the crash. What I love is that every single FIXME in the test is addressed by switching to widening. llvm-svn: 212254	2014-07-03 03:43:47 +00:00
Chandler Carruth	49a8b10d82	[x86] Based on a long conversation between myself, Jim Grosbach, Hal Finkel, Eric Christopher, and a bunch of other people I'm probably forgetting (sorry), add an option to the x86 backend to widen vectors during type legalization rather than promote them. This still would promote vNi1 vectors to get the masks right, but would widen other vectors. A lot of experiments are piling up right now showing that widening should probably be the default legalization strategy outside of vNi1 cases, but it is very hard to test the ramifications of that and fix bugs in widening-based legalization without an option that enables it. I'll be checking in tests shortly that use this option to exercise cases where widening doesn't work well and hopefully we'll be able to switch fully to this soon. llvm-svn: 212249	2014-07-03 02:11:29 +00:00
Adam Nemet	11dd5cf9f1	[X86] AVX512: Allow writemask argument in vpermt* intrinsics llvm-svn: 212223	2014-07-02 21:26:01 +00:00
Adam Nemet	efe9c98a16	[X86] AVX512: Generate Pat<>'s for the vpermt2* intrinsics via multiclass This new multiclass, avx512_perm_table_3src derives from the current one and provides the Pat<>. The next patch will add another Pat<> that uses the writemask. Note that I dropped the type annotation from the intrinsic call, i.e.: (v16f32 VR512:$src1) -> R512:$src1. I think that this should be fine (at least many intrinsic calls don't provide them) and it greatly reduces the number of template arguments. llvm-svn: 212222	2014-07-02 21:25:58 +00:00
Adam Nemet	2415a497b5	[X86] AVX512: Add writemask variants for vperm2 This includes assembler and codegen support (see the new tests in avx512-encodings.s and avx512-shuffle.ll). <rdar://problem/17492620> llvm-svn: 212221	2014-07-02 21:25:54 +00:00
Benjamin Kramer	e739cf3eb5	X86: When combining shuffles just remove shuffles that are completely redundant. CombineTo doesn't allow replacing a node with itself so this would crash if the combined shuffle is the same as the input shuffle. llvm-svn: 212181	2014-07-02 15:09:44 +00:00
Elena Demikhovsky	678bd5ba4a	AVX-512: dec/inc instructions are slow on KNL After Alexey Volkov, I'm adding the same property for KNL, that prefers ADD/SUB instead of INC/DEC. Added a test. llvm-svn: 212178	2014-07-02 14:11:05 +00:00
Tim Northover	334d8eebe5	X86: remove atomic instructions after we've iterated through them. Otherwise they get freed and the implicit "isa<XYZ>" tests following turn out badly (at least under sanitizers). Also corrects the ordering of unordered atomic stores. llvm-svn: 212136	2014-07-01 22:10:30 +00:00
Juergen Ributzka	3bd03c7099	[DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. The argument list vector is never used after it has been passed to the CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it. llvm-svn: 212135	2014-07-01 22:01:54 +00:00
Tim Northover	df58625e3c	X86: delegate expanding atomic libcalls to generic code. On targets without cmpxchg16b or cmpxchg8b, the borderline atomic operations were slipping through the gaps. X86AtomicExpand.cpp was delegating to ISelLowering. Generic ISelLowering was delegating to X86ISelLowering and X86ISelLowering was asserting. The correct behaviour is to expand to a libcall, preferably in generic ISelLowering. This can be achieved by X86ISelLowering deciding it doesn't want the faff after all. llvm-svn: 212134	2014-07-01 21:44:59 +00:00
Tim Northover	277066ab43	X86: expand atomics in IR instead of as MachineInstrs. The logic for expanding atomics that aren't natively supported in terms of cmpxchg loops is much simpler to express at the IR level. It also allows the normal optimisations and CodeGen improvements to help out with atomics, instead of using a limited set of possible instructions. rdar://problem/13496295 llvm-svn: 212119	2014-07-01 18:53:31 +00:00
Adam Nemet	16de2486cb	[X86] AVX512: Allow writemasks with vpcmp For now I only updated the _alt variants. The main variants are used by codegen and that will need a bit more work to trigger. <rdar://problem/17492620> llvm-svn: 212114	2014-07-01 18:03:45 +00:00
Adam Nemet	1efcb90fcd	[X86] AVX512: Factor generating the AsmString into avx512_icmp_cc Adding a writemask variant would require a third asm string to be passed to the template. Generate the AsmString in the template instead. No change in X86.td.expanded. llvm-svn: 212113	2014-07-01 18:03:43 +00:00
Reid Kleckner	b5dd9452b4	Fix .seh_stackalloc 0 seh_stackalloc 0 is not representable in Win64 SEH info, so emitting it is a bug. Reviewers: rnk Differential Revision: http://reviews.llvm.org/D4334 Patch by Vadim Chugunov! llvm-svn: 212081	2014-07-01 00:42:47 +00:00
Andrea Di Biagio	53b6830069	[X86] Add support for builtin to read performance monitoring counters. This patch adds support for a new builtin instruction called __builtin_ia32_rdpmc. Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can be used to read performance monitoring counters. It takes as input the index of the performance counter to read, and returns the value of the specified performance counter as a 64-bit number. Calls to this new builtin will map to instruction RDPMC. The index passed to the builtin call is moved to register %ECX. The result of the builtin call is the value of the specified performance counter (RDPMC would return that quantity in registers RDX:RAX). This patch: - Adds builtin int_x86_rdpmc as a GCCBuiltin; - Adds a new x86 DAG node called 'RDPMC_DAG'; - Teaches how to lower this new builtin; - Adds an ISel pattern to select instruction RDPMC; - Fixes the definition of instruction RDPMC adding %RAX and %RDX as implicit definitions, and adding %ECX as implicit use; - Adds an LLVM test to verify that the new builtin is correctly selected. llvm-svn: 212049	2014-06-30 17:14:21 +00:00
Saleem Abdulrasool	e3c3fe53eb	X86: fix comment Fix a comment typo `DbgLocLImport` instead of `DLLImport`. llvm-svn: 212012	2014-06-30 03:11:18 +00:00
Saleem Abdulrasool	67b548154e	CodeGen: rename Win64 ExceptionHandling to WinEH This exception format is not specific to Windows x64. A similar approach is taken on nearly all architectures. Generalise the name to reflect reality. This will eventually be used for Windows on ARM data emission as well. Switch the enum and namespace into an enum class. llvm-svn: 212000	2014-06-29 21:43:47 +00:00
Saleem Abdulrasool	7206a52522	MC: rename EmitWin64EH routines Rename the routines to reflect the reality that they are more related to call frame information than to Win64 EH. Although EH is implemented in an intertwined manner by augmenting with an exception handler and an associated parameter, the majority of these routines emit information required to unwind the frames. This also helps identify that these routines are generic for most windows platforms (they apply equally to nearly all architectures except x86) although the encoding of the information is architecture dependent. Unwinding data is emitted via EmitWinCFI* and exception handling information via EmitWinEH*. llvm-svn: 211994	2014-06-29 01:52:01 +00:00
Chandler Carruth	bd0717d7cc	[x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like lowering for v16i8. ASan and some bots caught this bug with existing test cases. Fixing it even fixed a miscompile with one of the test cases. I'm still a bit suspicious of this test case as I've not taken a proper amount of time to think about it, but the fix here is strict goodness. llvm-svn: 211976	2014-06-28 05:46:28 +00:00
Chandler Carruth	887c2c3482	[x86] Add handling for splat-like widenings of v16i8 shuffles. These show up really frequently, not the least with actual splats. =] We lowered these quite badly before. The new code path tries to widen i8 shuffles to i16 shuffles in a splat-like way. There are still some inefficiencies in our i16 splat logic though, so we aren't really done here. Also, for certain patterns (bit of a gather-and-splat) we still generate pretty silly code, and I've left a fixme for addressing it. However, I'm not actually worried about this code pattern as much. The old shuffle lowering generates a 29 instruction monstrosity for it that should execute much more slowly. llvm-svn: 211974	2014-06-28 05:16:40 +00:00
Chandler Carruth	a94ef908d9	[x86] Fix another bug hit when bootstrapping with the new shuffle lowering. For maximum irony, I had already discovered this bug, diagnosed it, and left FIXMEs about it in the test cases. =[ I just failed to go back over those until after I had reduced a bootstrap miscompile down to a single TU, stared at the assembly for an hour, and figured out the bug. Again. Oh well. llvm-svn: 211955	2014-06-27 20:07:40 +00:00

10769 Commits