llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	bba72c7c1e	Don't prevent a vselect of constants from becoming a single load (PR20648). Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648 This patch checks the operands of a vselect to see if all values are constants. If yes, bail out of any further attempts to create a blend or shuffle because SelectionDAGLegalize knows how to turn this kind of vselect into a single load. This already happens for machines without SSE4.1, so the added checks just send more targets down that path. Differential Revision: http://reviews.llvm.org/D4934 llvm-svn: 216121	2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith	b18263531d	X86: Align the stack on word boundaries in LowerFormalArguments() The goal of the patch is to implement section 3.2.3 of the AMD64 ABI correctly. The controlling sentence is, "The size of each argument gets rounded up to eightbytes. Therefore the stack will always be eightbyte aligned." The equivalent sentence in the i386 ABI page 37 says, "At all times, the stack pointer should point to a word-aligned area." For both architectures, the stack pointer is not being rounded up to the nearest eightbyte or word between the last normal argument and the first variadic argument. Patch by Thomas Jablin! llvm-svn: 216119	2014-08-20 19:40:59 +00:00
Keno Fischer	d750723d29	Do not insert a tail call when returning multiple values on X86 Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530. The problem is that X86ISelLowering erroneously thought the third call was eligible for tail call elimination. It would have been if it's return value was actually the one returned by the calling function, but here that is not the case and additional values are being returned. Test Plan: Test case from the original bug report is included. Reviewers: rafael Reviewed By: rafael Subscribers: rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D4968 llvm-svn: 216117	2014-08-20 19:00:37 +00:00
Elena Demikhovsky	34d2d76d25	AVX-512: Fixed a bug in emitting compare for MVT:i1 type. Added a test. llvm-svn: 215889	2014-08-18 11:59:06 +00:00
Elena Demikhovsky	67d9500444	Reverted last commit llvm-svn: 215828	2014-08-17 09:39:48 +00:00
Elena Demikhovsky	2bb991a0c5	Added a table for intrinsics on X86. It should remove dosens of lines in handling instrinsics (in a huge switch) and give an easy way to add new intrinsics. I did not completed to move al intrnsics to the table, I'll do this in the upcomming commits. llvm-svn: 215826	2014-08-17 09:00:20 +00:00
Chandler Carruth	5a37dd6ff4	[x86] Fix an indentation goof in a prior commit. Should have re-run clang-format. llvm-svn: 215824	2014-08-17 00:40:34 +00:00
Chandler Carruth	e020f117ce	[x86] Teach lots of the new vector shuffle lowering to use UNPCK instructions for blend operations at 128 bits. This was a serious hole in our prior blend lowering. llvm-svn: 215819	2014-08-16 09:42:15 +00:00
Reid Kleckner	a6b86bef4d	Fix the build with MSVC 2013 after new shuffle code MSVC gives this awesome diagnostic: ..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument ..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1' ..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl' Using an anonymous namespace makes the problem go away. llvm-svn: 215744	2014-08-15 18:03:58 +00:00
Chandler Carruth	03f456abbe	[x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions where applicable for blending. llvm-svn: 215737	2014-08-15 17:42:00 +00:00
Chandler Carruth	f88612581e	[x86] Add the initial skeleton of type-based dispatch for AVX vectors in the new shuffle lowering and an implementation for v4 shuffles. This allows us to handle non-half-crossing shuffles directly for v4 shuffles, both integer and floating point. This currently misses places where we could perform the blend via UNPCK instructions, but otherwise generates equally good or better code for the test cases included to the existing vector shuffle lowering. There are a few cases that are entertainingly better. ;] llvm-svn: 215702	2014-08-15 11:01:40 +00:00
Chandler Carruth	6649f53207	[x86] Remove the duplicated code for testing whether we can widen the elements of a shuffle mask and simplify how it works. No functionality changed now that the bug that was here has been fixed. llvm-svn: 215696	2014-08-15 07:41:57 +00:00
Chandler Carruth	17fd848bfa	[x86] Fix the very broken formation of vpunpck instructions in the target-specific shuffl DAG combines. We were recognizing the paired shuffles backwards. This code needs to be replaced anyways as we have the same functionality elsewhere, but I'll do the refactoring in a follow-up, this is the minimal fix to the behavior. In addition to fixing miscompiles with the new vector shuffle lowering, it also causes the canonicalization to kick in much better, selecting the smaller encoding variants in lots of places in the new AVX path. This still isn't quite ideal as we don't need both the shufpd and the punpck instructions, but that'll get fixed in a follow-up patch. llvm-svn: 215690	2014-08-15 03:54:49 +00:00
Chandler Carruth	372c143c2f	[x86] Fix PR20540 where the x86 shuffle DAG combiner had completely broken logic for merging shuffle masks in the face of SM_SentinelZero mask operands. While these are '-1' they don't mean 'undef' the way '-1' means in the pre-legalized shuffle masks. Instead, they mean that the shuffle operation is forcibly zeroing that lane. Reflect this and explicitly handle it in a bunch of places. In one place the effect is equivalent but much more clear. In the rest it was really weirdly broken. Also, rewrite the entire merging thing to be a more directy operation with a single loop and just doing math to map the indices through the various masks. Also add a bunch of asserts to try to make in extremely clear what the different masks can possibly look like. Finally, add some comments to clarify that we're merging shuffle masks up here rather than down as we do everywhere else, and thus the logic is quite confusing. Thanks to several different people for sending test cases, and for Robert Khasanov for an initial attempt at fixing. llvm-svn: 215687	2014-08-15 02:43:18 +00:00
Adam Nemet	6fa675754a	[AVX512] Switch FMA intrinsics to the masking version This does the renaming and updates the lowering logic. Part of <rdar://problem/17688758> llvm-svn: 215664	2014-08-14 17:13:30 +00:00
Adam Nemet	9b4f08c729	[X86] Break out logic to map FMA Intrinsic number to Opcode No functional change. Will be used to lower AVX512 masking FMA intrinsics. llvm-svn: 215663	2014-08-14 17:13:27 +00:00
Adam Nemet	6f31063dcf	[AVX512] Break out the logic to lower masking intrinsics No functional change. This will be used by the FMA intrinsic lowering as well and hopefully many more. llvm-svn: 215661	2014-08-14 17:13:24 +00:00
Chandler Carruth	a8311b3681	[x86] Begin stubbing out the AVX support in the new vector shuffle lowering scheme. Currently, this just directly bails to the fallback path of splitting the 256-bit vector into two 128-bit vectors, operating there, and then joining the results back together. While the results are far from perfect, they are shockingly good for what we're doing here. I'll be layering the rest of the functionality on top of this piece by piece and updating tests as I go. Note that 256-bit vectors in this mode are still somewhat WIP. While I think the code paths that I'm adding here are clean and good-to-go, there are still a lot of 128-bit assumptions that I'll need to stomp out as I march through the functional spread here. llvm-svn: 215637	2014-08-14 12:13:59 +00:00
Quentin Colombet	57fb040beb	[X86] Fix the value of the low mask for the lowering of MUL_LOHI for v4i32. Found by code inspection. llvm-svn: 215604	2014-08-13 23:49:24 +00:00
Aaron Ballman	1013b6b60c	Silence a -Wparenthesis warning with these asserts. NFC. llvm-svn: 215537	2014-08-13 10:49:07 +00:00
Elena Demikhovsky	51bbd011c3	AVX-512: Fixed a bug in shufflevector lowering. PALIGNR instruction does not exist in AVX-512F set. Added a test. llvm-svn: 215526	2014-08-13 07:58:43 +00:00
Chandler Carruth	b7eda21bb0	[x86] Rewrite a core part of the new vector shuffle lowering to handle one pesky test case correctly. This test case caused the old code to infloop occilating between solving the low-half and the high-half. The 'side balancing' part of single-input v8 shuffle lowering didn't handle the one pattern which can cause it to occilate. Fortunately the fuzz testing found this case. Unfortuately it was terrible to handle. I'm really sorry for the amount and density of the code here, I'd love suggestions on how to simplify it. I feel like there must be a simpler form here, but after a lot of days I've not found it. This is the only one I've found that even works. I've added the one pesky test case along with some nice comments explaining the core problem that we have to solve here. So far this has survived approximately 32k test cases. More strenuous fuzzing commencing. llvm-svn: 215519	2014-08-13 01:25:45 +00:00
Adam Nemet	cee9d0a460	[AVX512] Handle valign masking intrinsic via C++ lowering I think that this will scale better in most cases than adding a Pat<> for each mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz). We can just lower to the SDNode and have the resulting DAG be matches by the DAG patterns. Alternatively (long term), we could keep the Pat<>s but generate them via the new AVX512_masking multiclass. The difficulty is that in order to formulate that we would have to concatenate DAGs. Currently this is only supported if the operators of the input DAGs are identical. llvm-svn: 215473	2014-08-12 21:13:12 +00:00
Sanjay Patel	8687f320e0	fixed typos llvm-svn: 215451	2014-08-12 16:00:06 +00:00
Hans Wennborg	21b20cb11a	Increase the size of these SmallVectors in X86ISelLowering.cpp. In a Clang bootstrap, their sizes were always 12, 16 and 16, respectively. llvm-svn: 215336	2014-08-11 02:21:22 +00:00
Sanjay Patel	b48d1cfa53	fixed typos llvm-svn: 215299	2014-08-09 22:23:02 +00:00
Patrik Hagglund	b0e86ec814	[pr19635] Revert most of r170537, and add new testcase. Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong. llvm-svn: 215190	2014-08-08 08:21:19 +00:00
Alexander Kornienko	7151ad7762	Insert parens to avoid a warning: suggest parentheses around arithmetic in operand of '^' [-Wparentheses] llvm-svn: 215101	2014-08-07 12:09:34 +00:00
Chandler Carruth	4e8fcbd3fd	[x86] Fix another miscompile found through fuzz testing the new vector shuffle lowering. This is closely related to the previous one. Here we failed to use the source offset when swapping in the other case -- where we end up swapping the final shuffle. The cause of this bug is a bit different: I simply wasn't thinking about the fact that this mask is actually a slice of a wide mask and thus has numbers that need SourceOffset applied. Simple fix. Would be even more simple with an algorithm-y thing to use here, but correctness first. =] llvm-svn: 215095	2014-08-07 10:37:35 +00:00
Chandler Carruth	e206385e99	[x86] Fix another miscompile in the new vector shuffle lowering found via the fuzz tester. Here I missed an offset when round-tripping a value through a shuffle mask. I got it right 2 lines below. See a problem? I do. ;] I'll probably be adding a little "swap" algorithm which accepts a range and two values and swaps those values where they occur in the range. Don't really have a name for it, let me know if you do. llvm-svn: 215094	2014-08-07 10:14:27 +00:00
Chandler Carruth	78494364d1	[x86] Fix another miscompile in the new vector shuffle lowering found through the new fuzzer. This one is great: bad operator precedence led the modulus to happen at the wrong point. All the asserts didn't fire because there were usually the right values past the end of the 4 element region we were looking at. Probably could have gotten a crash here with ASan + fuzzing, but the correctness tests pinpointed this really nicely. llvm-svn: 215092	2014-08-07 09:45:02 +00:00
Pavel Chupin	f55eb450e5	[x32] Use ebp/esp as frame and stack pointer Summary: Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still require 64-bit register, so using 64-bit MachineFramePtr where required. X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing this issue here as well by making isTarget64BitLP64 false. Also mark hasReservedSpillSlot unreachable on X86. See inlined comments. Test Plan: Add one new simple test and upgrade 2 existing with x32 target case. Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D4617 llvm-svn: 215091	2014-08-07 09:41:19 +00:00
Chandler Carruth	27046758de	[x86] Fix a miscompile in the new shuffle lowering found through the new fuzz testing. The function which tested for adjacency did what it said on the tin, but when I called it, I wanted it to do something more thorough: I wanted to know if the pairs of shuffle elements were adjacent and started at 0 mod 2. In one place I had the decency to try to test for this, but in the other it was completely skipped, miscompiling this test case. Fix this by making the helper actually do what I wanted it to do everywhere I called it (and removing the now redundant code in one place). I really dislike the name "canWidenShuffleElements" for this predicate. If anyone can come up with a better name, please let me know. The other name I thought about was "canWidenShuffleMask" but is it really widening the mask to reduce the number of lanes shuffled? I don't know. Naming things is hard. llvm-svn: 215089	2014-08-07 08:11:31 +00:00
Eric Christopher	b5217507c7	Remove the target machine from CCState. Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where we getting a subtarget will require passing in a MachineFunction/Function as well. llvm-svn: 214988	2014-08-06 18:45:26 +00:00
Chandler Carruth	c3927cd8c9	[x86] Fix two independent miscompiles in the process of getting the same test case to actually generate correct code. The primary miscompile fixed here is that we weren't correctly handling in-place elements in one half of a single-input v8i16 shuffle when moving a dword of elements from that half to the other half. Some times, we would clobber the in-place elements in forming the dword to move across halves. The fix to this involves forcibly marking the in-place inputs even when there is no need to gather them into a dword, and to much more carefully re-arrange the elements when grouping them into a dword to move across halves. With these two changes we would generate correct shuffles for the test case, but found another miscompile. There are also some random perturbations of the generated shuffle pattern in SSE2. It looks like a wash; more instructions in some cases fewer in others. The second miscompile would corrupt the results into nonsense. This is a buggy pattern in one of the added DAG combines. Mapping elements through a PSHUFD when pairing redundant half-shuffles is much harder than this code makes it out to be -- it requires reasoning about all of where the input is used in the PSHUFD, not just one part of where it is used. Plus, we can't combine a half shuffle into a PSHUFD but the code didn't guard against it. I think this was just a bad idea and I've just removed that aspect of the combine. No tests regress as a consequence so seems OK. llvm-svn: 214954	2014-08-06 10:16:36 +00:00
Chandler Carruth	8f23ba26d2	[x86] Switch to a formulation of a for loop that is much more obviously not corrupting the mask by mutating it more times than intended. No functionality changed (the results were non-overlapping so the old version "worked" but was non-obvious). llvm-svn: 214953	2014-08-06 10:16:33 +00:00
JF Bastien	ac8b66b32c	Fix typos in comments and doc Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com) llvm-svn: 214934	2014-08-05 23:27:34 +00:00
Chandler Carruth	a746239be3	[x86] Fix a crasher due to shuffles which cancel each other out and add a test case. We also miscompile this test case which is showing a serious flaw in the single-input v8i16 shuffle code. I've left the specific instruction checks FIXME-ed out until I can address the bug in the single-input code, but I wanted to separate out a significant functionality change to produce correct code from a very simple and targeted crasher fix. The miscompile problem stems from keeping track of inputs by value rather than by index. As a consequence of doing this, we can't reliably update those inputs because they might swap and we can't detect this without copying the mask. The blend code now uses indices for the input lists and this seems strictly better. It also should make it easier to sort things and do other cleanups. I think the time has come to simplify The Great Lambda here. llvm-svn: 214914	2014-08-05 18:45:49 +00:00
Adam Nemet	c04f3f9f73	[X86] Improve comments for r214888 A rebase somehow ate my comments. This restores them. llvm-svn: 214903	2014-08-05 17:58:49 +00:00
Adam Nemet	164b07fbfe	[X86] Add lowering to VALIGN This was currently part of lowering to PALIGNR with some special-casing to make interlane shifting work. Since AVX512F has interlane alignr (valignd/q) and AVX512BW has vpalignr we need to support both of these at the same time, e.g. for SKX. This patch breaks out the common code and then add support to check both of these lowering options from LowerVECTOR_SHUFFLE. I also added some FIXMEs where I think the AVX512BW and AVX512VL additions should probably go. llvm-svn: 214888	2014-08-05 17:22:59 +00:00
Adam Nemet	2f10cc699d	[X86] Separate DAG node for valign and palignr They have different semantics (valign is interlane while palingr is intralane) and palingr is still needed even in the AVX512 context. According to the latest spec AVX512BW provides these. llvm-svn: 214887	2014-08-05 17:22:55 +00:00
Chandler Carruth	183771bd8e	[x86] Reformat some code I moved around in a prior commit but left poorly formatted. Sorry about that. llvm-svn: 214853	2014-08-05 10:35:30 +00:00
Chandler Carruth	947cef191d	[x86] Fix a crash and wrong-code bug in the new vector lowering all found by a single test reduced out of a failure on llvm-stress. The start of the problem (and the crash) came when we tried to use a find of a non-used slot in the move-to half of the move-mask as the target for two bad-half inputs. While if lucky this will be the first of a pair of slots which we can place the bad-half inputs into, it isn't actually guaranteed. This really isn't surprising, not sure what I was thinking. The correct way to find the two unused slots is to look for one of the used slots. We know it isn't that pair, and we can use some modular arithmetic to find the other pair by masking off the odd bit and adding 2 modulo 4. With this, we reliably found a viable pair of slots for the bad-half inputs. Sadly, that wasn't enough. We also had a wrong code bug that surfaced when I reduced the test case for this where we would use the same slot twice for the two bad inputs. This is because both of the bad inputs could be in odd slots originally and thus the mod-2 mapping would actually be the same. The whole point of the weird indexing into the pair of empty slots was to try to leverage when the end result needed the two bad-half inputs to be paired in a dword and pre-pair them in the correct orrientation. This is less important with the powerful combining we're now doing, and also easier and more reliable to achieve be noting that we add the bad-half inputs in order. Thus, if they are in a dword pair, the low part of that will be the first input in the sequence. Always putting that in the low element will just do the right thing in addition to computing the correct result. Test case added. =] llvm-svn: 214849	2014-08-05 08:19:21 +00:00
Eric Christopher	fc6de428c8	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838	2014-08-05 02:39:49 +00:00
Eric Christopher	d913448b38	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Chandler Carruth	0e2ddb2790	[x86] Just unilaterally prefer SSSE3-style PSHUFB lowerings over clever use of PACKUS. It's cleaner that way. I looked at implementing clever combine-based folding of PACKUS chains into PSHUFB but it is quite hard and doesn't seem likely to be worth it. The most annoying part would be detecting that the correct masking had been done to use PACKUS-style instructions as a blend operation rather than there being any saturating as is indicated by its name. We generate really nice code for what few test cases I've come up with that aren't completely contrived for this by just directly prefering PSHUFB and so let's go with that strategy for now. =] llvm-svn: 214707	2014-08-04 10:17:35 +00:00
Chandler Carruth	06e6f1cae2	[x86] Implement more aggressive use of PACKUS chains for lowering common patterns of v16i8 shuffles. This implements one of the more important FIXMEs for the SSE2 support in the new shuffle lowering. We now generate the optimal shuffle sequence for truncate-derived shuffles which show up essentially everywhere. Unfortunately, this exposes a weakness in other parts of the shuffle logic -- we can no longer form PSHUFB here. I'll add the necessary support for that and other things in a subsequent commit. llvm-svn: 214702	2014-08-04 09:40:02 +00:00
Chandler Carruth	37a18821cd	[x86] Handle single input shuffles in the SSSE3 case more intelligently. I spent some time looking into a better or more principled way to handle this. For example, by detecting arbitrary "unneeded" ORs... But really, there wasn't any point. We just shouldn't build blatantly wrong code so late in the pipeline rather than adding more stages and logic later on to fix it. Avoiding this is just too simple. llvm-svn: 214680	2014-08-04 01:14:24 +00:00
Chandler Carruth	16c13cad35	[x86] Remove the FIXME that was implemented in r214628. Managed to forget to update the comment here... =/ llvm-svn: 214630	2014-08-02 11:34:23 +00:00
Chandler Carruth	4c57955fe3	[x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly: when without them we will unpack a vector of i8s across two different registers and do a potentially 4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the unpack and pack are two instructions so even a single shuffle between them hits our three instruction limit for forming PSHUFB). However, this doesn't generate very good code in many cases, and it leaves a bunch of common patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This allows us to combine through them, combine multiple ones together, and generally produce sufficiently high quality code. Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added decode routines to handle the different patterns that show up at this level and we dispatch through them as appropriate. The two primary test cases are updated. For the v16 test case there is still a lot of room for improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that I'm hoping to turn into ALL lines by the end of this. llvm-svn: 214628	2014-08-02 10:39:15 +00:00
Chandler Carruth	5219d4eff6	[x86] Fix a few typos in my comments spotted in passing. llvm-svn: 214626	2014-08-02 10:29:34 +00:00
Chandler Carruth	34f9a987e9	[x86] Teach the target shuffle mask extraction to recognize unary forms of normally binary shuffle instructions like PUNPCKL and MOVLHPS. This detects cases where a single register is used for both operands making the shuffle behave in a unary way. We detect this and adjust the mask to use the unary form which allows the existing DAG combine for shuffle instructions to actually work at all. As a consequence, this uncovered a number of obvious bugs in the existing DAG combine which are fixed. It also now canonicalizes several shuffles even with the existing lowering. These typically are trying to match the shuffle to the domain of the input where before we only really modeled them with the floating point variants. All of the cases which change to an integer shuffle here have something in the integer domain, so there are no more or fewer domain crosses here AFAICT. Technically, it might be better to go from a GPR directly to the floating point domain, but detecting floating point outputs despite integer inputs is a lot more code and seems unlikely to be worthwhile in practice. If folks are seeing domain-crossing regressions here though, let me know and I can hack something up to fix it. Also as a consequence, a bunch of missed opportunities to form pshufb now can be formed. Notably, splats of i8s now form pshufb. Interestingly, this improves the existing splat lowering too. We go from 3 instructions to 1. Yes, we may tie up a register, but it seems very likely to be worth it, especially if splatting the 0th byte (the common case) as then we can use a zeroed register as the mask. llvm-svn: 214625	2014-08-02 10:27:38 +00:00
Akira Hatanaka	3516669a50	[X86] Simplify X87 stackifier pass. Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions: 1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5> 2. INLINEASM <es:fldcw $0> 3. %FP0<def> = COPY %ST0 <rdar://problem/16952634> llvm-svn: 214580	2014-08-01 22:19:41 +00:00
Louis Gerbarg	67474e3755	Make sure no loads resulting from load->switch DAGCombine are marked invariant Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reoarder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449	2014-07-31 21:45:05 +00:00
Matt Arsenault	6f2a526101	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Chandler Carruth	64a7c828cb	[x86] Sink a variable only used by asserts into the asserts. Should fix some -Werror bots, sorry for the noise. llvm-svn: 214043	2014-07-27 01:45:49 +00:00
Chandler Carruth	80c5bfd843	[x86] Add a much more powerful framework for combining x86 shuffle instructions in the legalized DAG, and leverage it to combine long sequences of instructions to PSHUFB. Eventually, the other x86-instruction-specific shuffle combines will probably all be driven out of this routine. But the real motivation is to detect after we have fully legalized and optimized a shuffle to the minimal number of x86 instructions whether it is profitable to replace the chain with a fully generic PSHUFB instruction even though doing so requires either a load from a constant pool or tying up a register with the mask. While the Intel manuals claim it should be used when it replaces 5 or more instructions (!!!!) my experience is that it is actually very fast on modern chips, and so I've gon with a much more aggressive model of replacing any sequence of 3 or more instructions. I've also taught it to do some basic canonicalization to special-purpose instructions which have smaller encodings than their generic counterparts. There are still quite a few FIXMEs here, and I've not yet implemented support for lowering blends with PSHUFB (where its power really shines due to being able to zero out lanes), but this starts implementing real PSHUFB support even when using the new, fancy shuffle lowering. =] llvm-svn: 214042	2014-07-27 01:15:58 +00:00
Chandler Carruth	5896698e2e	[x86] Fix PR20355 (for real). There are many layers to this bug. The tale starts with r212808 which attempted to fix inversion of the low and high bits when lowering MUL_LOHI. Sadly, that commit did not include any positive test cases, and just removed some operations from a test case where the actual logic being changed isn't fully visible from the test. What this commit did was two things. First, it reversed the low and high results in the formation of the MERGE_VALUES node for the multiple results. This is entirely correct. Second it changed the shuffles for extracting the low and high components from the i64 results of the multiplies to extract them assuming a big-endian-style encoding of the multiply results. This second change is wrong. There is no big-endian encoding in x86, the results of the multiplies are normal v2i64s: when cast to v4i32, the low i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and 3. However, the first change wasn't enough to actually fix the bug, which is (I assume) why the second change was also made. There was another bug in the MERGE_VALUES formation: we weren't using a VTList, and so were getting a single result node! When grabbing the second result from the node, we got... well.. colud be anything. I think this appeared to invert things, but had to be causing other problems as well. Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should have been fine, right? NOOOPE! Because the core bug was never addressed, the test in vector-idiv failed when I fixed the MERGE_VALUES node. Because there are essentially no docs for this node, I had to guess at how to fix it and tried swapping the operands, restoring the order of the original code before r212808. While this "fixed" the test case (in that we produced the write instructions) we were still extracting the wrong elements of the i64s, and thus PR20355 was still broken. This commit essentially reverts the big-endian-style extraction part of r212808 and goes back to the original masks which were correct. Now that the MERGE_VALUES node formation is also correct, everything works. I've also included a more detailed test from PR20355 to make sure this stays fixed. llvm-svn: 214011	2014-07-26 03:46:57 +00:00
Chandler Carruth	f6406ac5d6	[x86] Revert r214007: Fix PR20355 ... The clever way to implement signed multiplication with unsigned is already implemented and tested and working correctly. The bug is somewhere else. Re-investigating. This will teach me to not scroll far enough to read the code that did what I thought needed to be done. llvm-svn: 214009	2014-07-26 02:14:54 +00:00
Chandler Carruth	1bf4d19172	[x86] Fix PR20355 (and dups) by not using unsigned multiplication when signed multiplication is requested. While there is not a difference in the low half of the result, the high half (used specifically to implement the signed division by these constants) certainly is used. The test case I've nuked was actively asserting wrong code. There is a delightful solution to doing signed multiplication even when we don't have it that Richard Smith has crafted, but I'll add the machinery back and implement that in a follow-up patch. This at least restores correctness. llvm-svn: 214007	2014-07-26 01:52:13 +00:00
Akira Hatanaka	e5b6e0d231	[stack protector] Fix a potential security bug in stack protector where the address of the stack guard was being spilled to the stack. Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. Register allocator can now just remat the value when it can't keep it in a register. <rdar://problem/12475629> llvm-svn: 213967	2014-07-25 19:31:34 +00:00
Chandler Carruth	3de980d2ff	[SDAG] Enable the new assert for out-of-range result numbers in SDValues, fixing the two bugs left in the regression suite. The key for both of these was the use a single value type rather than a VTList which caused an unintentionally single-result merge-value node. Fix this by getting the appropriate VTList in place. Doing this exposed that the comments in x86's code abouth how MUL_LOHI operands are handle is wrong. The bug with the use of out-of-range result numbers was hiding the bug about the order of operands here (as best i can tell). There are more places where the code appears to get this backwards still... llvm-svn: 213931	2014-07-25 09:19:23 +00:00
Chandler Carruth	80b869461e	[x86] Make vector legalization of extloads work more like the "normal" vector operation legalization with support for custom target lowering and fallback to expand when it fails, and use this to implement sext and anyext load lowering for x86 in a more principled way. Previously, the x86 backend relied on a target DAG combine to "combine away" sextload and extload nodes prior to legalization, or would expand them during legalization with terrible code. This is particularly problematic because the DAG combine relies on running over non-canonical DAG nodes at just the right time to match several common and important patterns. It used a combine rather than lowering because we didn't have good lowering support, and to expose some tricks being employed to more combine phases. With this change it becomes a proper lowering operation, the backend marks that it can lower these nodes, and I've added support for handling the canonical forms that don't have direct legal representations such as sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases for this behavior continue to pass even after the DAG combiner beigns running more systematically over every node. There is some noise caused by this in the test suite where we actually use vector extends instead of subregister extraction. This doesn't really seem like the right thing to do, but is unlikely to be a critical regression. We do regress in one case where by lowering to the target-specific patterns early we were able to combine away extraneous legal math nodes. However, this regression is completely addressed by switching to a widening based legalization which is what I'm working toward anyways, so I've just switched the test to that mode. Differential Revision: http://reviews.llvm.org/D4654 llvm-svn: 213897	2014-07-24 22:09:56 +00:00
Reid Kleckner	9a412d13c1	Replace an assertion with a fatal error Frontends are responsible for putting inalloca on parameters that would be passed in memory and not registers. llvm-svn: 213891	2014-07-24 19:53:33 +00:00
Saleem Abdulrasool	34610e33ae	X86: silence sign comparison warning GCC 4.8 detected a signed compare [-Wsign-compare]. Add a cast for the destination index. Add an assert to catch a potential overflow however unlikely it may be. llvm-svn: 213878	2014-07-24 17:12:06 +00:00
Filipe Cabecinhas	933cccf3fa	Fixed PR20411 - bug in getINSERTPS() When we had a vector_shuffle where we had an input from each vector, we could miscompile it because we were assuming the input from V2 wouldn't be moved from where it was on the vector. Added a test case. llvm-svn: 213826	2014-07-24 01:28:21 +00:00
Jim Grosbach	724e438c62	[X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants. The transform to constant fold unary operations with an AND across a vector comparison applies when the constant is not a splat of a scalar as well. llvm-svn: 213800	2014-07-23 20:41:43 +00:00
Jim Grosbach	8f6f0858ec	X86: restrict combine to when type sizes are safe. The folding of unary operations through a vector compare and mask operation is only safe if the unary operation result is of the same size as its input. For example, it's not safe for [su]itofp from v4i32 to v4f64. llvm-svn: 213799	2014-07-23 20:41:38 +00:00
Robert Khasanov	74acbb7767	[SKX] Enabling mask instructions: encoding, lowering KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 213757	2014-07-23 14:49:42 +00:00
Andrea Di Biagio	842355e900	Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions". This chang fully reverts r211771. That revision added a canonicalization rule which has the potential to causes a combine-cycle in the target-independent canonicalizing DAG combine. The plan is to move the logic that forms target specific addsub nodes as part of the lowering of shuffles. llvm-svn: 213736	2014-07-23 11:20:24 +00:00
Andrea Di Biagio	4d8bd41600	[DAG] Refactor some logic. No functional change. This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'. This refactoring is in preperation of an upcoming change to the DAGCombiner. llvm-svn: 213503	2014-07-21 07:28:51 +00:00
Tim Northover	871de902af	X86: support fpext/fptrunc operations to and from 16-bit floats. llvm-svn: 213374	2014-07-18 13:01:25 +00:00
Jim Grosbach	b6535c32f5	X86: Constant fold converting vector setcc results to float. Since the result of a SETCC for X86 is 0 or -1 in each lane, we can move unary operations, in this case [su]int_to_fp through the mask operation and constant fold the operation away. Generally speaking: UNARYOP(AND(VECTOR_CMP(x,y), constant)) --> AND(VECTOR_CMP(x,y), constant2) where constant2 is UNARYOP(constant). This implements the transform where UNARYOP is [su]int_to_fp. For example, consider the simple function: define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind { %cmp = fcmp oeq <4 x float> %val, %test %ext = zext <4 x i1> %cmp to <4 x i32> %result = sitofp <4 x i32> %ext to <4 x float> ret <4 x float> %result } Before this change, the SSE code is generated as: LCPI0_0: .long 1 ## 0x1 .long 1 ## 0x1 .long 1 ## 0x1 .long 1 ## 0x1 .section __TEXT,__text,regular,pure_instructions .globl _foo .align 4, 0x90 _foo: ## @foo cmpeqps %xmm1, %xmm0 andps LCPI0_0(%rip), %xmm0 cvtdq2ps %xmm0, %xmm0 retq After, the code is improved to: LCPI0_0: .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .section __TEXT,__text,regular,pure_instructions .globl _foo .align 4, 0x90 _foo: ## @foo cmpeqps %xmm1, %xmm0 andps LCPI0_0(%rip), %xmm0 retq The cvtdq2ps has been constant folded away and the floating point 1.0f vector lanes are materialized directly via the ModRM operand of andps. llvm-svn: 213342	2014-07-18 00:40:56 +00:00
Tim Northover	84ce0a642e	CodeGen: generate single libcall for fptrunc -> f16 operations. Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM. I've followed the much more generic "__truncST2" naming, as opposed to the odd name for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to. llvm-svn: 213252	2014-07-17 11:12:12 +00:00
Tim Northover	2131044814	X86: support double extension of f16 type. x86 has no native ability to extend an f16 to f64, but the same result is obtained if we expand it into two separate extensions: f16 -> f32 -> f64. Unfortunately the same is not true for truncate, so that still results in a compilation failure. llvm-svn: 213251	2014-07-17 11:04:04 +00:00
Tim Northover	fd7e424935	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
Tim Northover	7f3e11e7c0	CodeGen: don't form illegail EXTLOAD operations. It turns out that in most cases (the main exception being i1-related types) once these operations are formed we cannot separate them and the targets end up having to deal with them whether they want to or not. This is not a good situation, and a more reasonable default can be formed by ackowledging this and having targets leave them as Legal. Only x86 seems to be affected (other targets don't even try marking the operation Expand). Mostly there's no visible change here yet, but it will be useful to have truly expanded EXTLOADS for MVT::f16 softening support. llvm-svn: 213162	2014-07-16 15:37:24 +00:00
Andrea Di Biagio	a03624d8ab	[X86] Add a check for 'isMOVHLPSMask' within method 'isShuffleMaskLegal'. Before this change, method 'isShuffleMaskLegal' didn't know that shuffles implementing a 'movhlps' operation were perfectly legal for SSE targets. This patch adds the missing check for 'isMOVHLPSMask' inside method 'isShuffleMaskLegal' to fix the problem. The reason why it is important to do this is because the DAGCombiner conservatively avoids combining a pair of shuffles if the resulting shuffle node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were wrongly considered not to be legal. This was the root cause of some poor-code generation bugs. llvm-svn: 213137	2014-07-16 11:29:39 +00:00
David Majnemer	d4d9944416	Fix typo in comment No functionality changed. llvm-svn: 213052	2014-07-15 07:11:32 +00:00
Quentin Colombet	0f179c4d8a	[X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI. Also add a few comments. <rdar://problem/17581756> llvm-svn: 212808	2014-07-11 12:08:23 +00:00
Chandler Carruth	df8d0caab7	[x86] Add another combine that is particularly useful for the new vector shuffle lowering: match shuffle patterns equivalent to an unpcklwd or unpckhwd instruction. This allows us to use generic lowering code for v8i16 shuffles and match the unpack pattern late. llvm-svn: 212705	2014-07-10 11:09:29 +00:00
Chandler Carruth	853fa0ac8d	[x86] Expand the target DAG combining for PSHUFD nodes to be able to combine into half-shuffles through unpack instructions that expand the half to a whole vector without messing with the dword lanes. This fixes some redundant instructions in splat-like lowerings for v16i8, which are now getting to be really nice. llvm-svn: 212695	2014-07-10 09:57:36 +00:00
Chandler Carruth	a34a8e230d	[x86] Tweak the v16i8 single input special case lowering for shuffles that splat i8s into i16s. Previously, we would try much too hard to arrange a sequence of i8s in one half of the input such that we could unpack them into i16s and shuffle those into place. This isn't always going to be a cheaper i8 shuffle than our other strategies. The case where it is always going to be cheaper is when we can arrange all the necessary inputs into one half using just i16 shuffles. It happens that viewing the problem this way also makes it much easier to produce an efficient set of shuffles to move the inputs into one half and then unpack them. With this, our splat code gets one step closer to being not terrible with the new experimental lowering strategy. It also exposes two combines missing which I will add next. llvm-svn: 212692	2014-07-10 09:16:40 +00:00
Chandler Carruth	7d2ffb5492	[x86] Initial improvements to the new shuffle lowering for v16i8 shuffles specifically for cases where a small subset of the elements in the input vector are actually used. This is specifically targetted at improving the shuffles generated for trunc operations, but also helps out splat-like operations. There is still some really low-hanging fruit here that I want to address but this is a huge step in the right direction. llvm-svn: 212680	2014-07-10 04:34:06 +00:00
Chandler Carruth	b3840a55ae	[x86] Refactor some of the new code for lowering v16i8 shuffles to remove duplication and make it easier to select different strategies. No functionality changed. llvm-svn: 212674	2014-07-10 02:24:26 +00:00
Chandler Carruth	d3561f6fec	[SDAG] Make the new zext-vector-inreg node default to expand so targets don't need to set it manually. This is based on feedback from Tom who pointed out that if every target needs to handle this we need to reach out to those maintainers. In fact, it doesn't make sense to duplicate everything when anything other than expand seems unlikely at this stage. llvm-svn: 212661	2014-07-09 22:53:04 +00:00
Benjamin Kramer	d6f1733add	X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2. Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out as we have to blend two vectors. While there remove unecessary cross-lane moves from the shuffles so the backend can lower it to palignr instead of vperm. Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2. llvm-svn: 212611	2014-07-09 11:12:39 +00:00
Chandler Carruth	afe4b2507e	[x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610	2014-07-09 10:58:18 +00:00
Chandler Carruth	ef5dcf571e	[x86] Initialize a pointer to null to fix a bug in r212602. This should restore GCC hosts (which happen to put the bad stuff into the pointer) and MSan, etc. llvm-svn: 212606	2014-07-09 10:36:42 +00:00
Chandler Carruth	2ebc942683	[x86] Re-apply a variant of the x86 side of r212324 now that the rest has settled without incident, removing the x86-specific and overly strict 'isVectorSplat' routine in favor of generic and more powerful splat detection. The primary motivation and result of this is that the x86 backend can now see through splats which contain undef elements. This is essential if we are using a widening form of legalization and I've updated a test case to also run in that mode as before this change the generated code for the test case was completely scalarized. This version of the patch much more carefully handles the undef lanes. - We aren't overly conservative about them in the shift lowering (where we will never use the splat itself). - One place where the splat would have been re-used by the existing code now explicitly constructs a new constant splat that will be safe. - The broadcast lowering is much more reasonable with undefs by doing a correct check of whether the splat is the only user of a loaded value, checking that the splat actually crosses multiple lanes before using a broadcast, and handling broadcasts of non-constant splats. As a consequence of the last bullet, the weird usage of vpshufd instead of vbroadcast is gone, and we actually can lower an AVX splat with vbroadcastss where before we emitted a really strange pattern of a vector load and a manual splat across the vector. llvm-svn: 212602	2014-07-09 10:06:58 +00:00
Chandler Carruth	142e966261	[x86,SDAG] Sink the logic for folding shuffles of splats more aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code. This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed. First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could make N2 undef, then failed to ever re-consider the precomputed state. Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements. Third, it didn't handle all-zero bit casts nicely the way my code in the x86 side of things did, which is essential to getting good zext-shuffle lowerings. But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise but it's still pretty unclear what the value of the test is in this day and age. llvm-svn: 212517	2014-07-08 08:45:38 +00:00
Andrea Di Biagio	2620b877b6	[x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types. When combining a sequence of two PSHUFD dag nodes into a single PSHUFD, make sure that we assign the correct type to the resulting PSHUFD. X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32. Before this change, an assertion failure was triggered in method 'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test below into a single PSHUFD. define <4 x float> @test1(<4 x float> %V) { %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1> %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1> ret <4 x float> %2 } llvm-svn: 212498	2014-07-07 23:25:23 +00:00
Chandler Carruth	beeacac0b3	[x86] Revert r212324 which was too aggressive w.r.t. allowing undef lanes in vector splats. The core problem here is that undef lanes can't unilaterally be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting. Original commit message for r212324: [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is dramatically improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect its a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212475	2014-07-07 19:03:32 +00:00
Chandler Carruth	0dcb366268	[x86] Teach the new vector shuffle lowering code to handle what is essentially a DAG combine that never gets a chance to run. We might typically expect DAG combining to remove shuffles-of-splats and other similar patterns, but we don't get a chance to run the DAG combiner when we recursively form sub-shuffles during the lowering of a shuffle. So instead hand-roll a really important combine directly into the lowering code to detect shuffles-of-splats, especially shuffles of an all-zero splat which needn't even have the same element width, etc. This lets the new vector shuffle lowering handle shuffles which implement things like zero-extension really nicely. This will become even more important when I wire the legalization of zero-extension to vector shuffles with the new widening legalization strategy. llvm-svn: 212444	2014-07-07 09:06:58 +00:00
Chandler Carruth	5d79bb5d32	[x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is dramatically improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect its a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212324	2014-07-04 08:11:49 +00:00
Chandler Carruth	19cff8205e	[x86] Clarify that this lowering only applies to vectors and is only used when we have SSE2. llvm-svn: 212300	2014-07-03 22:57:44 +00:00
Andrea Di Biagio	a37a2fc81f	[X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes. This patch adds tablegen patterns to select F16C float-to-half-float conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes. If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32' are expanded into library calls. llvm-svn: 212293	2014-07-03 21:51:06 +00:00
Chandler Carruth	739b6ada99	[x86] Fix crashes in lowering bitcast instructions with the widening mode. This also runs the test in that mode which would reproduce the crash. What I love is that every single FIXME in the test is addressed by switching to widening. llvm-svn: 212254	2014-07-03 03:43:47 +00:00
Chandler Carruth	49a8b10d82	[x86] Based on a long conversation between myself, Jim Grosbach, Hal Finkel, Eric Christopher, and a bunch of other people I'm probably forgetting (sorry), add an option to the x86 backend to widen vectors during type legalization rather than promote them. This still would promote vNi1 vectors to get the masks right, but would widen other vectors. A lot of experiments are piling up right now showing that widening should probably be the default legalization strategy outside of vNi1 cases, but it is very hard to test the rammifications of that and fix bugs in widening-based legalization without an option that enables it. I'll be checking in tests shortly that use this option to exercise cases where widening doesn't work well and hopefully we'll be able to switch fully to this soon. llvm-svn: 212249	2014-07-03 02:11:29 +00:00
Benjamin Kramer	e739cf3eb5	X86: When combining shuffles just remove shuffles that are completely redundant. CombineTo doesn't allow replacing a node with itself so this would crash if the combined shuffle is the same as the input shuffle. llvm-svn: 212181	2014-07-02 15:09:44 +00:00
Juergen Ributzka	3bd03c7099	[DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. The argument list vector is never used after it has been passed to the CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it. llvm-svn: 212135	2014-07-01 22:01:54 +00:00
Tim Northover	df58625e3c	X86: delegate expanding atomic libcalls to generic code. On targets without cmpxchg16b or cmpxchg8b, the borderline atomic operations were slipping through the gaps. X86AtomicExpand.cpp was delegating to ISelLowering. Generic ISelLowering was delegating to X86ISelLowering and X86ISelLowering was asserting. The correct behaviour is to expand to a libcall, preferably in generic ISelLowering. This can be achieved by X86ISelLowering deciding it doesn't want the faff after all. llvm-svn: 212134	2014-07-01 21:44:59 +00:00
Tim Northover	277066ab43	X86: expand atomics in IR instead of as MachineInstrs. The logic for expanding atomics that aren't natively supported in terms of cmpxchg loops is much simpler to express at the IR level. It also allows the normal optimisations and CodeGen improvements to help out with atomics, instead of using a limited set of possible instructions.. rdar://problem/13496295 llvm-svn: 212119	2014-07-01 18:53:31 +00:00
Andrea Di Biagio	53b6830069	[X86] Add support for builtin to read performance monitoring counters. This patch adds support for a new builtin instruction called __builtin_ia32_rdpmc. Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can be used to read performance monitoring counters. It takes as input the index of the performance counter to read, and returns the value of the specified performance counter as a 64-bit number. Calls to this new builtin will map to instruction RDPMC. The index in input to the builtin call is moved to register %ECX. The result of the builtin call is the value of the specified performance counter (RDPMC would return that quantity in registers RDX:RAX). This patch: - Adds builtin int_x86_rdpmc as a GCCBuiltin; - Adds a new x86 DAG node called 'RDPMC_DAG'; - Teaches how to lower this new builtin; - Adds an ISel pattern to select instruction RDPMC; - Fixes the definition of instruction RDPMC adding %RAX and %RDX as implicit definitions, and adding %ECX as implicit use; - Adds a LLVM test to verify that the new builtin is correctly selected. llvm-svn: 212049	2014-06-30 17:14:21 +00:00
Chandler Carruth	bd0717d7cc	[x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like lowering for v16i8. ASan and some bots caught this bug with existing test cases. Fixing it even fixed a miscompile with one of the test cases. I'm still a bit suspicious of this test case as I've not taken a proper amount of time to think about it, but the fix here is strict goodness. llvm-svn: 211976	2014-06-28 05:46:28 +00:00
Chandler Carruth	887c2c3482	[x86] Add handling for splat-like widenings of v16i8 shuffles. These show up really frequently, not the least with actual splats. =] We lowered these quite badly before. The new code path tries to widen i8 shuffles to i16 shuffles in a splat-like way. There are still some inefficiencies in our i16 splat logic though, so we aren't really done here. Also, for certain patterns (bit of a gather-and-splat) we still generate pretty silly code, and I've left a fixme for addressing it. However, I'm not actually worried about this code pattern as much. The old shuffle lowering generates a 29 instruction monstrosity for it that should execute much more slowly. llvm-svn: 211974	2014-06-28 05:16:40 +00:00
Chandler Carruth	a94ef908d9	[x86] Fix another bug hit when bootstrapping with the new shuffle lowering. For maximum irony, I had already discovered this bug, diagnosed it, and left FIXMEs about it in the test cases. =[ I just failed to go back over those until after i had reduced a bootstrap miscompile down to a single TU, stared at the assembly for an hour, and figured out the bug. Again. Oh well. llvm-svn: 211955	2014-06-27 20:07:40 +00:00
Chandler Carruth	dd6470a9dd	[x86] Fix a miscompile in the new shuffle lowering uncovered by a bootstrap. I managed to mis-remember how PACKUS worked on x86, and was using undef for the high bytes instead of zero. The fix is fairly obvious. llvm-svn: 211922	2014-06-27 18:25:23 +00:00
Alexander Kornienko	b673b4b187	Clean up unused variable warning in release build. llvm-svn: 211902	2014-06-27 15:30:55 +00:00
Chandler Carruth	ed4a0bc734	[x86] Clean up some unused variables, especially in release builds. llvm-svn: 211894	2014-06-27 12:04:18 +00:00
Chandler Carruth	688001f042	[x86] Teach the target combine step to aggressively fold pshufd insturcions. Summary: This allows it to fold pshufd instructions across intervening half-shuffles and other noise. This pattern actually shows up in the generic lowering tests, but I've also added direct tests using intrinsics to make sure that the specific desired functionality is working even if the lowering stuff changes in the future. Differential Revision: http://reviews.llvm.org/D4292 llvm-svn: 211892	2014-06-27 11:40:13 +00:00
Chandler Carruth	0d6d1f2b17	[x86] Teach the target-specific combining how to aggressively fold half-shuffles, even looking through intervening instructions in a chain. Summary: This doesn't happen to show up with any test cases I've found for the current shuffle lowering, but previous attempts would benefit from this and it seems generally useful. I've tested it directly using intrinsics, which also shows that it will work with hand vectorized code as well. Note that even though pshufd isn't directly used in these tests, it gets exercised because we combine some of the half shuffles into a pshufd first, and then merge them. Differential Revision: http://reviews.llvm.org/D4291 llvm-svn: 211890	2014-06-27 11:34:40 +00:00
Chandler Carruth	97ebc2362c	[x86] Teach the X86 backend to DAG-combine SSE2 shuffles that are trivially redundant. This fixes several cases in the new vector shuffle lowering algorithm which would generate redundant shuffle instructions for the sake of simplicity. I'm also deleting a testcase which was somewhat ridiculous. It was checking for a bug in 2007 about incorrectly transforming shuffles by looking for the string "-86" in the output of a pretty substantial function. This test case doesn't seem to have any value at this point. Differential Revision: http://reviews.llvm.org/D4240 llvm-svn: 211889	2014-06-27 11:27:52 +00:00
Chandler Carruth	83860cfcfa	[x86] Begin a significant overhaul of how vector lowering is done in the x86 backend. This sketches out a new code path for vector lowering, hidden behind an off-by-default flag while it is under development. The fundamental idea behind the new code path is to aggressively break down the problem space in ways that ease selecting the odd set of instructions available on x86, and carefully avoid scalarizing code even when forced to use older ISAs. Notably, this starts off restricting itself to SSE2 and implements the complete vector shuffle and blend space for 128-bit vectors in SSE2 without scalarizing. The plan is to layer on top of this ISA extensions where we can bail out of the complex SSE2 lowering and opt for a cheaper, specialized instruction (or set of instructions). It also needs to be generalized to AVX and AVX512 vector widths. Currently, this does a decent but not perfect job for SSE2. There are some specific shortcomings that I plan to address: - We need a peephole combine to fold together shuffles where possible. There are cases where a previous shuffle could be modified slightly to arrange for elements to be in the correct position and a later shuffle eliminated. Doing this eagerly added quite a bit of complexity, and so my plan is to combine away these redundancies afterward. - There are a lot more clever ways to use unpck and pack that need to be added. This is essential for real world shuffles as it turns out... Once SSE2 is polished a bit I should be able to get interesting numbers on performance improvements on benchmarks conducive to vectorization. All of this will be off by default until it is functionally equivalent of course. Differential Revision: http://reviews.llvm.org/D4225 llvm-svn: 211888	2014-06-27 11:23:44 +00:00
Andrea Di Biagio	1ee38843ac	Silence a warning due to a comparison between signed and unsigned. No functional change intended. llvm-svn: 211782	2014-06-26 13:41:10 +00:00
Andrea Di Biagio	7fb85256bc	[X86] Improve the selection of SSE3/AVX addsub instructions. This patch teaches the backend how to canonicalize a shuffle vectors according to the rule: - (shuffle (FADD A, B), (FSUB A, B), Mask) -> (shuffle (FSUB A, -B), (FADD A, -B), Mask) Where 'Mask' is: <0,5,2,7> ;; for v4f32 and v4f64 shuffles. <0,3> ;; for v2f64 shuffles. <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles. In general, ISel only knows how to pattern-match a canonical 'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction. This new rule allows to convert a non-canonical dag sequence into a canonical one that will be matched by a single ADDSUB at ISel stage. The idea of converting a non-canonical ADDSUB into a canonical one by swapping the first two operands of the shuffle, and then negating the second operand of the FADD and FSUB, was originally proposed by Hal Finkel. llvm-svn: 211771	2014-06-26 10:45:21 +00:00
Andrea Di Biagio	07cdffc324	[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128). This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to the check for 'isBlendMask'; the idea is that, when possible, we should firstly check if a shuffle performs a blend, and in case, try to lower it into a BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128. In general: - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128; - BLENDPS/D instructions tend to always have better 'reciprocal throughput' than the equivalent SHUFPS/D; - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more than one execution port. This patch: - Moves the check for 'isBlendMask' immediately before the check for 'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE'; - Updates existing tests for sse/avx shuffle/blend instructions to verify that we select (v)blendps/d when possible (instead of (v)shufps/d or vperm2f128). llvm-svn: 211720	2014-06-25 17:41:58 +00:00
Chandler Carruth	e5724d7532	[x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions. llvm-svn: 211694	2014-06-25 13:12:54 +00:00
NAKAMURA Takumi	1db5995d14	Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter. -- This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211691	2014-06-25 12:41:52 +00:00
Andrea Di Biagio	6d9b9e125d	[X86] Add target combine rule to select ADDSUB instructions from a build_vector This patch teaches the backend how to combine a build_vector that implements an 'addsub' between packed float vectors into a sequence of vector add and vector sub followed by a VSELECT. The new VSELECT is expected to be lowered into a BLENDI. At ISel stage, the sequence 'vector add + vector sub + BLENDI' is pattern-matched against ISel patterns added at r211427 to select 'addsub' instructions. Added three more ISel patterns for ADDSUB. Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub' instructions. llvm-svn: 211679	2014-06-25 10:02:21 +00:00
Robert Khasanov	21c836823f	vpblend intrinsics combines as shifts intrinsics due to absence return stmt between them Fix PR20088 Differential Revision: http://reviews.llvm.org/D4277 llvm-svn: 211617	2014-06-24 18:08:04 +00:00
NAKAMURA Takumi	d77cefe633	Revert r211399, "Generate native unwind info on Win64" It broke Legacy JIT Tests on x86_64-{mingw32\|msvc}, aka Windows x64. llvm-svn: 211480	2014-06-22 22:00:56 +00:00
Filipe Cabecinhas	1af2dfd274	Fix PR20087 by using the source index when changing the vector load llvm-svn: 211472	2014-06-22 17:21:37 +00:00
Reid Kleckner	4a01230db4	Generate native unwind info on Win64 This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211399	2014-06-20 20:35:47 +00:00
Chandler Carruth	8366cebeb5	[x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDW instructions available as synthetic SDNodes PACKSS and PACKUS that will select to the correct instruction variants based on the return type. This allows us to use these rather important instructions when lowering vector shuffles. Also moves the relevant instruction definitions to be split out from the fully generic multiclasses to allow them to match these new SDNodes in the same way that the UNPCK instructions do. No functionality should actually be changed here. llvm-svn: 211332	2014-06-20 01:05:28 +00:00
Andrea Di Biagio	54b0949af9	[X86] Teach how to combine horizontal binop even in the presence of undefs. Before this change, the backend was unable to fold a build_vector dag node with UNDEF operands into a single horizontal add/sub. This patch teaches how to combine a build_vector with UNDEF operands into a horizontal add/sub when possible. The algorithm conservatively avoids to combine a build_vector with only a single non-UNDEF operand. Added test haddsub-undef.ll to verify that we correctly fold horizontal binop even in the presence of UNDEFs. llvm-svn: 211265	2014-06-19 10:29:41 +00:00
Cameron McInally	0d0489cea6	Hook up vector int_ctlz for AVX512. llvm-svn: 211024	2014-06-16 14:12:28 +00:00
Tim Northover	51472bc600	X86: lower ATOMIC_CMP_SWAP_WITH_SUCCESS directly Lowering this new node allows us to fold the almost universal comparison for success before it's even formed. Instead we can create a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg" instructions set the zero-flag to the correct value. rdar://problem/13201607 llvm-svn: 210923	2014-06-13 17:29:39 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Andrea Di Biagio	2dd3b3b674	[X86] Teach how to dump the name of target node RDTSCP_DAG. When I originally added node RDTSCP_DAG (r207127) I forgot to add a string name for it in method 'getTargetNodeName'. No functional change intended. llvm-svn: 210769	2014-06-12 11:37:24 +00:00
Andrea Di Biagio	972ff97f8c	[X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors. This patch adds target combine rules to match: - [AVX] Horizontal add/sub of packed single/double precision floating point values from 256-bit vectors; - [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors. llvm-svn: 210761	2014-06-12 10:53:48 +00:00
Tim Northover	4dc9eaa6ba	X86: add stringy name for X86ISD::LCMPXCHG16_DAG I don't know what "target specific node #383" is, and I don't want to have to. llvm-svn: 210663	2014-06-11 17:04:08 +00:00
Andrea Di Biagio	c7af75f9a7	[X86] Refactor the logic to select horizontal adds/subs to a helper function. This patch moves part of the logic implemented by the target specific combine rules added at r210477 to a separate helper function. This should make easier to add more rules for matching AVX/AVX2 horizontal adds/subs. This patch also fixes a problem caused by a wrong check performed on indices of extract_vector_elt dag nodes in input to the scalar adds/subs. New tests have been added to verify that we correctly check indices of extract_vector_elt dag nodes when selecting a horizontal operation. llvm-svn: 210644	2014-06-11 07:57:50 +00:00
Eric Christopher	68d7559e97	Use the TargetMachine on the DAG or the MachineFunction instead of using the cached TargetMachine. llvm-svn: 210589	2014-06-10 21:25:13 +00:00
Eric Christopher	19b1d73e88	Add a FIXME. llvm-svn: 210559	2014-06-10 18:31:18 +00:00
Andrea Di Biagio	fa508af0fe	[X86] Improved target combine rules for selecting horizontal add/sub. This patch slightly changes the algorithm introduced at revision 210477 to fix a problem where the algorithm was producing incorrect code for the VEX.256 encoded versions of horizontal add/sub. For these cases, we now try to split the two 256-bit vectors into 128-bit chunks before emitting horizontal add/sub dag nodes. Added a new test case into haddsub-2.ll. llvm-svn: 210545	2014-06-10 16:42:57 +00:00
Tom Stellard	3787b12255	SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CC The SelectionDAG bad a special case for ISD::SELECT_CC, where it would allow targets to specify: setOperationAction(ISD::SELECT_CC, MVT::Other, Expand); to indicate that they wanted to expand ISD::SELECT_CC for all types. This wasn't applied correctly everywhere, and it makes writing new DAG patterns with ISD::SELECT_CC difficult. llvm-svn: 210541	2014-06-10 16:01:29 +00:00
Tim Northover	7b9f86da5d	Revert "X86: elide comparisons after cmpxchg instructions." This reverts commit r210523. It was committed prematurely without waiting for review. llvm-svn: 210524	2014-06-10 10:50:11 +00:00
Tim Northover	84ad29ca1f	X86: elide comparisons after cmpxchg instructions. The C++ and C semantics of the compare_and_swap operations actually require us to return a boolean "success" value. In LLVM terms this means a second comparison of the output of "cmpxchg" against the input desired value. However, x86's "cmpxchg" instruction sets all flags for the comparison formed, so we can skip any secondary comparison. (N.b. this isn't true for cmpxchg8b/16b, which only set ZF). rdar://problem/13201607 llvm-svn: 210523	2014-06-10 10:49:07 +00:00
Eric Christopher	a08f30bd40	Move all of the x86 subtarget initialized variables down into the x86 subtarget from the x86 target machine. Should be no functional change. llvm-svn: 210479	2014-06-09 17:08:19 +00:00
Andrea Di Biagio	f99dd64f0a	[X86] Add target combine rules for horizontal add/sub. This patch adds new target specific combine rules to identify horizontal add/sub idioms from BUILD_VECTOR dag nodes. This patch also teaches the DAGCombiner how to canonicalize sequences of insert_vector_elt dag nodes according to the following rule: (insert_vector_elt (insert_vector_elt A, I0), I1) -> (insert_vecto_elt (insert_vector_elt A, I1), I0) This new canonicalization rule only triggers if the inner insert_vector dag node has exactly one use; also, both indices must be known constants, and I1 < I0. This last rule made it possible to write a simpler algorithm to identify horizontal add/sub patterns because now we don't have to worry about the ordering of insert_vector_elt dag nodes. llvm-svn: 210477	2014-06-09 16:54:41 +00:00
Andrea Di Biagio	dfbdc71ea1	[X86] Avoid emitting unnecessary test instructions. This patch teaches the backend how to check for the 'NoSignedWrap' flag on binary operations to improve the emission of 'test' instructions. If the result of a binary operation is known not to overflow we know that resetting the Overflow flag is unnecessary and so we can avoid emitting the test instruction. Patch by Marcello Maggioni. llvm-svn: 210468	2014-06-09 12:34:50 +00:00
Alexey Volkov	5260dba323	[X86] Use ADD/SUB instead of INC/DEC for Silvermont According to Intel Software Optimization Manual on Silvermont INC or DEC instructions require an additional uop to merge the flags. As a result, a branch instruction depending on an INC or a DEC instruction incurs a 1 cycle penalty. Differential Revision: http://reviews.llvm.org/D3990 llvm-svn: 210466	2014-06-09 11:40:41 +00:00
Craig Topper	66f09ad041	[C++11] Use 'nullptr'. llvm-svn: 210442	2014-06-08 22:29:17 +00:00
Benjamin Kramer	d0700b2919	X86: Don't turn shifts into ands if there's another use that may not check for equality. Fixes PR19964. llvm-svn: 210371	2014-06-06 21:08:55 +00:00
Filipe Cabecinhas	5181255696	Fixed a bug in lowering shuffle_vectors to insertps Summary: We were being too strict and not accounting for undefs. Added a test case and fixed another one where we improved codegen. Reviewers: grosbach, nadav, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4039 llvm-svn: 210361	2014-06-06 18:07:06 +00:00
Andrea Di Biagio	4760813831	[X86] Fix checked arithmetic for i8 on X86. When lowering a ISD::BRCOND into a test+branch, make sure that we always use the correct condition code to emit the test operation. This fixes PR19858: "i8 checked mul is wrong on x86". Patch by Keno Fisher! llvm-svn: 210032	2014-06-02 16:00:27 +00:00
Eric Christopher	8995833a34	Have the TLOF creation take a Triple rather than needing a subtarget. llvm-svn: 209937	2014-05-31 00:07:32 +00:00
Andrea Di Biagio	446a527905	[X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type. This patch teaches the backend how to simplify/canonicalize dag node sequences normally introduced by the backend when promoting certain dag nodes with illegal vector type. This patch adds two new combine rules: 1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) -> (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>) 2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) -> (shuffle (BINOP A, B), Undef, <Mask>). Both rules are only triggered on the type-legalized DAG. In particular, rule 1. is a target specific combine rule that attempts to sink a bitconvert into the operands of a binary operation. Rule 2. is a target independet rule that attempts to move a shuffle immediately after a binary operation. llvm-svn: 209930	2014-05-30 23:17:53 +00:00
Filipe Cabecinhas	d3aebaf875	Separate the check for blend shuffle_vector masks Summary: Separate the check for blend shuffle_vector masks into isBlendMask. This function will also be used to check if a vector shuffle is legal. No change in functionality was intended, but we ended up improving codegen on two tests, which were being (more) optimized only if the resulting shuffle was legal. Reviewers: nadav, delena, andreadb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3964 llvm-svn: 209923	2014-05-30 21:31:21 +00:00
Rafael Espindola	a31f3e50dc	Delete dead code. GV is never used past this point. This was probably a copy and paste error. llvm-svn: 209518	2014-05-23 15:07:51 +00:00
Andrea Di Biagio	c8dd1ad85b	[X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8. This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to MVT::v8i8 (and vice versa). This patch extends the logic from revision 208107 to also handle MVT::v4i16 and MVT::v8i8. Also, this patch correctly propagates Undef values when performing the widening of a vector (example: when widening from v2i32 to v4i32, the upper 64bits of the resulting vector are 'undef'). llvm-svn: 209451	2014-05-22 16:21:39 +00:00
Quentin Colombet	b4d53f1afa	[X86] Fix a bug in the lowering of BLENDI introduced in r209043. ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the second argument. On the other hand, BLENDI uses 0 to identify the first argument and 1 to identify the second argument. Fix the generation of the blend mask to account for this difference. The bug did not show up with r209043, because we were not checking for the actual arguments of the blend instruction! This commit also fixes the test cases. Note: The same mask works for the BLENDr variant because the arguments are swapped during instruction selection (see the BLENDXXrr patterns). <rdar://problem/16975435> llvm-svn: 209324	2014-05-21 22:00:39 +00:00
Simon Atanasyan	e7fa2314af	Add parentheses to suppress the gcc warning '-Wparentheses'. No functional changes. llvm-svn: 209203	2014-05-20 10:23:04 +00:00
Filipe Cabecinhas	dc92102766	Added more insertps optimizations Summary: When inserting an element that's coming from a vector load or a broadcast of a vector (or scalar) load, combine the load into the insertps instruction. Added PerformINSERTPSCombine for the case where we need to fix the load (load of a vector + insertps with a non-zero CountS). Added patterns for the broadcasts. Also added tests for SSE4.1, AVX, and AVX2. Reviewers: delena, nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3581 llvm-svn: 209156	2014-05-19 19:45:57 +00:00
Benjamin Kramer	f3ad23551d	SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123	2014-05-19 13:12:38 +00:00
Saleem Abdulrasool	f3a5a5c546	Target: remove old constructors for CallLoweringInfo This is mostly a mechanical change changing all the call sites to the newer chained-function construction pattern. This removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors are intended. llvm-svn: 209082	2014-05-17 21:50:17 +00:00
Chandler Carruth	c85473143c	[x86] Fix a bad predicate I spotted by inspection -- pshufhw and pshuflw were added in SSE2, no SSSE3. Found this while auditing all uses of SSSE3 in the X86 target. I don't actually expect this to make a significant difference on anything and I don't have any detailed test cases but I updated the existing test cases that already covered some of this code path. llvm-svn: 209056	2014-05-17 03:29:20 +00:00
Filipe Cabecinhas	89654da069	Implemented special cases for PerformVSELECTCombine. vselects with constant masks, after legalization, will get turned into specialized shuffle_vectors so they can be matched to blend+imm instructions. Fixed some tests. llvm-svn: 209044	2014-05-16 22:47:54 +00:00
Filipe Cabecinhas	e15551832c	Lower vselects into X86ISD::BLENDI when appropriate. LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the condition is constant and we can emit that instruction, given the subtarget. This is not enough for all cases. An additional SELECTCombine optimization will be committed. Fixed tests that were expecting variable blends but where a blend+imm can be generated. Added test where we can't emit blend+immediate. Added avx2 blend+imm tests. llvm-svn: 209043	2014-05-16 22:47:49 +00:00
Filipe Cabecinhas	17254aaead	Implemented LowerVSELECT to custom lower some instructions. No functionality change intended. The types that previously were set to lower as Expand or Legal are doing the same thing with this lowering function. llvm-svn: 209042	2014-05-16 22:47:43 +00:00
Rafael Espindola	e0098928c9	Delete getAliasedGlobal. llvm-svn: 209040	2014-05-16 22:37:03 +00:00
Andrea Di Biagio	d621120533	[X86] Teach the backend how to fold SSE4.1/AVX/AVX2 blend intrinsics. Added target specific combine rules to fold blend intrinsics according to the following rules: 1) fold(blend A, A, Mask) -> A; 2) fold(blend A, B, <allZeros>) -> A; 3) fold(blend A, B, <allOnes>) -> B. Added two new tests to verify that the new folding rules work for all the optimized blend intrinsics. llvm-svn: 208895	2014-05-15 15:18:15 +00:00
Alp Toker	beaca19c7c	Fix typos llvm-svn: 208839	2014-05-15 01:52:21 +00:00
Jay Foad	a0653a3e6c	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Reid Kleckner	7a59e0845f	Try to fix an SDAG dependence issue with sret r208453 added support for having sret on the second parameter. In that change, the code for copying sret into a virtual register was hoisted into the loop that lowers formal parameters. This caused a "Wrong topological sorting" assertion failure during scheduling when a parameter is passed in memory. This change undoes that by creating a second loop that deals with sret. I'm worried that this fix is incomplete. I don't fully understand the dependence issues. However, with this change we produce the same DAGs we used to produce, so if they are broken, they are just as broken as they have always been. llvm-svn: 208637	2014-05-12 22:01:27 +00:00
Aaron Ballman	29fd7b9b20	Silencing an MSVC warning about not all control paths returning a value (even though the switch is fully covered). No functional change. llvm-svn: 208565	2014-05-12 14:22:58 +00:00
Benjamin Kramer	3b36b72a9c	X86: Make sure that we have SSE4.1 before we generate insertps nodes. PR19721. llvm-svn: 208552	2014-05-12 13:12:08 +00:00
NAKAMURA Takumi	6c383021c9	X86ISelLowering.cpp:LowerINTRINSIC_W_CHAIN(): Prune impossible "default:" [-Wcovered-switch-default] llvm-svn: 208533	2014-05-12 10:16:46 +00:00
Elena Demikhovsky	4f591c0d45	Fixed compilation issue llvm-svn: 208524	2014-05-12 07:45:41 +00:00
Elena Demikhovsky	8e8fde8e93	AVX-512: changes in intrinsics 1) Changed gather and scatter intrinsics. Now they are aligned with GCC built-ins. There is no more non-masked form. Masked intrinsic receives -1 if all lanes are executed. 2) I changed the function that works with intrinsics inside X86ISelLowering.cpp. I put all intrinsics in one table. I did it for INTRINSICS_W_CHAIN and plan to put all intrinsics from WO_CHAIN set to the same table in order to avoid the long-long "switch". (I wanted to use static map initialization that allowed by C++11 but I wasn't able to compile it on VS2012). 3) I added gather/scatter prefetch intrinsics. 4) I fixed MRMm encoding for masked instructions. llvm-svn: 208522	2014-05-12 07:18:51 +00:00
Hal Finkel	f0e086a0bc	Pass the value type to TLI::getRegisterByName We must validate the value type in TLI::getRegisterByName, because if we don't and the wrong type was used with the IR intrinsic, then we'll assert (because we won't be able to find a valid register class with which to construct the requested copy operation). For PPC64, additionally, the type information is necessary to decide between the 64-bit register and the 32-bit subregister. No functionality change. llvm-svn: 208508	2014-05-11 19:29:07 +00:00
Filipe Cabecinhas	0e3d1cb5d6	Fixed a bug when lowering build_vector (PR19694) When lowering build_vector to an insertps, we would still lower it, even if the source vectors weren't v4x32. This would break on avx if the source was a v8x32. We now check the type of the source vectors. llvm-svn: 208487	2014-05-11 08:12:56 +00:00
Reid Kleckner	7941856445	Allow sret on the second parameter as well as the first MSVC always places the implicit sret parameter after the implicit this parameter of instance methods. We used to handle this for x86_thiscallcc by allocating the sret parameter on the stack and leaving the this pointer in ecx, but that doesn't handle alternative calling conventions like cdecl, stdcall, fastcall, or the win64 convention. Instead, change the verifier to allow sret on the second parameter. This also requires changing the Mips and X86 backends to return the argument with the sret parameter, instead of assuming that the sret parameter comes first. The Sparc backend also returns sret parameters in a register, but I wasn't able to update it to handle secondary sret parameters. It currently calls report_fatal_error if you feed it an sret in the second parameter. Reviewers: rafael.espindola, majnemer Differential Revision: http://reviews.llvm.org/D3617 llvm-svn: 208453	2014-05-09 22:32:13 +00:00
Andrea Di Biagio	932ff1ddd7	Fix 80 col violation. No functional change intended. llvm-svn: 208405	2014-05-09 11:08:23 +00:00
Filipe Cabecinhas	e4b482b3ed	Optimize shufflevector that copies an i64/f64 and zeros the rest. Summary: Also ran clang-format on the function. The code added is the last else if block. Reviewers: nadav, craig.topper, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3518 llvm-svn: 208372	2014-05-08 23:16:08 +00:00
Andrea Di Biagio	e85ba4df52	[X86] Add target specific combine rules to fold SSE2/AVX2 packed arithmetic shift intrinsics. This patch teaches the backend how to combine packed SSE2/AVX2 arithmetic shift intrinsics. The rules are: - Always fold a packed arithmetic shift by zero to its first operand; - Convert a packed arithmetic shift intrinsic dag node into a ISD::SRA only if the shift count is known to be smaller than the vector element size. This patch also teaches to function 'getTargetVShiftByConstNode' how fold target specific vector shifts by zero. Added two new tests to verify that the DAGCombiner is able to fold sequences of SSE2/AVX2 packed arithmetic shift calls. llvm-svn: 208342	2014-05-08 17:44:04 +00:00
Filipe Cabecinhas	095d9d573a	Lower certain build_vectors to insertps instructions Summary: Vectors built with zeros and elements in the same order as another (source) vector are optimized to be built using a single insertps instruction. Also optimize when we move one element in a vector to a different place in that vector while zeroing out some of the other elements. Further optimizations are possible, described in TODO comments. I will be implementing at least some of them in the near future. Added some tests for different cases where this optimization triggers. Reviewers: nadav, delena, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3521 llvm-svn: 208271	2014-05-08 00:25:16 +00:00
Andrea Di Biagio	c14ccc9184	[X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa). Before this patch, the backend always emitted a store+load sequence to bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting i64 node was then used to build a v2i32 vector. With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free" bitcast to type MVT::v4i32. The elements of the resulting v4i32 are then extracted to build a v2i32 vector (which is illegal and therefore promoted to MVT::v2i64). This is in general cheaper than emitting a stack store+load sequence to bitconvert the operand from type f64 to type i64. llvm-svn: 208107	2014-05-06 17:09:03 +00:00
Renato Golin	c7aea40ec6	Implememting named register intrinsics This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104	2014-05-06 16:51:25 +00:00
Reid Kleckner	4a406d32e9	Fix i128 div/mod on mingw64 The Win64 docs are very clear that anything larger than 8 bytes is passed by reference, and GCC MinGW64 honors that for __modti3 and friends. Patch by Jameson Nash! llvm-svn: 208029	2014-05-06 01:20:42 +00:00
Filipe Cabecinhas	fe59062b75	Revert "Optimize shufflevector that copies an i64/f64 and zeros the rest." This reverts commit 207992. I misread the phab number on the LGTM. llvm-svn: 207993	2014-05-05 19:40:36 +00:00
Filipe Cabecinhas	263d98c19f	Optimize shufflevector that copies an i64/f64 and zeros the rest. Summary: Also ran clang-format on the function. The code added is the last else if block. Reviewers: nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3518 llvm-svn: 207992	2014-05-05 19:36:28 +00:00
Craig Topper	2d2aa0ca1f	Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently. llvm-svn: 207616	2014-04-30 07:17:30 +00:00
Reid Kleckner	fb69308568	Implement X86 code generation for musttail Currently, musttail codegen is relying on sibcall optimization, and reporting a fatal error if fails. Sibcall optimization fails when stack arguments need to be modified, which is insufficient for musttail. The logic for moving arguments in memory safely is already implemented for GuaranteedTailCallOpt. This change merely arranges for musttail calls to use it. No functional change for GuaranteedTailCallOpt. Reviewers: espindola Differential Revision: http://reviews.llvm.org/D3493 llvm-svn: 207598	2014-04-29 23:55:41 +00:00
Elena Demikhovsky	299cf511c4	AVX-512: optimized a shuffle pattern to VINSERTI64x4. Added intrinsics for VPERMT2PS/PD/D/Q instructions. llvm-svn: 207513	2014-04-29 09:09:15 +00:00
Quentin Colombet	50efe87e5b	[X86] Add more details in the comments of X86TargetLowering::getScalingFactorCost. llvm-svn: 207432	2014-04-28 18:39:57 +00:00
Craig Topper	e73658ddbb	[C++] Use 'nullptr'. llvm-svn: 207394	2014-04-28 04:05:08 +00:00
Craig Topper	dd5e16dd34	Convert one last signature of getNode to take an ArrayRef of SDUse. llvm-svn: 207376	2014-04-27 19:21:06 +00:00
Craig Topper	64941d9786	Convert SelectionDAG::getMergeValues to use ArrayRef. llvm-svn: 207374	2014-04-27 19:20:57 +00:00
Benjamin Kramer	3693e77cb4	X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts. This is more expensive than pmuldq but still cheaper than scalarizing the whole thing. llvm-svn: 207370	2014-04-27 18:47:41 +00:00
Craig Topper	206fcd450a	Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size. llvm-svn: 207329	2014-04-26 19:29:41 +00:00
Craig Topper	48d114bed1	Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. llvm-svn: 207327	2014-04-26 18:35:24 +00:00
Benjamin Kramer	c2ad8f3ef1	Print X86ISD::PMULDQ nodes properly in debug output. llvm-svn: 207322	2014-04-26 16:26:41 +00:00
Benjamin Kramer	6d2dff61f9	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Benjamin Kramer	c9827ab103	X86: Add patterns for MULHU/MULHS of v8i16 and v16i16. This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317	2014-04-26 13:01:03 +00:00
Benjamin Kramer	ad0168702a	Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors. llvm-svn: 207316	2014-04-26 13:00:53 +00:00
Benjamin Kramer	29139d5cb5	X86: Custom lower v4i32 UMUL_LOHI into 2 pmuludqs. Test will follow soon. llvm-svn: 207314	2014-04-26 12:06:11 +00:00
Quentin Colombet	ea18933d97	[X86] Implement TargetLowering::getScalingFactorCost hook. Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301	2014-04-26 01:11:26 +00:00
Filipe Cabecinhas	363b570d2a	Optimization for certain shufflevector by using insertps. Summary: If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower certain shufflevectors to an insertps instruction: When most of the shufflevector result's elements come from one vector (and keep their index), and one element comes from another vector or a memory operand. Added tests for insertps optimizations on shufflevector. Added support and tests for v4i32 vector optimization. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3475 llvm-svn: 207291	2014-04-25 23:51:17 +00:00
Craig Topper	062a2baef0	[C++] Use 'nullptr'. Target edition. llvm-svn: 207197	2014-04-25 05:30:21 +00:00
Benjamin Kramer	76f753e9a9	X86: Don't transform shifts into ands when the sign bit is tested. Should unbreak MultiSource/Benchmarks/mediabench/g721/g721encode/encode. llvm-svn: 207145	2014-04-24 20:51:37 +00:00
Reid Kleckner	5772b77789	Add 'musttail' marker to call instructions This is similar to the 'tail' marker, except that it guarantees that tail call optimization will occur. It also comes with convervative IR verification rules that ensure that tail call optimization is possible. Reviewers: nicholas Differential Revision: http://llvm-reviews.chandlerc.com/D3240 llvm-svn: 207143	2014-04-24 20:14:34 +00:00
Andrea Di Biagio	d1ab866868	[X86] Add support for Read Time Stamp Counter x86 builtin intrinsics. This patch: - Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and 'int_x86_rdtscp') as GCCBuiltin intrinsics; - Teaches the backend how to lower the two new builtins; - Introduces a common function to lower READCYCLECOUNTER dag nodes and the two new rdtsc/rdtscp intrinsics; - Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll' correctly verifies that both READCYCLECOUNTER and the two new intrinsics work fine for both 64bit and 32bit Subtargets. llvm-svn: 207127	2014-04-24 17:18:27 +00:00
Benjamin Kramer	f4575db2fd	X86: Emit test instead of constant shift + compare if the shift result is unused. This allows us to compile return (mask & 0x8 ? a : b); into testb $8, %dil cmovnel %edx, %esi instead of andl $8, %edi shrl $3, %edi cmovnel %edx, %esi which we formed previously because dag combiner canonicalizes setcc of and into shift. llvm-svn: 207088	2014-04-24 08:15:31 +00:00
Elena Demikhovsky	acc5c9e83e	AVX-512: store and truncstore for i1 values llvm-svn: 206897	2014-04-22 14:13:10 +00:00
Lang Hames	3067ab2344	[X86] Use tablegen instead of DAG combines to match BZHI instructions, as suggested by Ben Kramer in review of r206738. Thanks again Ben! llvm-svn: 206879	2014-04-22 10:41:56 +00:00
Lang Hames	f6f42cac3f	[X86] Don't use BZHI for short masks (>=32 bits). Thanks to Ben Kramer for the review. llvm-svn: 206869	2014-04-22 07:40:34 +00:00
Chandler Carruth	84e68b2994	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Target/... edition. llvm-svn: 206842	2014-04-22 02:41:26 +00:00
Lang Hames	5aa6ee80b6	[X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available. Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)), was already supported, but we were missing the constant-mask case. This patch fixes that. <rdar://problem/15480077> llvm-svn: 206738	2014-04-21 08:18:53 +00:00
Adam Nemet	ee7a3e38c9	[X86] Improve buildFromShuffleMostly for AVX For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors, both the BUILD_VECTOR and its operands may need to be legalized in multiple steps. Consider: (v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>), (extract_vector_elt %vreg0, Constant<2>), (extract_vector_elt %vreg0, Constant<3>), (extract_vector_elt %vreg0, Constant<4>), (extract_vector_elt %vreg0, Constant<5>), (extract_vector_elt %vreg0, Constant<6>), (extract_vector_elt %vreg0, Constant<7>), %vreg1)) a. We can't build a 256-bit vector efficiently so, we need to split it into two 128-bit vecs and combine them with VINSERTX128. b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be split into a VEXTRACTX128 and a further extract_vector_elt from the resulting 128-bit vector. c. The extract_vector_elt from b. is lowered into a shuffle to the first element and a movss. Depending on the order in which we legalize the BUILD_VECTOR and its operands[1], buildFromShuffleMostly may be faced with: (v4f32 (BUILD_VECTOR (extract_vector_elt (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), (extract_vector_elt (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), (extract_vector_elt (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), %vreg1)) In order to figure out the underlying vector and their identity we need to see through the shuffles. [1] Note that the order in which operations and their operands are legalized is only guaranteed in the first iteration of LegalizeDAG. Fixes <rdar://problem/16296956> llvm-svn: 206634	2014-04-18 19:44:16 +00:00
Andrea Di Biagio	aac2eac4c2	[X86] Improve the lowering of packed shifts by constant build_vector. This patch teaches the backend how to efficiently lower logical and arithmetic packed shifts on both SSE and AVX/AVX2 machines. When possible, instead of scalarizing a vector shift, the backend should try to expand the shift into a sequence of two packed shifts by immedate count followed by a MOVSS/MOVSD. Example (v4i32 (srl A, (build_vector < X, Y, Y, Y>))) Can be rewritten as: (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>))) [with X and Y ConstantInt] The advantage is that the two new shifts from the example would be lowered into X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into four scalar shifts plus four pairs of vector insert/extract. llvm-svn: 206316	2014-04-15 19:30:48 +00:00
Nick Lewycky	aad475b324	Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead. llvm-svn: 206255	2014-04-15 07:22:52 +00:00
David Blaikie	9027abae53	Change argument order and add explanatory comment to r206130 Changes requested in code review by Eric Christopher of r206130. llvm-svn: 206219	2014-04-14 22:23:06 +00:00
David Blaikie	269e0fb2e4	Fix instruction debug info location during legalization I found this from a particular GDB test suite case of inlining (something similar is provided as a test case) but came across a few other related cases (other callers of the same functions, and one other instance of the same coding mistake in a separate function). I'm not sure what the best way to test this is (let alone to cover the other cases I discovered), so hopefully this sufficies - open to ideas. llvm-svn: 206130	2014-04-13 06:39:55 +00:00
Reid Kleckner	9c6582129a	Move the segmented stack switch to a function attribute This removes the -segmented-stacks command line flag in favor of a per-function "split-stack" attribute. Patch by Luqman Aden and Alex Crichton! llvm-svn: 205997	2014-04-10 22:58:43 +00:00
Elena Demikhovsky	cf0b9bafc3	AVX-512: insert element to mask vector; store i1 data Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors; Implemented "store" for i1 type llvm-svn: 205850	2014-04-09 12:37:50 +00:00
Elena Demikhovsky	3dcfbdfa54	AVX-512: Added fp_to_uint and uint_to_fp patterns. llvm-svn: 205754	2014-04-08 07:24:02 +00:00
Matt Arsenault	cf6f688a40	Add DAG parameter to ComputeNumSignBitsForTargetNode This way, you can check the number of sign bits in the operands. The depth parameter it already has is pretty useless without this. llvm-svn: 205649	2014-04-04 20:13:13 +00:00
Craig Topper	840beec2d0	Make consistent use of MCPhysReg instead of uint16_t throughout the tree. llvm-svn: 205610	2014-04-04 05:16:06 +00:00
Yaron Keren	2895496852	Added isTargetWindowsMSVC(), renamed isTargetMingw() to isTargetWindowsGNU() and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the four Windows environments in Triple.h. Suggestion by Saleem Abdulrasool! llvm-svn: 205393	2014-04-02 04:27:51 +00:00
Yaron Keren	48d68d439a	If isKnownWindowsMSVCEnvironment then getOS == Triple::Win32 and Environment == Triple::MSVC so it will never be MinGW or Cygwin. llvm-svn: 205349	2014-04-01 18:52:55 +00:00
Yaron Keren	136fe7db46	isTargetWindows() renamed to isTargetKnownWindowsMSVC() to reflect its current functionality. Based on Takumi NAKAMURA suggestion. llvm-svn: 205338	2014-04-01 18:15:34 +00:00
Aaron Ballman	8bf5a548ea	Attempting to fix r205124, which had failed asserts when built with MSVC. Suggestion from Yaron Keren. llvm-svn: 205313	2014-04-01 13:56:35 +00:00
Alexey Volkov	1328b28dc6	[x86] Do not convert to cmp32 for Atom arch by Sergey Okunev Differential Revision: http://llvm-reviews.chandlerc.com/D2824 llvm-svn: 205288	2014-04-01 08:13:07 +00:00
Rafael Espindola	24a669d225	Prevent alias from pointing to weak aliases. This adds back r204781. Original message: Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204934	2014-03-27 15:26:56 +00:00
Rafael Espindola	65481d7b97	Revert "Prevent alias from pointing to weak aliases." This reverts commit r204781. I will follow up to with msan folks to see what is what they were trying to do with aliases to weak aliases. llvm-svn: 204784	2014-03-26 06:14:40 +00:00
Rafael Espindola	3b712a84a9	Prevent alias from pointing to weak aliases. Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204781	2014-03-26 04:48:47 +00:00
Adam Nemet	4beef4c90d	[X86] Generate VPSHUFB for in-place v16i16 shuffles This used to resort to splitting the 256-bit operation into two 128-bit shuffles and then recombining the results. Fixes <rdar://problem/16167303> llvm-svn: 204735	2014-03-25 17:47:06 +00:00
Adam Nemet	ac6d6383a3	[X86] Factor out new helper getPSHUFB I found three implementations of this. This splits it out into a new function and uses it from the three places. My plan is to add a fourth use when lowering a vector_shuffle:v16i16. Compared the assembly output of test/CodeGen/X86 before and after. The only change is due to how the first PSHUFB was generated in LowerVECTOR_SHUFFLEv8i16. If the shuffle mask specified undef (i.e. -1), the old implementation would write -1 * 2 and -1 * 2 + 1 (254 and 255) in the control mask. Now we write 0x80. These are of course interchangeable since bit 7 decides if a constant zero is written in the result byte. The other instances of this code use 0x80 consistently. Related to <rdar://problem/16167303> llvm-svn: 204734	2014-03-25 17:47:03 +00:00
Adam Nemet	b47372f555	[X86] Fix non-determinism in LowerVectorAllZeroTest This can be observed with the old testcase of CodeGen/X86/pr12312.ll: 47c47 < vorps %ymm0, %ymm1, %ymm0 --- > vorps %ymm1, %ymm0, %ymm0 97c97 < vorps %ymm1, %ymm0, %ymm0 --- > vorps %ymm0, %ymm1, %ymm0 The vector VecIns is populated with all the values from VecInMap. This is done while iterating VecInMap. VecInMap uses a hash of pointer values so the resulting order can vary depending on the memory layout. The fix is to populate the vector VecIns earlier as VecInMap is populated. This is done in DAG traversal order. Fixes <rdar://problem/16398806> llvm-svn: 204623	2014-03-24 16:52:08 +00:00
Craig Topper	c6d4efa1e5	Prune includes in X86 target. llvm-svn: 204216	2014-03-19 06:53:25 +00:00
Adam Nemet	8a130a5f86	[X86] Fix unused variable warning with NDEBUG from r204058 llvm-svn: 204063	2014-03-17 17:32:53 +00:00
Adam Nemet	24381f1cb7	[VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16 Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX. For that to work properly, we also need to teach the legalizer about the specific promotion required here. The default vector promotion uses bitcasting to a vector type of the same total size. We want to promote the vector element type, effectively widening the operation and then truncating the result. This is analogous to the current logic of how int_to_fp is promoted. The change also factors out some code from the int_to_fp promotion code to ValueType::widenIntegerVectorElementType. This is now shared between int_to_fp and fp_to_int. There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in X86. It can now go through the new target-independent fp_to_*int promotion logic. I also checked that no other target uses Promote for these ops yet, so there shouldn't be any unexpected change in behavior. Fixes <rdar://problem/16202247> llvm-svn: 204058	2014-03-17 17:06:14 +00:00
Arnaud A. de Grandmaison	75c9e6dedf	Remove some dead assignements found by scan-build llvm-svn: 204013	2014-03-15 22:13:15 +00:00
Owen Anderson	16c6bf49b7	Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing operator* on the by-operand iterators to return a MachineOperand& rather than a MachineInstr&. At this point they almost behave like normal iterators! Again, this requires making some existing loops more verbose, but should pave the way for the big range-based for-loop cleanups in the future. llvm-svn: 203865	2014-03-13 23:12:04 +00:00
Hans Wennborg	6c37f8b985	X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059) This fixes the bug where we would bitcast the 64-bit floating point result of cmpneqsd to a 64-bit integer even on 32-bit targets. Differential Revision: http://llvm-reviews.chandlerc.com/D3009 llvm-svn: 203581	2014-03-11 15:49:24 +00:00
Tim Northover	e94a518a22	IR: add a second ordering operand to cmpxhg for failure The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559	2014-03-11 10:48:52 +00:00
Jim Grosbach	c94d993adf	X86: Enable ISel of 16-bit MOVBE instructions. When the MOVBE instructions are available, use them for 16-bit endian swapping as well as for 32 and 64 bit. The patterns were already present on the instructions, but weren't being matched because the operation was unconditionally marked to 'Expand.' Change that to be conditional on whether the MOVBE instructions are available. Use 'rolw' to implement the in-register version (32 and 64 bit have the dedicated 'bswap' instruction for that). Patch by Louis Gerbarg <lgg@apple.com>. rdar://15479984 llvm-svn: 203524	2014-03-11 00:44:14 +00:00
Cameron McInally	791ae9927c	Lower AVX v4i64->v4i32 truncate to one shuffle. llvm-svn: 202996	2014-03-05 19:41:16 +00:00
Chandler Carruth	219b89b987	[Modules] Move CallSite into the IR library where it belogs. It is abstracting between a CallInst and an InvokeInst, both of which are IR concepts. llvm-svn: 202816	2014-03-04 11:01:28 +00:00
Benjamin Kramer	b6d0bd48bd	[C++11] Replace llvm::next and llvm::prior with std::next and std::prev. Remove the old functions. llvm-svn: 202636	2014-03-02 12:27:27 +00:00
Elena Demikhovsky	9737e3886b	AVX-512: Fixed extract_vector_elt for v8i1 vector llvm-svn: 202624	2014-03-02 09:19:44 +00:00
Quentin Colombet	85c9e16291	Lower unsigned vsetcc to psubus in certain cases The current approach to lower a vsetult is to flip the sign bit of the operands, swap the operands and then use a (signed) pcmpgt. psubus (unsigned saturating subtract) can be used to emulate a vsetult more efficiently: + case ISD::SETULT: { + // If the comparison is against a constant we can turn this into a + // setule. With psubus, setule does not require a swap. This is + // beneficial because the constant in the register is no longer + // destructed as the destination so it can be hoisted out of a loop. I also enable lowering via psubus in a few other cases where it's clearly beneficial: setule and setuge if minu/maxu cannot be used. rdar://problem/14338765 Patch by Adam Nemet <anemet@apple.com>. llvm-svn: 202301	2014-02-26 21:39:12 +00:00
Tim Northover	aeb8e06d4c	X86 CodeGenPrep: sink shufflevectors before shifts On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift is really a scalar is often not visible to CodeGen when it's needed. This adds another handler to CodeGenPrepare, to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work). rdar://problem/16063505 llvm-svn: 201655	2014-02-19 10:02:43 +00:00
Tim Northover	f06df5866f	X86: use vpsllvd (& friends) for 16-bit shifts on Haswell llvm-svn: 201558	2014-02-18 11:15:32 +00:00
Elena Demikhovsky	1fad075974	AVX-512: simpyfied BUILD_VECTOR for masks; fixed cmp/test sequence llvm-svn: 201487	2014-02-16 11:34:23 +00:00
Andrea Di Biagio	386d566395	[X86] Teach the backend how to lower vector shift left into multiply rather than scalarizing it. Instead of expanding a packed shift into a sequence of scalar shifts, the backend now tries (when possible) to convert the vector shift into a vector multiply. Before this change, a shift of a MVT::v8i16 vector by a build_vector of constants was always scalarized into a long sequence of "vector extracts + scalar shifts + vector insert". With this change, if there is SSE2 support, we emit a single vector multiply. This change also affects SSE4.1, AVX, AVX2 shifts: - A shift of a MVT::v4i32 vector by a build_vector of non uniform constants is now lowered when possible into a single SSE4.1 vector multiply. - Packed v16i16 shift left by constant build_vector are now expanded when possible into a single AVX2 vpmullw. This change also improves the lowering of AVX512f vector shifts. Added test CodeGen/X86/vec_shift6.ll with some code examples that are affected by this change. llvm-svn: 201271	2014-02-12 23:42:28 +00:00
Elena Demikhovsky	1f32c313f1	AVX: fixed a bug in LowerVECTOR_SHUFFLE llvm-svn: 201140	2014-02-11 10:21:53 +00:00
Elena Demikhovsky	2aafc22ed9	AVX-512: Optimized BUILD_VECTOR pattern; fixed encoding of VEXTRACTPS instruction. llvm-svn: 201134	2014-02-11 07:25:59 +00:00
Elena Demikhovsky	9f423d6f25	AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors. llvm-svn: 201066	2014-02-10 07:02:39 +00:00
Tim Northover	546b57b011	X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes I believe VZEXT_MOVL means "zero all vector elements except the first" (and should have identical input & output types) whereas VZEXT means "zero extend each element of a vector (discarding higher elements if necessary)". For example: (v4i32 (vzext (v16i8 ...))) should zero extend the low 4 bytes of the incoming vector to 32-bits, discarding higher bytes. However, somewhere in the past, these two concepts had become confused, even leading to a nonsensical VSEXT_MOVL. This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL -> VZEXT when it's an actual extension). rdar://problem/15981990 llvm-svn: 200918	2014-02-06 09:54:51 +00:00
Matt Arsenault	25793a3f22	Add address space argument to allowsUnalignedMemoryAccess. On R600, some address spaces have more strict alignment requirements than others. llvm-svn: 200887	2014-02-05 23:15:53 +00:00
Elena Demikhovsky	0b79be8ab2	AVX-512: optimized icmp -> sext -> icmp pattern llvm-svn: 200849	2014-02-05 16:17:36 +00:00
Craig Topper	7ee163842f	Move matching for x86 BMI BLSI/BLSMSK/BLSR instructions to isel patterns instead of DAG combine. This weakens the ability to fold loads with them because we aren't able to match patterns that load the same thing twice. But maybe we should fix that if we care. The peephole optimizer will be able to fold some loads in its absense. llvm-svn: 200824	2014-02-05 07:09:40 +00:00
Reid Kleckner	f5b76518c9	Implement inalloca codegen for x86 with the new inalloca design Calls with inalloca are lowered by skipping all stores for arguments passed in memory and the initial stack adjustment to allocate argument memory. Now the frontend is responsible for the memory layout, and the backend doesn't have to do any work. As a result these changes are pretty minimal. Reviewers: echristo Differential Revision: http://llvm-reviews.chandlerc.com/D2637 llvm-svn: 200596	2014-01-31 23:50:57 +00:00
Reid Kleckner	c5d9e159eb	x86: Rename NumBytesForCalleeToPush to ...Pop for accuracy If we have a callee cleanup convention, the callee is going to pop the arguments off the stack, not push them on. llvm-svn: 200566	2014-01-31 19:07:18 +00:00
Andrea Di Biagio	2ea61f17ad	[X86] Add extra rules for combining vselect dag nodes into movsd. This improves the fix committed at revision 199683 adding the following new target specific combine rules: 1) fold (v4i32: vselect <0,0,-1,-1>, A, B) -> (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) )) 2) fold (v4f32: vselect <0,0,-1,-1>, A, B) -> (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) )) 3) fold (v4i32: vselect <-1,-1,0,0>, A, B) -> (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) )) 4) fold (v4f32: vselect <-1,-1,0,0>, A, B) -> (v4f32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) )) llvm-svn: 200324	2014-01-28 18:14:21 +00:00
Juergen Ributzka	659ce00d60	[TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself. Before this patch we used getIntImmCost from TargetTransformInfo to determine if a load of a constant should be converted to just a constant, but the threshold for this was set to an arbitrary value. This value works well for the two targets (X86 and ARM) that implement this target-hook, but it isn't target-independent at all. Now targets have the possibility to decide directly if this optimization should be performed. The default value is set to false to preserve the current behavior. The target hook has been moved to TargetLowering, which removed the last use and need of TargetTransformInfo in SelectionDAG. llvm-svn: 200271	2014-01-28 01:20:14 +00:00
Juergen Ributzka	e758ddcd16	[X86] Prevent the creation of redundant ops for sadd and ssub with overflow. This commit teaches the X86 backend to create the same X86 instructions when it lowers an sadd/ssub with overflow intrinsic and a conditional branch that uses that overflow result. This allows SelectionDAG to recognize and remove one of the redundant operations. This fixes <rdar://problem/15874016> and <rdar://problem/15661073>. Reviewed by Nadav llvm-svn: 199976	2014-01-24 06:47:57 +00:00
Lang Hames	b1ce33379a	Add a few missing cases from r199933. Testcase coming shortly. llvm-svn: 199938	2014-01-23 21:27:27 +00:00
Lang Hames	23de211c5d	Replace vfmaddxx213 instructions with their 231-type equivalents in accumulator loops. Writing back to the accumulator (231-type) allows the coalescer to eliminate an extra copy. llvm-svn: 199933	2014-01-23 20:23:36 +00:00
Elena Demikhovsky	a5d38a39a0	AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions, they give better sequences than VPERMI llvm-svn: 199893	2014-01-23 14:27:26 +00:00
Andrea Di Biagio	450d1661be	[X86] Teach how to combine a vselect into a movss/movsd Add target specific rules for combining vselect dag nodes into movss/movsd when possible. If the vector type of the vselect dag node in input is either MVT::v4i13 or MVT::v4f32, then try to fold according to rules: 1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B) 2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A) If the vector type of the vselect dag node in input is either MVT::v2i64 or MVT::v2f64 (and we have SSE2), then try to fold according to rules: 3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B) 4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A) llvm-svn: 199683	2014-01-20 19:35:22 +00:00
Michael Gottesman	8347c34e67	Move the retrieval of VT after all of the early exits from PerformOrCombine that do not use VT. NFC. llvm-svn: 199612	2014-01-19 21:06:00 +00:00
Elena Demikhovsky	d1487261a0	AVX-512: fixed a compare pattern llvm-svn: 199366	2014-01-16 08:45:54 +00:00
David Majnemer	dee105772c	WinCOFF: Transform IR expressions featuring __ImageBase into image relative relocations MSVC on x64 requires that we create image relative symbol references to refer to RTTI data. Seeing as how there is no way to explicitly make reference to a given relocation type in LLVM IR, pattern match expressions of the form &foo - &__ImageBase. Differential Revision: http://llvm-reviews.chandlerc.com/D2523 llvm-svn: 199312	2014-01-15 09:16:42 +00:00
Elena Demikhovsky	79b75d9048	Fixed identation. llvm-svn: 199301	2014-01-15 07:18:11 +00:00
Lang Hames	06234ec147	Add FPExt option to CCValAssign::LocInfo. When generating calling-convention promotion code, Tablegen will now select FPExt for floating point promotions (previously it had returned AExt, which is not valid for floating point types). Any out-of-tree targets that were relying on AExt being returned for FP promotions will need to update their code check for FPExt instead. llvm-svn: 199252	2014-01-14 19:56:36 +00:00
Nico Rieck	7157bb765e	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199218	2014-01-14 15:22:47 +00:00
Elena Demikhovsky	767fc967b4	AVX-512: optimized scalar compare patterns removed AVX512SI format, since it is similar to AVX512BI. llvm-svn: 199217	2014-01-14 15:10:08 +00:00
Andrea Di Biagio	5448a3c771	[X86] Fix assertion failure caused by a wrong folding of vector shifts by immediate count. This fixes a regression intruced by r198113. Revision r198113 introduced an algorithm that tries to fold a vector shift by immediate count into a build_vector if the input vector is a known vector of constants. However the algorithm only worked under the assumption that the input vector type and the shift type are exactly the same. This patch disables the folding of vector shift by immediate count if the input vector type and the shift value type are not the same. llvm-svn: 199213	2014-01-14 13:17:12 +00:00
Nico Rieck	9d2e0df049	Revert "Decouple dllexport/dllimport from linkage" Revert this for now until I fix an issue in Clang with it. This reverts commit r199204. llvm-svn: 199207	2014-01-14 12:38:32 +00:00
Nico Rieck	e43aaf7967	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199204	2014-01-14 11:55:03 +00:00
Elena Demikhovsky	172a27c750	AVX-512: Added more intrinsics for pmin/pmax, pabs, blend, pmuldq. llvm-svn: 198745	2014-01-08 10:54:22 +00:00
Bill Wendling	13199b17f8	Remove unnecessary #includes. llvm-svn: 198585	2014-01-06 06:00:00 +00:00
Bill Wendling	908bf814e7	Refactor function that checks that __builtin_returnaddress's argument is constant. This moves the check up into the parent class so that all targets can use it without having to copy (and keep in sync) the same error message. llvm-svn: 198579	2014-01-06 00:43:20 +00:00
Elena Demikhovsky	f404e054a1	AVX-512: changed property name from "neverHasSideEffects=1" to "hasSideEffects=0", added this property to VMOVSS/VMOVSD; Optimized a truncate pattern. llvm-svn: 198562	2014-01-05 14:21:07 +00:00
Elena Demikhovsky	52e4a0e109	AVX-512: Added more intrinsics for convert and min/max. Removed vzeroupper from AVX-512 mode - our optimization gude does not recommend to insert vzeroupper at all. llvm-svn: 198557	2014-01-05 10:46:09 +00:00
Bill Wendling	df7dd28dc8	Emit an error message if the value passed to __builtin_returnaddress isn't a constant __builtin_returnaddress requires that the value passed into is be a constant. However, at -O0 even a constant expression may not be converted to a constant. Emit an error message intead of crashing. llvm-svn: 198531	2014-01-05 01:47:20 +00:00
Craig Topper	a448bd868f	Make more of the x86 lowering helper functions static. llvm-svn: 198146	2013-12-29 01:48:38 +00:00
Craig Topper	059e8e0da1	Switch from EVT to MVT in more of the x86 instruction lowering code. llvm-svn: 198144	2013-12-29 01:10:06 +00:00
Craig Topper	bf096926c9	Use getSimpleValueType in a few spots where the type should be simple. llvm-svn: 198117	2013-12-28 18:35:48 +00:00
Craig Topper	e829fe42af	Minor indentation fix to match other switch statements. Change llvm_unreachable text to match similar places. llvm-svn: 198116	2013-12-28 17:37:32 +00:00
Andrea Di Biagio	eaceba0ed0	[X86] Teach the backend how to fold target specific dag node for packed vector shift by immedate count (VSHLI/VSRLI/VSRAI) into a build_vector when the vector in input to the shift is a build_vector of all constants or UNDEFs. Target specific nodes for packed shifts by immediate count are in general introduced by function 'getTargetVShiftByConstNode' (in X86ISelLowering.cpp) when lowering shift operations, SSE/AVX immediate shift intrinsics and (only in very few cases) SIGN_EXTEND_INREG dag nodes. This patch adds extra rules for simplifying vector shifts inside function 'getTargetVShiftByConstNode'. Added file test/CodeGen/X86/vec_shift5.ll to verify that packed shifts by immediate are correctly folded into a build_vector when the input vector to the shift dag node is a vector of constants or undefs. llvm-svn: 198113	2013-12-28 11:11:52 +00:00
Elena Demikhovsky	b64d7e8586	AVX-512: Result type of scalar SETCC is MVT::i1 for AVX-512. llvm-svn: 198008	2013-12-25 10:06:40 +00:00
Elena Demikhovsky	64c9548d66	AVX-512: fixed some patterns for MVT::i1 llvm-svn: 197981	2013-12-24 14:24:07 +00:00
Elena Demikhovsky	fe24a30e38	AVX512: SETCC returns i1 for AVX-512 and i8 for all others llvm-svn: 197876	2013-12-22 10:13:18 +00:00
Duncan P. N. Exon Smith	ab5dbebc11	Assert that the last operand is actually EFLAGS This is another follow-up to r197503, after a post-commit review by Andy. <rdar://problem/15627766> llvm-svn: 197520	2013-12-17 20:28:21 +00:00
Duncan P. N. Exon Smith	512601d77f	Revert "Revert "Mark vastart_save_xmm_regs as changing EFLAGS"" This reverts commit r197481, recommiting r197469 with an extra fix. The vastart_save_xmm_regs pseudo-instruction expands to a test and a branch, so it modifies EFLAGS. Mark it so, or else the scheduler might place it in the middle of another test+branch. This fixes a bug exposed by r192750, which changed the initial scheduler to source-order as part of enabling the MI Scheduler for X86. This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to try to save %flags, and adds a test that catches the bad behavior of r197469. <rdar://problem/15627766> llvm-svn: 197503	2013-12-17 15:54:45 +00:00
Stepan Dyatkovskiy	7f7c2710e0	Fix for PR18045: http://llvm.org/bugs/show_bug.cgi?id=18045 Short issue description: For X86 machines with sse < sse4.1 we got failures for some particular load/store vector sequences: $ clang-trunk -m32 -O2 test-case.c fatal error: error in backend: Cannot select: 0x4200920: v4i32,ch = load 0x41d6ab0, 0x4205850, 0x41dcb10<LD16[getelementptr inbounds ([4 x i32]* @e, i32 0, i32 0)](align=4)> [ORD=82] [ID=58] 0x4205850: i32 = X86ISD::Wrapper 0x41d5490 [ORD=26] [ID=43] 0x41d5490: i32 = TargetGlobalAddress<[4 x i32]* @e> 0 [ORD=26] [ID=23] 0x41dcb10: i32 = undef [ID=2] The reason is that EltsFromConsecutiveLoads could emit such load instruction both before and after legalize stage. Though this instruction is not legal for machines with SSSE3 and lower. The fix: In EltsFromConsecutiveLoads, if we have passed legalize stage, we check whether nodes it emits are legal. P.S.: If you get failure in time from 12:00 and till 22:00 (UTC-8), perhaps I'll slow with response, so you better reject this commit. Thanks! llvm-svn: 197492	2013-12-17 12:07:33 +00:00
Elena Demikhovsky	c5f6726a24	AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader). Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1. llvm-svn: 197482	2013-12-17 08:33:15 +00:00
Elena Demikhovsky	47fc44e52e	AVX-512: Added legal type MVT::i1 and VK1 register for it. Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations. I replaced FSETCCss, FSETCCsd with one node type FSETCCs. Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1. llvm-svn: 197384	2013-12-16 13:52:35 +00:00
Benjamin Kramer	e723bb10b0	X86: When lowering shl_parts, don't emit shift amounts larger than the bit width. While it's safe for the X86-specific shift nodes, dag combining will kill generic nodes. Insert an AND to make it safe, isel will nuke it as x86's shift instructions have an implicit AND. Fixes PR16108, which contains a contraption to hit this case in between constant folders. llvm-svn: 197228	2013-12-13 13:40:24 +00:00
Rafael Espindola	32cb5ac904	Switch to the new MingW ABI. GCC 4.7 changed the MingW ABI. On the LLVM side it means that sret functions don't pop the stack. llvm-svn: 197163	2013-12-12 16:06:58 +00:00
Tim Northover	9653eb5759	Make Triple's isOSBinFormatXXX functions partition triple-space. Most users would be surprised if "isCOFF" and "isMachO" were simultaneously true, unless they'd put the compiler in a box with a gun attached to a photon detector. This makes sure precisely one of the three formats is true for any triple and simplifies some target logic based on that. llvm-svn: 196934	2013-12-10 16:57:43 +00:00
Elena Demikhovsky	e382c3fdcd	AVX-512: changed intrinsics for mask operations llvm-svn: 196918	2013-12-10 13:53:10 +00:00
Cameron McInally	c5f420e129	Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers. Patch by Aleksey Bader. llvm-svn: 196386	2013-12-04 14:52:33 +00:00
Lang Hames	39609996d9	Refactor a lot of patchpoint/stackmap related code to simplify and make it target independent. Most of the x86 specific stackmap/patchpoint handling was necessitated by the use of the native address-mode format for frame index operands. PEI has now been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing us to use a simple, platform independent register/offset pair for frame indexes on stackmap/patchpoints. Notes: - Folding is now platform independent and automatically supported. - Emiting patchpoints with direct memory references now just involves calling the TargetLoweringBase::emitPatchPoint utility method from the target's XXXTargetLowering::EmitInstrWithCustomInserter method. (See X86TargetLowering for an example). - No more ugly platform-specific operand parsers. This patch shouldn't change the generated output for X86. llvm-svn: 195944	2013-11-29 03:07:54 +00:00
Michael Liao	d617a3015d	Fix PR18054 - Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG lowering where we need to check whether x is a vector type (in-reg type) of i8, i16 or i32; otherwise, that optimization is not valid. llvm-svn: 195779	2013-11-26 20:31:31 +00:00

... 4 5 6 7 8 ...

2993 Commits