llvm-project

Commit Graph

Author	SHA1	Message	Date
Evgeniy Stepanov	ce2e16f00c	Android support for SafeStack. Add two new ways of accessing the unsafe stack pointer: * At a fixed offset from the thread TLS base. This is very similar to StackProtector cookies, but we plan to extend it to other backends (ARM in particular) soon. Bionic-side implementation here: https://android-review.googlesource.com/170988. * Via a function call, as a fallback for platforms that provide neither a fixed TLS slot, nor a reasonable TLS implementation (i.e. not emutls). llvm-svn: 248357	2015-09-23 01:03:51 +00:00
Asaf Badouh	572bbceecc	[X86][AVX512DQ] Add fpclass instruction Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115	2015-09-20 08:46:07 +00:00
Reid Kleckner	5b8a46e771	[WinEH] Make funclet return instrs pseudo instrs This makes catchret look more like a branch, and less like a weird use of BlockAddress. It also lets us get away from llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label arithmetic. llvm-svn: 247936	2015-09-17 20:43:47 +00:00
Ahmed Bougacha	5246867384	[CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit. We used to have this magic "hasLoadLinkedStoreConditional()" callback, which really meant two things: - expand cmpxchg (to ll/sc). - expand atomic loads using ll/sc (rather than cmpxchg). Remove it, and, instead, introduce explicit callbacks: - bool shouldExpandAtomicCmpXchgInIR(inst) - AtomicExpansionKind shouldExpandAtomicLoadInIR(inst) Differential Revision: http://reviews.llvm.org/D12557 llvm-svn: 247429	2015-09-11 17:08:28 +00:00
Ahmed Bougacha	9d677131c4	[CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind. This lets us generalize its usage to the other atomic instructions. llvm-svn: 247428	2015-09-11 17:08:17 +00:00
Reid Kleckner	7878391208	[WinEH] Add codegen support for cleanuppad and cleanupret All of the complexity is in cleanupret, and it mostly follows the same codepaths as catchret, except it doesn't take a return value in RAX. This small example now compiles and executes successfully on win32: extern "C" int printf(const char *, ...) noexcept; struct Dtor { ~Dtor() { printf("~Dtor\n"); } }; void has_cleanup() { Dtor o; throw 42; } int main() { try { has_cleanup(); } catch (int) { printf("caught it\n"); } } Don't try to put the cleanup in the same function as the catch, or Bad Things will happen. llvm-svn: 247219	2015-09-10 00:25:23 +00:00
Igor Breger	0dcd8bcf24	AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11931 llvm-svn: 246750	2015-09-03 09:05:31 +00:00
Igor Breger	1e58e8adf6	AVX512: Implemented encoding and intrinsics for VGETMANTPD/S , VGETMANTSD/S instructions Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11593 llvm-svn: 246642	2015-09-02 11:18:55 +00:00
Igor Breger	5ea0a68115	AVX512: ktest implemantation Added tests for encoding. Differential Revision: http://reviews.llvm.org/D11979 llvm-svn: 246439	2015-08-31 13:30:19 +00:00
Igor Breger	f3ded811b2	AVX512: Implemented encoding and intrinsics for vdbpsadbw Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12491 llvm-svn: 246436	2015-08-31 13:09:30 +00:00
Reid Kleckner	0e2882345d	[WinEH] Add some support for code generating catchpad We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues. llvm-svn: 246235	2015-08-27 23:27:47 +00:00
Michael Kuperstein	6e3fee07f7	[X86] Remove references to _ftol2 As of r245924, _ftol2 is no longer used for fptoui on MS platforms. Remove the dead code associated with it. llvm-svn: 245925	2015-08-25 07:58:33 +00:00
Michael Kuperstein	8515893be8	[X86] Fix fptoui conversions This fixes two issues in x86 fptoui lowering. 1) Makes conversions from f80 go through the right path on AVX-512. 2) Implements an inline sequence for fptoui i64 instead of a library call. This improves performance by 6X on SSE3+ and 3X otherwise. Incidentally, it also removes the use of ftol2 for fptoui, which was wrong to begin with, as ftol2 converts to a signed i64, producing wrong results for values >= 2^63. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D11316 llvm-svn: 245924	2015-08-25 07:42:09 +00:00
Steve King	5cdbd20cc3	Pass function attributes instead of boolean in isIntDivCheap(). llvm-svn: 245921	2015-08-25 02:31:21 +00:00
Michael Kuperstein	9fe42604aa	[X86] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize There are some cases where the mul sequence is smaller, but for the most part, using a div is preferable. This does not apply to vectors, since x86 doesn't have vector idiv, and a vector mul/shifts sequence ought to be smaller than a scalarized division. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245431	2015-08-19 11:21:43 +00:00
JF Bastien	8662083770	x86 atomic: optimize a.store(reg op a.load(acquire), release) Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations. Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11382 llvm-svn: 244128	2015-08-05 21:04:59 +00:00
Craig Topper	e3dcce9700	De-constify pointers to Type since they can't be modified. NFC This was already done in most places a while ago. This just fixes the ones that crept in over time. llvm-svn: 243842	2015-08-01 22:20:21 +00:00
Sanjay Patel	1dd15598cf	fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold This fix was suggested as part of D11345 and is part of fixing PR24141. With this change, we can avoid walking the uses of a divisor node if the target doesn't want the combineRepeatedFPDivisors transform in the first place. There is no NFC-intended other than that. Differential Revision: http://reviews.llvm.org/D11531 llvm-svn: 243498	2015-07-28 23:05:48 +00:00
Igor Breger	074a64e72c	AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 243122	2015-07-24 17:24:15 +00:00
Chandler Carruth	fe414353db	Revert r242990: "AVX-512: Implemented encoding , DAG lowering and ..." This commit broke the build. Numerous build bots broken, and it was blocking my progress so reverting. It should be trivial to reproduce -- enable the BPF backend and it should fail when running llvm-tblgen. llvm-svn: 242992	2015-07-23 08:03:44 +00:00
Igor Breger	da1b2ea955	AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 242990	2015-07-23 07:39:21 +00:00
Asaf Badouh	a5b2e5e2a7	[X86][AVX512] add reduce/range/scalef/rndScale include encoding and intrinsics Differential Revision: http://reviews.llvm.org/D11222 llvm-svn: 242896	2015-07-22 12:00:43 +00:00
Igor Breger	f7fd547e27	AVX512 : Implemented VPMADDUBSW and VPMADDWD instruction , Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11351 llvm-svn: 242761	2015-07-21 07:11:28 +00:00
Elena Demikhovsky	0f370936a0	AVX-512: Added all AVX-512 forms of Vector Convert for Float/Double/Int/Long types. In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch. I temporary removed the old intrinsics test (just to split this patch). Half types are not covered here. Differential Revision: http://reviews.llvm.org/D11134 llvm-svn: 242023	2015-07-13 13:26:20 +00:00
Pat Gavlin	a717f255b6	Allow {e,r}bp as the target of {read,write}_register. This patch allows the read_register and write_register intrinsics to read/write the RBP/EBP registers on X86 iff the targeted register is the frame pointer for the containing function. Differential Revision: http://reviews.llvm.org/D10977 llvm-svn: 241827	2015-07-09 17:40:29 +00:00
Mehdi Amini	eaabc51e78	Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user A documentation for this function would be nice by the way. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241807	2015-07-09 15:12:23 +00:00
Mehdi Amini	0cdec1e2ab	Make isLegalAddressingMode() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11040 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241778	2015-07-09 02:09:40 +00:00
Mehdi Amini	5c183d5239	Make getByValTypeAlignment() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11038 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241777	2015-07-09 02:09:28 +00:00
Mehdi Amini	9639d650bb	Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11037 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241776	2015-07-09 02:09:20 +00:00
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Simon Pilgrim	d85cae3d52	[X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructions This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it. As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions. From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level. Differential Revision: http://reviews.llvm.org/D10146 llvm-svn: 241508	2015-07-06 20:46:41 +00:00
Simon Pilgrim	8b756596fc	[X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN opcodes and remove the X86 implementation With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations. This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code. Differential Revision: http://reviews.llvm.org/D10947 llvm-svn: 241506	2015-07-06 20:30:47 +00:00
Asaf Badouh	c6f3c82ffc	[X86][AVX512] Multiply Packed Unsigned Integers with Round and Scale pmulhrsw review: http://reviews.llvm.org/D10948 llvm-svn: 241443	2015-07-06 14:03:40 +00:00
Benjamin Kramer	9bfb627a0e	[TargetLowering] StringRefize asm constraint getters. There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411	2015-07-05 19:29:18 +00:00
Sanjay Patel	e4d95c6c9a	fix typos in comment; NFC llvm-svn: 241174	2015-07-01 17:55:07 +00:00
Asaf Badouh	7ec4b7a8bb	[x86][AVX512] Add vscalef support include encoding and intrinsics review: http://reviews.llvm.org/D10730 llvm-svn: 240906	2015-06-28 14:30:39 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Elena Demikhovsky	5e2f8c4231	AVX-512: Added all forms of VPABS instruction Added all intrinsics, tests for encoding, tests for intrinsics. llvm-svn: 240386	2015-06-23 08:19:46 +00:00
Alexander Kornienko	70bc5f1398	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-,llvm-namespace-comment -header-filter='llvm/.\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Asaf Badouh	81f03c30a5	[AVX512] add instructions: VPAVGB and VPAVGW review http://reviews.llvm.org/D10504 llvm-svn: 240012	2015-06-18 12:30:53 +00:00
Simon Pilgrim	cae7b94cbd	[X86][SSE] Vectorize v2i32 to v2f64 conversions This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions. Differential Revision: http://reviews.llvm.org/D10433 llvm-svn: 239855	2015-06-16 21:40:28 +00:00
Igor Breger	abe4a79b75	AVX-512: Implemented cvtsi2ss/d cvtusi2ss/d instructions with round control for KNL. Added intrinsics for cvtsi2ss/d instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D10430 llvm-svn: 239694	2015-06-14 12:44:55 +00:00
Asaf Badouh	402ebb34af	re-apply 238809 AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. CR: http://reviews.llvm.org/D9991 llvm-svn: 238923	2015-06-03 13:41:48 +00:00
Elena Demikhovsky	9e38086534	AVX-512: Implemented SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2 instructions for SKX and KNL. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238917	2015-06-03 10:56:40 +00:00
Simon Pilgrim	452252e6c8	[X86] Removed (unused) FSRL x86 operation This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....). Since the refactoring of the shuffle lowering code this no longer has any use. Differential Revision: http://reviews.llvm.org/D10169 llvm-svn: 238906	2015-06-03 08:32:36 +00:00
Asaf Badouh	8d897dd05f	revert 238809 llvm-svn: 238810	2015-06-02 07:45:19 +00:00
Asaf Badouh	17de10f37e	AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. llvm-svn: 238809	2015-06-02 07:18:14 +00:00
Elena Demikhovsky	3582eb3b39	AVX-512: Implemented VRANGEPD and VRANGEPD instructions for SKX. Implemented DAG lowering for all these forms. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238738	2015-06-01 11:05:34 +00:00
Elena Demikhovsky	42c96d9c0a	AVX-512: Implemented VFIXUPIMMPD and VFIXUPIMMPS instructions for KNL and SKX Implemented DAG lowering for all these forms. Added tests for encoding. by Igor Breger (igor.breger@intel.com) llvm-svn: 238728	2015-06-01 06:50:49 +00:00
Matt Arsenault	bd7d80a4a6	Add address space argument to isLegalAddressingMode This is important because of different addressing modes depending on the address space for GPU targets. This only adds the argument, and does not update any of the uses to provide the correct address space. llvm-svn: 238723	2015-06-01 05:31:59 +00:00
Chandler Carruth	6ba9730a4e	[x86] Implement a faster vector population count based on the PSHUFB in-register LUT technique. Summary: A description of this technique can be found here: http://wm.ite.pl/articles/sse-popcount.html The core of the idea is to use an in-register lookup table and the PSHUFB instruction to compute the population count for the low and high nibbles of each byte, and then to use horizontal sums to aggregate these into vector population counts with wider element types. On x86 there is an instruction that will directly compute the horizontal sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily. Various tricks are used to get vNi32 and vNi16 from the vNi8 that the LUT computes. The base implemantion of this, and most of the work, was done by Bruno in a follow up to D6531. See Bruno's detailed post there for lots of timing information about these changes. I have extended Bruno's patch in the following ways: 0) I committed the new tests with baseline sequences so this shows a diff, and regenerated the tests using the update scripts. 1) Bruno had noticed and mentioned in IRC a redundant mask that I removed. 2) I introduced a particular optimization for the i32 vector cases where we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD + PSADBW to compute doubled high i32 popcounts. This takes advantage of the fact that to line up the high i32 popcounts we have to shift them anyways, and we can shift them by one fewer bit to effectively divide the count by two. While the PSHUFD based horizontal add is no faster, it doesn't require registers or load traffic the way a mask would, and provides more ILP as it happens on different ports with high throughput. 3) I did some code cleanups throughout to simplify the implementation logic. 4) I refactored it to continue to use the parallel bitmath lowering when SSSE3 is not available to preserve the performance of that version on SSE2 targets where it is still much better than scalarizing as we'll still do a bitmath implementation of popcount even in scalar code there. With #1 and #2 above, I analyzed the result in IACA for sandybridge, ivybridge, and haswell. In every case I measured, the throughput is the same or better using the LUT lowering, even v2i64 and v4i64, and even compared with using the native popcnt instruction! The latency of the LUT lowering is often higher than the latency of the scalarized popcnt instruction sequence, but I think those latency measurements are deeply misleading. Keeping the operation fully in the vector unit and having many chances for increased throughput seems much more likely to win. With this, we can lower every integer vector popcount implementation using the LUT strategy if we have SSSE3 or better (and thus have PSHUFB). I've updated the operation lowering to reflect this. This also fixes an issue where we were scalarizing horribly some AVX lowerings. Finally, there are some remaining cleanups. There is duplication between the two techniques in how they perform the horizontal sum once the byte population count is computed. I'm going to factor and merge those two in a separate follow-up commit. Differential Revision: http://reviews.llvm.org/D10084 llvm-svn: 238636	2015-05-30 03:20:59 +00:00
Elena Demikhovsky	ad9c396838	AVX-512: Added VBROADCASTF64X4, VBROADCASTF64X2, VBROADCASTI32X8, and other instructions from this set Added encoding tests. llvm-svn: 237557	2015-05-18 06:42:57 +00:00
Daniel Sanders	d049669546	[x86] Distinguish the 'o', 'v', 'X', and 'i' inline assembly memory constraints. Summary: But still handle them the same way since I don't know how they differ on this target. Of these, 'o' and 'v' are not tested but were already implemented. I'm not sure why 'i' is required for X86 since it's supposed to be an immediate constraint rather than a memory constraint. A test asserts without it so I've included it for now. No functional change intended. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8254 llvm-svn: 237517	2015-05-16 12:09:54 +00:00
Eric Christopher	824f42f209	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Elena Demikhovsky	0d7e9364d1	AVX-512: Added SKX instructions and intrinsics: {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971	2015-05-11 06:05:05 +00:00
Pat Gavlin	cc0431d1c0	Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware. This changes the shape of the statepoint intrinsic from: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args) to: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args) This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back. In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation. Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops. Differential Revision: http://reviews.llvm.org/D9501 llvm-svn: 236888	2015-05-08 18:07:42 +00:00
Matthias Braun	d04893fa36	Change getTargetNodeName() to produce compiler warnings for missing cases, fix them llvm-svn: 236775	2015-05-07 21:33:59 +00:00
Elena Demikhovsky	29792e9a80	AVX-512: Added all forms of FP compare instructions for KNL and SKX. Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714	2015-05-07 11:24:42 +00:00
Elena Demikhovsky	60eb9db7bb	AVX-512: added calling convention for i1 vectors in 32-bit mode. Fixed some bugs in extend/truncate for AVX-512 target. Removed VBROADCASTM (masked broadcast) node, since it is not used any more. llvm-svn: 236420	2015-05-04 12:40:50 +00:00
Elena Demikhovsky	52266388f8	AVX-512: added integer "add" and "sub" instructions with saturation for SKX with intrinsics and tests by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236418	2015-05-04 12:35:55 +00:00
Sanjay Patel	7024b8121a	[x86] Implement combineRepeatedFPDivisors Set the transform bar at 2 divisions because the fastest current x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle latency (best case) relative to a 5 cycle multiplier. So that's the worst case for this transform (no latency win), but multiplies are obviously pipelined while divisions are not, so there's still a big throughput win which we would expect to show up in typical FP code. These are the sequences I'm comparing: divss %xmm2, %xmm0 mulss %xmm1, %xmm0 divss %xmm2, %xmm0 Becomes: movss LCPI0_0(%rip), %xmm3 ## xmm3 = mem[0],zero,zero,zero divss %xmm2, %xmm3 mulss %xmm3, %xmm0 mulss %xmm1, %xmm0 mulss %xmm3, %xmm0 [Ignore for the moment that we don't optimize the chain of 3 multiplies into 2 independent fmuls followed by 1 dependent fmul...this is the DAG version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that, then the transform becomes even more profitable on all targets.] Differential Revision: http://reviews.llvm.org/D8941 llvm-svn: 235012	2015-04-15 15:22:55 +00:00
Daniel Sanders	bf5b80f5f9	Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC. Summary: This is instead of doing this in target independent code and is the last non-functional change before targets begin to distinguish between different memory constraints when selecting code for the ISD::INLINEASM node. Next, each target will individually move away from the idea that all memory constraints behave like 'm'. Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8173 llvm-svn: 232373	2015-03-16 13:13:41 +00:00
JF Bastien	f14889ee34	Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded. Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all. This does not represent a behavioural change and as such no tests were added. Patch by: Richard Diamond. Reviewers: jfb Reviewed By: jfb Subscribers: jfb, aemerson, t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D7713 llvm-svn: 231250	2015-03-04 15:47:57 +00:00
Sanjay Patel	36a2dc895f	remove enum value names from comments; NFC llvm-svn: 231129	2015-03-03 20:58:35 +00:00
Eric Christopher	11e4df73c8	getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699	2015-02-26 22:38:43 +00:00
Eric Christopher	23a3a7c871	Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583	2015-02-26 00:00:24 +00:00
Elena Demikhovsky	52e81bc499	AVX-512: recommitted 229837 + bugfix + test llvm-svn: 230223	2015-02-23 15:12:31 +00:00
Benjamin Kramer	cd8e4ebadb	Remove dead prototype. llvm-svn: 230137	2015-02-21 14:35:00 +00:00
Eric Christopher	0d94fa98e5	Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. llvm-svn: 229942	2015-02-20 00:45:28 +00:00
Elena Demikhovsky	69e8b45b13	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. llvm-svn: 229837	2015-02-19 10:48:04 +00:00
Elena Demikhovsky	714f23bcdb	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> llvm-svn: 229645	2015-02-18 07:59:20 +00:00
Ahmed Bougacha	e892d13d90	[CodeGen] Add hook/combine to form vector extloads, enabled on X86. The combine that forms extloads used to be disabled on vector types, because "None of the supported targets knows how to perform load and sign extend on vectors in one instruction." That's not entirely true, since at least SSE4.1 X86 knows how to do those sextloads/zextloads (with PMOVS/ZX). But there are several aspects to getting this right. First, vector extloads are controlled by a profitability callback. For instance, on ARM, several instructions have folded extload forms, so it's not always beneficial to create an extload node (and trying to match extloads is a whole 'nother can of worms). The interesting optimization enables folding of s/zextloads to illegal (splittable) vector types, expanding them into smaller legal extloads. It's not ideal (it introduces some legalization-like behavior in the combine) but it's better than the obvious alternative: form illegal extloads, and later try to split them up. If you do that, you might generate extloads that can't be split up, but have a valid ext+load expansion. At vector-op legalization time, it's too late to generate this kind of code, so you end up forced to scalarize. It's better to just avoid creating egregiously illegal nodes. This optimization is enabled unconditionally on X86. Note that the splitting combine is happy with "custom" extloads. As is, this bypasses the actual custom lowering, and just unrolls the extload. But from what I've seen, this is still much better than the current custom lowering, which does some kind of unrolling at the end anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added FIXME). Also note that the existing combine that forms extloads is now also enabled on legal vectors. This doesn't have a big effect on X86 (because sext+load is usually combined to sext_inreg+aextload). On ARM it fires on some rare occasions; that's for a separate commit. Differential Revision: http://reviews.llvm.org/D6904 llvm-svn: 228325	2015-02-05 18:31:02 +00:00
Bruno Cardoso Lopes	ab9ae87623	[X86][MMX] Handle i32->mmx conversion using movd Implement a BITCAST dag combine to transform i32->mmx conversion patterns into a X86 specific node (MMX_MOVW2D) and guarantee that moves between i32 and x86mmx are better handled, i.e., don't use store-load to do the conversion.. llvm-svn: 228293	2015-02-05 13:23:07 +00:00
Eric Christopher	05b819718c	Reuse a bunch of cached subtargets and remove getSubtarget calls without a Function argument. llvm-svn: 227814	2015-02-02 17:38:43 +00:00
Eric Christopher	295596a0a7	Remove the last vestiges of resetOperationActions. llvm-svn: 227648	2015-01-31 00:21:17 +00:00
Elena Demikhovsky	7b0dd39db6	AVX-512: Added FMA intrinsics with rounding mode By Asaf Badouh and Elena Demikhovsky Added special nodes for rounding: FMADD_RND, FMSUB_RND.. It will prevent merge between nodes with rounding and other standard nodes. llvm-svn: 227303	2015-01-28 10:21:27 +00:00
David Majnemer	29c52f7449	X86: Don't make illegal GOTTPOFF relocations "ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF relocation target a movq or addq instruction. Prohibit the truncation of such loads to movl or addl. This fixes PR22083. Differential Revision: http://reviews.llvm.org/D6839 llvm-svn: 225250	2015-01-06 07:12:52 +00:00
Andrea Di Biagio	22ee3f63b9	[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz. If the control flow is modelling an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph. Example: \code entry: %cmp = icmp eq i64 %val, 0 br i1 %cmp, label %end.bb, label %then.bb then.bb: %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) br label %end.bb end.bb: %cond = phi i64 [ %c, %then.bb ], [ 64, %entry] \code In this example, basic block %then.bb is taken if value %val is not zero. Also, the phi node in %end.bb would propagate the size-of in bits of %val only if %val is equal to zero. With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb into basic block %entry only if cttz is cheap to speculate for the target. Added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI. Differential Revision: http://reviews.llvm.org/D6728 llvm-svn: 224899	2014-12-28 11:07:35 +00:00
Michael Kuperstein	047b1a0400	[DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711. Differential Revision: http://reviews.llvm.org/D6678 llvm-svn: 224429	2014-12-17 12:32:17 +00:00
Elena Demikhovsky	908dbf48c8	AVX-512: Added all forms of COMPRESS instruction + intrinsics + tests llvm-svn: 224019	2014-12-11 15:02:24 +00:00
Elena Demikhovsky	be8808dc3f	AVX-512: Intrinsics for ERI 3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms. Intrinsics include SAE (Suppres All Exceptions) parameter. http://reviews.llvm.org/D6214 llvm-svn: 221774	2014-11-12 07:31:03 +00:00
Sanjay Patel	e2e589288f	Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385). This is a first step for generating SSE rcp instructions for reciprocal calcs when fast-math allows it. This is very similar to the rsqrt optimization enabled in D5658 ( http://reviews.llvm.org/rL220570 ). For now, be conservative and only enable this for AMD btver2 where performance improves significantly both in terms of latency and throughput. We may never enable this codegen for Intel Core* chips because the divider circuits are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21 cycle critical path for the rcp + mul + sub + mul + add estimate. Follow-on patches may allow configuration of the number of Newton-Raphson refinement steps, add AVX512 support, and enable the optimization for more chips. More background here: http://llvm.org/bugs/show_bug.cgi?id=21385 Differential Revision: http://reviews.llvm.org/D6175 llvm-svn: 221706	2014-11-11 20:51:00 +00:00
Quentin Colombet	dbe33e7aa4	[X86] Lower VSELECT into SHRUNKBLEND when we shrink the bits used into the condition to match a blend. This prevents optimizations that work on VSELECT to perform invalid transformations. Indeed, the optimized condition does not match the vector boolean content that is expected and bad things may happen. This patch yields the exact same code on the whole test-suite + specs (-O3 and -O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a bug reduced in vselect-avx.ll. <rdar://problem/18819506> llvm-svn: 221429	2014-11-06 02:25:03 +00:00
Ahmed Bougacha	12eb558bd9	[X86] 8bit divrem: Improve codegen for AH register extraction. For 8-bit divrems where the remainder is used, we used to generate: divb %sil shrw $8, %ax movzbl %al, %eax That was to avoid an H-reg access, which is problematic mainly because it isn't possible in REX-prefixed instructions. This patch optimizes that to: divb %sil movzbl %ah, %eax To do that, we explicitly extend AH, and extract the L-subreg in the resulting register. The extension is done using the NOREX variants of MOVZX. To support signed operations, MOVSX_NOREX is also added. Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is then lowered to a sequence containing a single zext (rather than 2). Differential Revision: http://reviews.llvm.org/D6064 llvm-svn: 221176	2014-11-03 20:26:35 +00:00
Sanjay Patel	957efc23bb	Use rsqrt (X86) to speed up reciprocal square root calcs This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570	2014-10-24 17:02:16 +00:00
Ahmed Bougacha	5175bcf43a	[X86] Improve mul w/ overflow codegen, to MUL8+SETO. Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where 3 are really needed. This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to MUL8/IMUL8 + SETO. i8 is a special case because there is no two/three operand variants of (I)MUL8, so the first operand and return value need to go in AL/AX. Also, we can't write patterns for these instructions: TableGen refuses patterns where output operands don't match SDNode results. In this case, instructions where the output operand is an implicitly defined register. A related special case (and FIXME) exists for MUL8 (X86InstrArith.td): // FIXME: Used for 8-bit mul, ignore result upper 8 bits. // This probably ought to be moved to a def : Pat<> if the // syntax can be accepted. [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)] Ideally, these go away with UMUL8, but we still need to improve TableGen support of implicit operands in patterns. Before this change: movsbl %sil, %eax movsbl %dil, %ecx imull %eax, %ecx movb %cl, %al sarb $7, %al movzbl %al, %eax movzbl %ch, %esi cmpl %eax, %esi setne %al After: movb %dil, %al imulb %sil seto %al Also, remove a made-redundant testcase for PR19858, and enable more FastISel ALU-overflow tests for SelectionDAG too. Differential Revision: http://reviews.llvm.org/D5809 llvm-svn: 220516	2014-10-23 21:55:31 +00:00
Eric Christopher	12f4a78581	constify TargetMachine parameter for X86TargetLowering. llvm-svn: 218804	2014-10-01 20:38:22 +00:00
Sanjay Patel	0e4a83e89c	Don't repeat function/variable name in comment. NFC. llvm-svn: 218791	2014-10-01 19:39:32 +00:00
Robin Morisset	810739d174	Lower idempotent RMWs to fence+load Summary: I originally tried doing this specifically for X86 in the backend in D5091, but it was rather brittle and generally running too late to be general. Furthermore, other targets may want to implement similar optimizations. So I reimplemented it at the IR-level, fitting it into AtomicExpandPass as it interacts with that pass (which could not be cleanly done before at the backend level). This optimization relies on a new target hook, which is only used by X86 for now, as the correctness of the optimization on other targets remains an open question. If it is found correct on other targets, it should be trivial to enable for them. Details of the optimization are discussed in D5091. Test Plan: make check-all + a new test Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5422 llvm-svn: 218455	2014-09-25 17:27:43 +00:00
Chandler Carruth	d355369dbb	[x86] Remove the defunct X86ISD::BLENDV entry -- we use vector selects for this now. Should prevent folks from running afoul of this and not knowing why their code won't instruction select the way I just did... llvm-svn: 218436	2014-09-25 01:16:01 +00:00
Chandler Carruth	6d5916a2d7	[x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend. However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional differenc from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable. There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable. llvm-svn: 218300	2014-09-23 10:08:29 +00:00
Chandler Carruth	ed5dfff865	[x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the td pattern). Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD, we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit. llvm-svn: 218282	2014-09-22 22:29:42 +00:00
Pavel Chupin	be9f12102f	[x32] Fix segmented stacks support Summary: Update segmented-stacks*.ll tests with x32 target case and make corresponding changes to make them pass. Test Plan: tests updated with x32 target Reviewers: nadav, rafael, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5245 llvm-svn: 218247	2014-09-22 13:11:35 +00:00
Robin Morisset	25c8e318e4	[X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass This required a new hook called hasLoadLinkedStoreConditional to know whether to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to CmpXchg (X86). Apart from that, the new code in AtomicExpandPass is mostly moved from X86AtomicExpandPass. The main result of this patch is to get rid of that pass, which had lots of code duplicated with AtomicExpandPass. llvm-svn: 217928	2014-09-17 00:06:58 +00:00
Chandler Carruth	204ad4c613	[x86] Start fixing our emission of ADDSUBPS and ADDSUBPD instructions by introducing a synthetic X86 ISD node representing this generic operation. The relevant patterns for mapping these nodes into the concrete instructions are also added, and a gnarly bit of C++ code in the target-specific DAG combiner is replaced with simple code emitting this primitive. The next step is to generically combine blends of adds and subs into this node so that we can drop the reliance on an SSE4.1 ISD node (BLENDI) when matching an SSE3 feature (ADDSUB). llvm-svn: 217819	2014-09-15 20:09:47 +00:00
Adam Nemet	50b83f0bb8	[AVX512] Add enum for the static rounding types No functional change. This will be used by the new FMA intrinsic lowering code. We can probably add NO_EXC here as well, I am just not too familiar with this part of AVX512 yet. We can add that later. llvm-svn: 215662	2014-08-14 17:13:26 +00:00
Benjamin Kramer	a7c40ef022	Canonicalize header guards into a common format. Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558	2014-08-13 16:26:38 +00:00
Patrik Hagglund	b0e86ec814	[pr19635] Revert most of r170537, and add new testcase. Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong. llvm-svn: 215190	2014-08-08 08:21:19 +00:00
Adam Nemet	2f10cc699d	[X86] Separate DAG node for valign and palignr They have different semantics (valign is interlane while palingr is intralane) and palingr is still needed even in the AVX512 context. According to the latest spec AVX512BW provides these. llvm-svn: 214887	2014-08-05 17:22:55 +00:00
Matt Arsenault	6f2a526101	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Akira Hatanaka	e5b6e0d231	[stack protector] Fix a potential security bug in stack protector where the address of the stack guard was being spilled to the stack. Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. Register allocator can now just remat the value when it can't keep it in a register. <rdar://problem/12475629> llvm-svn: 213967	2014-07-25 19:31:34 +00:00
Chandler Carruth	49a8b10d82	[x86] Based on a long conversation between myself, Jim Grosbach, Hal Finkel, Eric Christopher, and a bunch of other people I'm probably forgetting (sorry), add an option to the x86 backend to widen vectors during type legalization rather than promote them. This still would promote vNi1 vectors to get the masks right, but would widen other vectors. A lot of experiments are piling up right now showing that widening should probably be the default legalization strategy outside of vNi1 cases, but it is very hard to test the rammifications of that and fix bugs in widening-based legalization without an option that enables it. I'll be checking in tests shortly that use this option to exercise cases where widening doesn't work well and hopefully we'll be able to switch fully to this soon. llvm-svn: 212249	2014-07-03 02:11:29 +00:00
Tim Northover	277066ab43	X86: expand atomics in IR instead of as MachineInstrs. The logic for expanding atomics that aren't natively supported in terms of cmpxchg loops is much simpler to express at the IR level. It also allows the normal optimisations and CodeGen improvements to help out with atomics, instead of using a limited set of possible instructions.. rdar://problem/13496295 llvm-svn: 212119	2014-07-01 18:53:31 +00:00
Andrea Di Biagio	53b6830069	[X86] Add support for builtin to read performance monitoring counters. This patch adds support for a new builtin instruction called __builtin_ia32_rdpmc. Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can be used to read performance monitoring counters. It takes as input the index of the performance counter to read, and returns the value of the specified performance counter as a 64-bit number. Calls to this new builtin will map to instruction RDPMC. The index in input to the builtin call is moved to register %ECX. The result of the builtin call is the value of the specified performance counter (RDPMC would return that quantity in registers RDX:RAX). This patch: - Adds builtin int_x86_rdpmc as a GCCBuiltin; - Adds a new x86 DAG node called 'RDPMC_DAG'; - Teaches how to lower this new builtin; - Adds an ISel pattern to select instruction RDPMC; - Fixes the definition of instruction RDPMC adding %RAX and %RDX as implicit definitions, and adding %ECX as implicit use; - Adds a LLVM test to verify that the new builtin is correctly selected. llvm-svn: 212049	2014-06-30 17:14:21 +00:00
Chandler Carruth	8366cebeb5	[x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDW instructions available as synthetic SDNodes PACKSS and PACKUS that will select to the correct instruction variants based on the return type. This allows us to use these rather important instructions when lowering vector shuffles. Also moves the relevant instruction definitions to be split out from the fully generic multiclasses to allow them to match these new SDNodes in the same way that the UNPCK instructions do. No functionality should actually be changed here. llvm-svn: 211332	2014-06-20 01:05:28 +00:00
Tim Northover	7b9f86da5d	Revert "X86: elide comparisons after cmpxchg instructions." This reverts commit r210523. It was committed prematurely without waiting for review. llvm-svn: 210524	2014-06-10 10:50:11 +00:00
Tim Northover	84ad29ca1f	X86: elide comparisons after cmpxchg instructions. The C++ and C semantics of the compare_and_swap operations actually require us to return a boolean "success" value. In LLVM terms this means a second comparison of the output of "cmpxchg" against the input desired value. However, x86's "cmpxchg" instruction sets all flags for the comparison formed, so we can skip any secondary comparison. (N.b. this isn't true for cmpxchg8b/16b, which only set ZF). rdar://problem/13201607 llvm-svn: 210523	2014-06-10 10:49:07 +00:00
Eric Christopher	a08f30bd40	Move all of the x86 subtarget initialized variables down into the x86 subtarget from the x86 target machine. Should be no functional change. llvm-svn: 210479	2014-06-09 17:08:19 +00:00
Filipe Cabecinhas	17254aaead	Implemented LowerVSELECT to custom lower some instructions. No functionality change intended. The types that previously were set to lower as Expand or Legal are doing the same thing with this lowering function. llvm-svn: 209042	2014-05-16 22:47:43 +00:00
Jay Foad	a0653a3e6c	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Hal Finkel	f0e086a0bc	Pass the value type to TLI::getRegisterByName We must validate the value type in TLI::getRegisterByName, because if we don't and the wrong type was used with the IR intrinsic, then we'll assert (because we won't be able to find a valid register class with which to construct the requested copy operation). For PPC64, additionally, the type information is necessary to decide between the 64-bit register and the 32-bit subregister. No functionality change. llvm-svn: 208508	2014-05-11 19:29:07 +00:00
Hal Finkel	b33e9872a0	Add 'override' to getRegisterByName in *ISelLowering.h No functionality change intended. llvm-svn: 208507	2014-05-11 19:28:55 +00:00
Renato Golin	c7aea40ec6	Implememting named register intrinsics This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104	2014-05-06 16:51:25 +00:00
Reid Kleckner	4a406d32e9	Fix i128 div/mod on mingw64 The Win64 docs are very clear that anything larger than 8 bytes is passed by reference, and GCC MinGW64 honors that for __modti3 and friends. Patch by Jameson Nash! llvm-svn: 208029	2014-05-06 01:20:42 +00:00
Craig Topper	9d74a5a5f1	[C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. llvm-svn: 207511	2014-04-29 07:58:41 +00:00
Craig Topper	e73658ddbb	[C++] Use 'nullptr'. llvm-svn: 207394	2014-04-28 04:05:08 +00:00
Benjamin Kramer	6d2dff61f9	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Quentin Colombet	ea18933d97	[X86] Implement TargetLowering::getScalingFactorCost hook. Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301	2014-04-26 01:11:26 +00:00
Andrea Di Biagio	d1ab866868	[X86] Add support for Read Time Stamp Counter x86 builtin intrinsics. This patch: - Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and 'int_x86_rdtscp') as GCCBuiltin intrinsics; - Teaches the backend how to lower the two new builtins; - Introduces a common function to lower READCYCLECOUNTER dag nodes and the two new rdtsc/rdtscp intrinsics; - Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll' correctly verifies that both READCYCLECOUNTER and the two new intrinsics work fine for both 64bit and 32bit Subtargets. llvm-svn: 207127	2014-04-24 17:18:27 +00:00
Lang Hames	3067ab2344	[X86] Use tablegen instead of DAG combines to match BZHI instructions, as suggested by Ben Kramer in review of r206738. Thanks again Ben! llvm-svn: 206879	2014-04-22 10:41:56 +00:00
David Blaikie	9027abae53	Change argument order and add explanatory comment to r206130 Changes requested in code review by Eric Christopher of r206130. llvm-svn: 206219	2014-04-14 22:23:06 +00:00
David Blaikie	269e0fb2e4	Fix instruction debug info location during legalization I found this from a particular GDB test suite case of inlining (something similar is provided as a test case) but came across a few other related cases (other callers of the same functions, and one other instance of the same coding mistake in a separate function). I'm not sure what the best way to test this is (let alone to cover the other cases I discovered), so hopefully this sufficies - open to ideas. llvm-svn: 206130	2014-04-13 06:39:55 +00:00
Elena Demikhovsky	cf0b9bafc3	AVX-512: insert element to mask vector; store i1 data Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors; Implemented "store" for i1 type llvm-svn: 205850	2014-04-09 12:37:50 +00:00
Matt Arsenault	cf6f688a40	Add DAG parameter to ComputeNumSignBitsForTargetNode This way, you can check the number of sign bits in the operands. The depth parameter it already has is pretty useless without this. llvm-svn: 205649	2014-04-04 20:13:13 +00:00
Craig Topper	840beec2d0	Make consistent use of MCPhysReg instead of uint16_t throughout the tree. llvm-svn: 205610	2014-04-04 05:16:06 +00:00
Yaron Keren	136fe7db46	isTargetWindows() renamed to isTargetKnownWindowsMSVC() to reflect its current functionality. Based on Takumi NAKAMURA suggestion. llvm-svn: 205338	2014-04-01 18:15:34 +00:00
Craig Topper	26eec09d84	Mark a couple of the X86 target classes as final. Allows the compiler to de-virtualize some internal calls. llvm-svn: 205165	2014-03-31 06:22:15 +00:00
Renato Golin	c0a3c1d66b	Add @llvm.clear_cache builtin Implementing the LLVM part of the call to __builtin___clear_cache which translates into an intrinsic @llvm.clear_cache and is lowered by each target, either to a call to __clear_cache or nothing at all incase the caches are unified. Updating LangRef and adding some tests for the implemented architectures. Other archs will have to implement the method in case this builtin has to be compiled for it, since the default behaviour is to bail unimplemented. A Clang patch is required for the builtin to be lowered into the llvm intrinsic. This will be done next. llvm-svn: 204802	2014-03-26 12:52:28 +00:00
Craig Topper	c6d4efa1e5	Prune includes in X86 target. llvm-svn: 204216	2014-03-19 06:53:25 +00:00
Craig Topper	2d9361e325	[C++11] Add 'override' keyword to virtual methods that override their base class. llvm-svn: 203378	2014-03-09 07:44:38 +00:00
Elena Demikhovsky	9737e3886b	AVX-512: Fixed extract_vector_elt for v8i1 vector llvm-svn: 202624	2014-03-02 09:19:44 +00:00
Craig Topper	73156025e0	Switch all uses of LLVM_OVERRIDE to just use 'override' directly. llvm-svn: 202621	2014-03-02 09:09:27 +00:00
Tim Northover	aeb8e06d4c	X86 CodeGenPrep: sink shufflevectors before shifts On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift is really a scalar is often not visible to CodeGen when it's needed. This adds another handler to CodeGenPrepare, to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work). rdar://problem/16063505 llvm-svn: 201655	2014-02-19 10:02:43 +00:00
Elena Demikhovsky	9f423d6f25	AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors. llvm-svn: 201066	2014-02-10 07:02:39 +00:00
Tim Northover	546b57b011	X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes I believe VZEXT_MOVL means "zero all vector elements except the first" (and should have identical input & output types) whereas VZEXT means "zero extend each element of a vector (discarding higher elements if necessary)". For example: (v4i32 (vzext (v16i8 ...))) should zero extend the low 4 bytes of the incoming vector to 32-bits, discarding higher bytes. However, somewhere in the past, these two concepts had become confused, even leading to a nonsensical VSEXT_MOVL. This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL -> VZEXT when it's an actual extension). rdar://problem/15981990 llvm-svn: 200918	2014-02-06 09:54:51 +00:00
Matt Arsenault	25793a3f22	Add address space argument to allowsUnalignedMemoryAccess. On R600, some address spaces have more strict alignment requirements than others. llvm-svn: 200887	2014-02-05 23:15:53 +00:00
Craig Topper	7ee163842f	Move matching for x86 BMI BLSI/BLSMSK/BLSR instructions to isel patterns instead of DAG combine. This weakens the ability to fold loads with them because we aren't able to match patterns that load the same thing twice. But maybe we should fix that if we care. The peephole optimizer will be able to fold some loads in its absense. llvm-svn: 200824	2014-02-05 07:09:40 +00:00
Elena Demikhovsky	a30e437659	AVX-512: Added intrinsic for cvtph2ps. Added VPTESTNM instruction. Added a pattern to vselect (lit tests will follow). llvm-svn: 200823	2014-02-05 07:05:03 +00:00
Juergen Ributzka	659ce00d60	[TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself. Before this patch we used getIntImmCost from TargetTransformInfo to determine if a load of a constant should be converted to just a constant, but the threshold for this was set to an arbitrary value. This value works well for the two targets (X86 and ARM) that implement this target-hook, but it isn't target-independent at all. Now targets have the possibility to decide directly if this optimization should be performed. The default value is set to false to preserve the current behavior. The target hook has been moved to TargetLowering, which removed the last use and need of TargetTransformInfo in SelectionDAG. llvm-svn: 200271	2014-01-28 01:20:14 +00:00
Lang Hames	23de211c5d	Replace vfmaddxx213 instructions with their 231-type equivalents in accumulator loops. Writing back to the accumulator (231-type) allows the coalescer to eliminate an extra copy. llvm-svn: 199933	2014-01-23 20:23:36 +00:00
Elena Demikhovsky	a5d38a39a0	AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions, they give better sequences than VPERMI llvm-svn: 199893	2014-01-23 14:27:26 +00:00
Craig Topper	a448bd868f	Make more of the x86 lowering helper functions static. llvm-svn: 198146	2013-12-29 01:48:38 +00:00
Elena Demikhovsky	64c9548d66	AVX-512: fixed some patterns for MVT::i1 llvm-svn: 197981	2013-12-24 14:24:07 +00:00
Elena Demikhovsky	c5f6726a24	AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader). Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1. llvm-svn: 197482	2013-12-17 08:33:15 +00:00
Elena Demikhovsky	47fc44e52e	AVX-512: Added legal type MVT::i1 and VK1 register for it. Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations. I replaced FSETCCss, FSETCCsd with one node type FSETCCs. Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1. llvm-svn: 197384	2013-12-16 13:52:35 +00:00
Lang Hames	39609996d9	Refactor a lot of patchpoint/stackmap related code to simplify and make it target independent. Most of the x86 specific stackmap/patchpoint handling was necessitated by the use of the native address-mode format for frame index operands. PEI has now been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing us to use a simple, platform independent register/offset pair for frame indexes on stackmap/patchpoints. Notes: - Folding is now platform independent and automatically supported. - Emiting patchpoints with direct memory references now just involves calling the TargetLoweringBase::emitPatchPoint utility method from the target's XXXTargetLowering::EmitInstrWithCustomInserter method. (See X86TargetLowering for an example). - No more ugly platform-specific operand parsers. This patch shouldn't change the generated output for X86. llvm-svn: 195944	2013-11-29 03:07:54 +00:00
Andrew Trick	391dbadb51	StackMap: Implement support for DirectMemRefOp. A Direct stack map location records the address of frame index. This address is itself the value that the runtime requested. This differs from IndirectMemRefOp locations, which refer to a stack locations from which the requested values must be loaded. Direct locations can directly communicate the address if an alloca, while IndirectMemRefOp handle register spills. For example: entry: %a = alloca i64... llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a) Since both the alloca and stackmap intrinsic are in the entry block, and the intrinsic takes the address of the alloca, the runtime can assume that LLVM will not substitute alloca with any intervening value. This must be verified by the runtime by checking that the stack map's location is a Direct location type. The runtime can then determine the alloca's relative location on the stack immediately after compilation, or at any time thereafter. This differs from Register and Indirect locations, because the runtime can only read the values in those locations when execution reaches the instruction address of the stack map. llvm-svn: 195712	2013-11-26 02:03:25 +00:00
Matt Arsenault	b03bd4d96b	Add addrspacecast instruction. Patch by Michele Scandale! llvm-svn: 194760	2013-11-15 01:34:59 +00:00
Juergen Ributzka	87ed906b2e	[Stackmap] Materialize the jump address within the patchpoint noop slide. This patch moves the jump address materialization inside the noop slide. This enables patching of the materialization itself or its complete removal. This patch also adds the ability to define scratch registers that can be used safely by the code called from the patchpoint intrinsic. At least one scratch register is required, because that one is used for the materialization of the jump address. This patch depends on D2009. Differential Revision: http://llvm-reviews.chandlerc.com/D2074 Reviewed by Andy llvm-svn: 194306	2013-11-09 01:51:33 +00:00
Elena Demikhovsky	8952974e29	AVX-512: implemented extractelement with variable index. Added parsing of mask register and "zeroing" semantic, like {%k1} {z}. llvm-svn: 190595	2013-09-12 08:55:00 +00:00
Craig Topper	b25f0f5538	Create BEXTR instructions for (and ((sra or srl) x, imm), (2**size - 1)). Fixes PR17028. llvm-svn: 189742	2013-09-02 07:53:17 +00:00
Craig Topper	0bccad2d43	Teach X86 backend to create BMI2 BZHI instructions from (and X, (add (shl 1, Y), -1)). Fixes PR17038. llvm-svn: 189653	2013-08-30 06:52:21 +00:00
Elena Demikhovsky	980c6b08b1	AVX-512: added extend and truncate instructions. llvm-svn: 189580	2013-08-29 11:56:53 +00:00
Elena Demikhovsky	33d447a2d6	AVX-512: Added SHIFT instructions. llvm-svn: 188899	2013-08-21 09:36:02 +00:00
Craig Topper	e6861c9ce5	Make more of the lowering helpers static. Also use MVT instead of EVT in a couple places. llvm-svn: 188629	2013-08-18 08:53:01 +00:00
Craig Topper	d03748cf5e	Make more helper methods into static functions. llvm-svn: 188366	2013-08-14 07:53:41 +00:00
Craig Topper	d905fded68	Make some helper methods static. llvm-svn: 188364	2013-08-14 07:34:43 +00:00
Elena Demikhovsky	60b1f289f2	AVX-512: Added CMP and BLEND instructions. Lowering for SETCC. llvm-svn: 188265	2013-08-13 13:24:07 +00:00
Elena Demikhovsky	cf5b1458e6	AVX-512: Added VPERM* instructons and MOV* zmm-to-zmm instructions. Added a test for shuffles using VPERM. llvm-svn: 188147	2013-08-11 07:55:09 +00:00
Jakub Staszak	b5ab81d5d0	Fix the comment. llvm-svn: 187984	2013-08-08 15:19:25 +00:00
Elena Demikhovsky	45c54ad8dc	AVX-512 set: Added BROADCAST instructions with lowering logic and a test. llvm-svn: 187884	2013-08-07 12:34:55 +00:00
Tim Northover	a4415854db	Refactor isInTailCallPosition handling This change came about primarily because of two issues in the existing code. Niether of: define i64 @test1(i64 %val) { %in = trunc i64 %val to i32 tail call i32 @ret32(i32 returned %in) ret i64 %val } define i64 @test2(i64 %val) { tail call i32 @ret32(i32 returned undef) ret i32 42 } should be tail calls, and the function sameNoopInput is responsible. The main problem is that it is completely symmetric in the "tail call" and "ret" value, but in reality different things are allowed on each side. For these cases: 1. Any truncation should lead to a larger value being generated by "tail call" than needed by "ret". 2. Undef should only be allowed as a source for ret, not as a result of the call. Along the way I noticed that a mismatch between what this function treats as a valid truncation and what the backends see can lead to invalid calls as well (see x86-32 test case). This patch refactors the code so that instead of being based primarily on values which it recurses into when necessary, it starts by inspecting the type and considers each fundamental slot that the backend will see in turn. For example, given a pathological function that returned {{}, {{}, i32, {}}, i32} we would consider each "real" i32 in turn, and ask if it passes through unchanged. This is much closer to what the backend sees as a result of ComputeValueVTs. Aside from the bug fixes, this eliminates the recursion that's going on and, I believe, makes the bulk of the code significantly easier to understand. The trade-off is the nasty iterators needed to find the real types inside a returned value. llvm-svn: 187787	2013-08-06 09:12:35 +00:00
Elena Demikhovsky	40864b690b	AVX-512 set: added mask operations, lowering BUILD_VECTOR for i1 vector types. Added intrinsics and tests. llvm-svn: 187717	2013-08-05 08:52:21 +00:00
Benjamin Kramer	5bc180c14f	X86: Turn fp selects into mask operations. double test(double a, double b, double c, double d) { return a<b ? c : d; } before: _test: ucomisd %xmm0, %xmm1 ja LBB0_2 movaps %xmm3, %xmm2 LBB0_2: movaps %xmm2, %xmm0 after: _test: cmpltsd %xmm1, %xmm0 andpd %xmm0, %xmm2 andnpd %xmm3, %xmm0 orpd %xmm2, %xmm0 Small speedup on Benchmarks/SmallPT llvm-svn: 187706	2013-08-04 12:05:16 +00:00
Elena Demikhovsky	67b05fc0b3	Added INSERT and EXTRACT intructions from AVX-512 ISA. All insertf/extractf functions replaced with insert/extract since we have insertf and inserti forms. Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors. Added lowering for EXTRACT/INSERT subvector for 512-bit vectors. Added a test. llvm-svn: 187491	2013-07-31 11:35:14 +00:00
Stephen Lin	73de7bf5de	AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956	2013-07-09 18:16:56 +00:00
Chad Rosier	295bd43adb	The getRegForInlineAsmConstraint function should only accept MVT value types. llvm-svn: 184642	2013-06-22 18:37:38 +00:00
Bill Wendling	8f26840c5a	Don't cache the instruction and register info from the TargetMachine, because the internals of TargetMachine could change. No functionality change intended. llvm-svn: 183571	2013-06-07 21:00:34 +00:00
Andrew Trick	ef9de2a739	Track IR ordering of SelectionDAG nodes 2/4. Change SelectionDAG::getXXXNode() interfaces as well as call sites of these functions to pass in SDLoc instead of DebugLoc. llvm-svn: 182703	2013-05-25 02:42:55 +00:00
Matt Arsenault	75865923c9	Add LLVMContext argument to getSetCCResultType llvm-svn: 182180	2013-05-18 00:21:46 +00:00
Bill Wendling	eb108bad50	Use the target options specified on a function to reset the back-end. During LTO, the target options on functions within the same Module may change. This would necessitate resetting some of the back-end. Do this for X86, because it's a Friday afternoon. llvm-svn: 178917	2013-04-05 21:52:40 +00:00
Michael Liao	a486a11dcf	Add support of RDSEED defined in AVX2 extension llvm-svn: 178314	2013-03-28 23:41:26 +00:00
Michael Liao	03f9ad0e67	Add XTEST codegen support llvm-svn: 178083	2013-03-26 22:47:01 +00:00
Michael Liao	6af16fc3b7	Fix PR10475 - ISD::SHL/SRL/SRA must have either both scalar or both vector operands but TLI.getShiftAmountTy() so far only return scalar type. As a result, backend logic assuming that breaks. - Rename the original TLI.getShiftAmountTy() to TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to return target-specificed scalar type or the same vector type as the 1st operand. - Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar type. llvm-svn: 176364	2013-03-01 18:40:30 +00:00
Eli Bendersky	a1c6635ca3	The operand listing is very much outdated. llvm-svn: 175220	2013-02-14 23:17:03 +00:00
Evan Cheng	0e88c7d897	Teach SDISel to combine fsin / fcos into a fsincos node if the following conditions are met: 1. They share the same operand and are in the same BB. 2. Both outputs are used. 3. The target has a native instruction that maps to ISD::FSINCOS node or the target provides a sincos library call. Implemented the generic optimization in sdisel and enabled it for Mac OSX. Also added an additional optimization for x86_64 Mac OSX by using an alternative entry point __sincos_stret which returns the two results in xmm0 / xmm1. rdar://13087969 PR13204 llvm-svn: 173755	2013-01-29 02:32:37 +00:00
Craig Topper	8fb09f0abb	Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction. llvm-svn: 173667	2013-01-28 06:48:25 +00:00
Craig Topper	2cd375896a	Make helper method static. llvm-svn: 173005	2013-01-21 06:13:28 +00:00
Craig Topper	e65a08be64	Capitalize lowerTRUNCATE so that it matches the other lower functions in this file despite it not matching coding standards. llvm-svn: 172994	2013-01-20 21:34:37 +00:00
Craig Topper	ce61fdf0a3	Make LowerVSETCC a static function and use MVT instead of EVT. llvm-svn: 172969	2013-01-20 09:02:22 +00:00
Craig Topper	9976974cc6	Make some helper methods static. llvm-svn: 172936	2013-01-20 00:50:58 +00:00
Craig Topper	bb772d27a7	Capitalize LowerVectorIntExtend to be consistent with all the other lower functions in this file. llvm-svn: 172927	2013-01-19 23:14:09 +00:00
Nadav Rotem	977e0be4a0	Efficient lowering of vector sdiv when the divisor is a splatted power of two constant. PR 14848. The lowered sequence is based on the existing sequence the target-independent DAG Combiner creates for the scalar case. Patch by Zvi Rackover. llvm-svn: 171953	2013-01-09 05:14:33 +00:00
Chandler Carruth	664e354de7	Switch TargetTransformInfo from an immutable analysis pass that requires a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it. The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTranformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least on implementation. The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis pass, allowing all but tho NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API. The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target independent code generator. The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes. The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations. The fourth step required logic that was shared between the target independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, a helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract. The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them. Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis which is a more natural layer for it to live. Second, clients of this interface can depend on it always being available which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits, this one is clearly big enough. Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks. Commits to update DragonEgg and Clang will be made presently. llvm-svn: 171681	2013-01-07 01:37:14 +00:00
Nadav Rotem	e1d5c4b8b9	LoopVectorizer: 1. Add code to estimate register pressure. 2. Add code to select the unroll factor based on register pressure. 3. Add bits to TargetTransformInfo to provide the number of registers. llvm-svn: 171469	2013-01-04 17:48:25 +00:00
Hal Finkel	95de3f3018	Add a subtype parameter to VTTI::getShuffleCost In order to cost subvector insertion and extraction, we need to know the type of the subvector being extracted. No functionality change. llvm-svn: 171453	2013-01-03 02:34:09 +00:00
Nadav Rotem	9785f519b4	CostModel: initial checkin for code that estimates the cost of special shuffles. llvm-svn: 171180	2012-12-28 08:19:03 +00:00
Nadav Rotem	3da9ac72fa	AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend. llvm-svn: 171178	2012-12-28 05:45:24 +00:00
Nadav Rotem	3b34190100	AVX/AVX2: Move the SEXT lowering code from a target specific DAGco to a lowering function. llvm-svn: 171170	2012-12-27 22:47:16 +00:00
Benjamin Kramer	4669d18893	X86: Match the SSE/AVX min/max vector ops using a custom node instead of intrinsics This is very mechanical, no functionality change. Preparation for PR14667. llvm-svn: 170898	2012-12-21 14:04:55 +00:00
Nadav Rotem	eacbb731d3	Add a missing "virtual" keyword. llvm-svn: 170842	2012-12-21 05:02:12 +00:00
Nadav Rotem	6d4fdd6d2c	Improve the X86 cost model for loads and stores. llvm-svn: 170830	2012-12-21 01:33:59 +00:00
Patrik Hagglund	e09cac9a67	Change TargetLowering::getTypeForExtArgOrReturn to take and return MVTs, instead of EVTs. llvm-svn: 170537	2012-12-19 12:02:25 +00:00
Patrik Hagglund	f9eb168ef4	Change TargetLowering::findRepresentativeClass to take an MVT, instead of EVT. llvm-svn: 170532	2012-12-19 11:30:36 +00:00
Craig Topper	f3ff6ae066	Simplify BMI ANDN matching to use patterns instead of a DAG combine. Also add ANDN to isDefConvertible. llvm-svn: 170305	2012-12-17 05:12:30 +00:00
Benjamin Kramer	b16ccde7a4	X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible. We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases if y is a constant. DAGCombiner canonicalizes those so we first have to undo the canonicalization for those cases. The pattern occurs in gzip when the loop vectorizer is enabled. Part of PR14613. llvm-svn: 170273	2012-12-15 16:47:44 +00:00
Evan Cheng	962711ee71	Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I mention the inline memcpy / memset expansion code is a mess? This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset. The first indicates whether it is expanding a memset or a memcpy / memmove. The later is whether the memset is a memset of zero. It's totally possible (likely even) that targets may want to do different things for memcpy and memset of zero. llvm-svn: 169959	2012-12-12 02:34:41 +00:00
Evan Cheng	c3d1aca657	- Rename isLegalMemOpType to isSafeMemOpType. "Legal" is a very overloade term. Also added more comments to explain why it is generally ok to return true. - Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to be true for loaded source (memcpy) or zero constants (memset). The poor name choice is probably some kind of legacy issue. llvm-svn: 169954	2012-12-12 01:32:07 +00:00
Evan Cheng	04e5518783	Avoid using lossy load / stores for memcpy / memset expansion. e.g. f64 load / store on non-SSE2 x86 targets. llvm-svn: 169944	2012-12-12 00:42:09 +00:00
Patrik Hagglund	e98b7a0389	Revert EVT->MVT changes, r169836-169851, due to buildbot failures. llvm-svn: 169854	2012-12-11 11:14:33 +00:00
Patrik Hagglund	ad432a8e70	Change TargetLowering::getTypeForExtArgOrReturn to take and return MVTs, instead of EVTs. Accordingly, add bitsLT (and similar) to MVT. llvm-svn: 169850	2012-12-11 10:20:51 +00:00
Patrik Hagglund	8d2e7cf561	Change TargetLowering::findRepresentativeClass to take an MVT, instead of EVT. llvm-svn: 169845	2012-12-11 09:57:18 +00:00
Evan Cheng	79e2ca90bc	Some enhancements for memcpy / memset inline expansion. 1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do not replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791	2012-12-10 23:21:26 +00:00
Shuxin Yang	95de7c37e2	- Re-enable population count loop idiom recognization - fix a bug which cause sigfault. - add two testing cases which was causing crash llvm-svn: 169687	2012-12-09 03:12:46 +00:00
Chandler Carruth	91e47532fe	Revert the patches adding a popcount loop idiom recognition pass. There are still bugs in this pass, as well as other issues that are being worked on, but the bugs are crashers that occur pretty easily in the wild. Test cases have been sent to the original commit's review thread. This reverts the commits: r169671: Fix a logic error. r169604: Move the popcnt tests to an X86 subdirectory. r168931: Initial commit adding the pass. llvm-svn: 169683	2012-12-08 22:18:29 +00:00
Evan Cheng	9ec512d768	Replace r169459 with something safer. Rather than having computeMaskedBits to understand target implementation of any_extend / extload, just generate zero_extend in place of any_extend for liveouts when the target knows the zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz). rdar://12771555 llvm-svn: 169536	2012-12-06 19:13:27 +00:00
Evan Cheng	5213139f48	Let targets provide hooks that compute known zero and ones for any_extend and extload's. If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded bits optimizations. e.g. define void @foo(i16* %ptr, i32 %a) nounwind { entry: %tmp1 = icmp ult i32 %a, 100 br i1 %tmp1, label %bb1, label %bb2 bb1: %tmp2 = load i16* %ptr, align 2 br label %bb2 bb2: %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ] %cmp = icmp ult i16 %tmp3, 24 br i1 %cmp, label %bb3, label %exit bb3: call void @bar() nounwind br label %exit exit: ret void } This compiles to the followings before: push {lr} mov r2, #0 cmp r1, #99 bhi LBB0_2 @ BB#1: @ %bb1 ldrh r2, [r0] LBB0_2: @ %bb2 uxth r0, r2 cmp r0, #23 bhi LBB0_4 @ BB#3: @ %bb3 bl _bar LBB0_4: @ %exit pop {lr} bx lr The uxth is not needed since ldrh implicitly zero-extend the high bits. With this change it's eliminated. rdar://12771555 llvm-svn: 169459	2012-12-06 01:28:01 +00:00
Elena Demikhovsky	cd3c1c4a16	Simplified BLEND pattern matching for shuffles. Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2. llvm-svn: 169366	2012-12-05 09:24:57 +00:00
Chandler Carruth	802d755533	Sort includes for all of the .h files under the 'lib' tree. These were missed in the first pass because the script didn't yet handle include guards. Note that the script is now able to handle all of these headers without manual edits. =] llvm-svn: 169224	2012-12-04 07:12:27 +00:00
Shuxin Yang	abcc370423	rdar://12100355 (part 1) This revision attempts to recognize following population-count pattern: while(a) { c++; ... ; a &= a - 1; ... }, where <c> and <a>could be used multiple times in the loop body. TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent instruction sequence, which need to be improved in the following commits. Reviewed by Nadav, really appreciate! llvm-svn: 168931	2012-11-29 19:38:54 +00:00
Craig Topper	dd13d3fda1	Move some helper methods to being static functions in the implementation file. llvm-svn: 167696	2012-11-11 22:45:02 +00:00
Craig Topper	2dfc1a4d24	Removed unimplemented method declaration. llvm-svn: 167670	2012-11-10 09:00:12 +00:00
Craig Topper	f5d527401f	Simplify custom emitter code for pcmp(e/i)str(i/m) and make the helper functions static. llvm-svn: 167669	2012-11-10 08:57:41 +00:00
Craig Topper	9268c94b15	Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support. llvm-svn: 167652	2012-11-10 01:23:36 +00:00
Michael Liao	73cffddb95	Add support of RTM from TSX extension - Add RTM code generation support throught 3 X86 intrinsics: xbegin()/xend() to start/end a transaction region, and xabort() to abort a tranaction region llvm-svn: 167573	2012-11-08 07:28:54 +00:00
Nadav Rotem	0914f0b262	Cost Model: add tables for some avx type-conversion hacks. llvm-svn: 167480	2012-11-06 19:33:53 +00:00
Nadav Rotem	c378a8067d	CostModel: Add tables for the common x86 compares. llvm-svn: 167421	2012-11-05 23:48:20 +00:00
Nadav Rotem	c2345cbe73	X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2. llvm-svn: 167347	2012-11-03 00:39:56 +00:00
Nadav Rotem	23848f8f1d	Add a stub for the x86 cost model impl. Implement a basic cost rule for inserting/extracting from XMM registers. llvm-svn: 167333	2012-11-02 23:27:16 +00:00
Michael Liao	e2d7e4e8e5	Clean up redundant SP register maintained in X86 TLI llvm-svn: 167104	2012-10-31 04:14:09 +00:00
Manman Ren	acb8becc73	X86 MMX: optimize transfer from mmx to i32 We used to generate a store (movq) + a load. Now we use movd. rdar://9946746 llvm-svn: 167056	2012-10-30 22:15:38 +00:00
Michael Liao	c03c03d56e	Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32 - Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce DAG combine overhead. - Extend the support to v4i16/v8i16 as well. llvm-svn: 166487	2012-10-23 17:36:08 +00:00
Michael Liao	1be96bb5ce	Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1 llvm-svn: 166486	2012-10-23 17:34:00 +00:00
Michael Liao	4b7ccfcaad	Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 - If INSERT_VECTOR_ELT is supported (above SSE2, either by custom sequence of legal insn), transform BUILD_VECTOR into SHUFFLE + INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few (so far 1) elements being inserted. llvm-svn: 166288	2012-10-19 17:15:18 +00:00
Michael Liao	02ca34541e	Support v8f32 to v8i8/vi816 conversion through custom lowering - Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to vector element-wise widening) to reduce DAG combiner and its overhead added in X86 backend. llvm-svn: 166036	2012-10-16 18:14:11 +00:00
Michael Liao	97bf363a9e	Add __builtin_setjmp/_longjmp supprt in X86 backend - Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement of setjmp/longjmp which are used to implementation continuation, user-level threading, and etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling as zero-cost DWARF exception handling is used by default in X86. llvm-svn: 165989	2012-10-15 22:39:43 +00:00
Michael Liao	e999b865dd	Add support for FP_ROUND from v2f64 to v2f32 - Due to the current matching vector elements constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around this constraints. llvm-svn: 165631	2012-10-10 16:53:28 +00:00
Michael Liao	effae0c8e1	Add alternative support for FP_ROUND from v2f32 to v2f64 - Due to the current matching vector elements constraints in ISD::FP_EXTEND, rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to work around this constraints. This patch also reverts a previous attempt to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern and thus significantly reduces the overhead of supporting non-power-2 vector FP extend. llvm-svn: 165625	2012-10-10 16:32:15 +00:00
Micah Villmow	cdfe20b97f	Move TargetData to DataLayout. llvm-svn: 165402	2012-10-08 16:38:25 +00:00
Michael Liao	de51caf2a0	Add missing i64 max/min/umax/umin on 32-bit target - Turn on atomic6432.ll and add specific test case as well llvm-svn: 164616	2012-09-25 18:08:13 +00:00
Evan Cheng	446ff28df1	Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511 llvm-svn: 164588	2012-09-25 05:32:34 +00:00
Michael Liao	3237662b65	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop previously the spin-loop looks like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lift out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions has all-register form only, there is no immedidate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281	2012-09-20 03:06:15 +00:00
Michael Liao	137f8aedea	Add wider vector/integer support for PR12312 - Enhance the fix to PR12312 to support wider integer, such as 256-bit integer. If more than 1 fully evaluated vectors are found, POR them first followed by the final PTEST. llvm-svn: 163832	2012-09-13 20:24:54 +00:00
Craig Topper	a29ed865d0	Make a bunch of lowering helper functions static instead of member functions. No functional change. llvm-svn: 163596	2012-09-11 06:15:32 +00:00
Nadav Rotem	178250ad87	When unsafe math is used, we can use commutative FMAX and FMIN. In some cases this allows for better code generation. Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and FMINC, which are commutative. For example: movaps %xmm0, %xmm1 movsd LC(%rip), %xmm0 minsd %xmm1, %xmm0 becomes: minsd LC(%rip), %xmm0 llvm-svn: 162187	2012-08-19 13:06:16 +00:00
Craig Topper	602e1abe0d	Make ReplaceATOMIC_BINARY_64 a static function. Use a nested switch to reduce to only a single call to it thus allowing it to be inlined by the compiler. llvm-svn: 162088	2012-08-17 06:55:11 +00:00
Michael Liao	34107b9177	fix PR11334 - FP_EXTEND only support extending from vectors with matching elements. This results in the scalarization of extending to v2f64 from v2f32, which will be legalized to v4f32 not matching with v2f64. - add X86-specific VFPEXT supproting extending from v4f32 to v2f64. - add BUILD_VECTOR lowering helper to recover back the original extending from v4f32 to v2f64. - test case is enhanced to include different vector width. llvm-svn: 161894	2012-08-14 21:24:47 +00:00
Craig Topper	a7aaa62d54	Remove the LowerMMXCONCAT_VECTORS function. It could never execute because there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result. llvm-svn: 161742	2012-08-13 01:23:55 +00:00
Craig Topper	ab47fe4e16	Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305. llvm-svn: 161318	2012-08-06 06:22:36 +00:00
Bob Wilson	3e6fa462f3	Fall back to selection DAG isel for calls to builtin functions. Fast isel doesn't currently have support for translating builtin function calls to target instructions. For embedded environments where the library functions are not available, this is a matter of correctness and not just optimization. Most of this patch is just arranging to make the TargetLibraryInfo available in fast isel. <rdar://problem/12008746> llvm-svn: 161232	2012-08-03 04:06:28 +00:00
Elena Demikhovsky	3cb3b0045c	Added FMA functionality to X86 target. llvm-svn: 161110	2012-08-01 12:06:00 +00:00
Bill Wendling	318f03f56f	Remove tabs. llvm-svn: 160479	2012-07-19 00:15:11 +00:00
Evan Cheng	f579beca6d	This is another case where instcombine demanded bits optimization created large immediates. Add dag combine logic to recover in case the large immediates doesn't fit in cmp immediate operand field. int foo(unsigned long l) { return (l>> 47) == 1; } we produce %shr.mask = and i64 %l, -140737488355328 %cmp = icmp eq i64 %shr.mask, 140737488355328 %conv = zext i1 %cmp to i32 ret i32 %conv which codegens to movq $0xffff800000000000,%rax andq %rdi,%rax movq $0x0000800000000000,%rcx cmpq %rcx,%rax sete %al movzbl %al,%eax ret TargetLowering::SimplifySetCC would transform (X & -256) == 256 -> (X >> 8) == 1 if the immediate fails the isLegalICmpImmediate() test. For x86, that's immediates which are not a signed 32-bit immediate. Based on a patch by Eli Friedman. PR10328 rdar://9758774 llvm-svn: 160346	2012-07-17 06:53:39 +00:00
Benjamin Kramer	0ab2794eda	Add intrinsics for Ivy Bridge's rdrand instruction. The rdrand/cmov sequence is the same that is emitted by both GCC and ICC. Fixes PR13284. llvm-svn: 160117	2012-07-12 09:31:43 +00:00
Craig Topper	a54893c662	Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type. llvm-svn: 158279	2012-06-09 17:02:24 +00:00
Hans Wennborg	789acfb63d	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818	2012-06-01 16:27:21 +00:00
Justin Holewinski	aa58397b3c	Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall to pass around a struct instead of a large set of individual values. This cleans up the interface and allows more information to be added to the struct for future targets without requiring changes to each and every target. NV_CONTRIB llvm-svn: 157479	2012-05-25 16:35:28 +00:00
Benjamin Kramer	913da4b261	X86: Don't emit conditional floating point moves on when targeting pre-pentiumpro architectures. * Model FPSW (the FPU status word) as a register. * Add ISel patterns for the FUCOM, FNSTSW and SAHF instructions. During Legalize/Lowering, build a node sequence to transfer the comparison result from FPSW into EFLAGS. If you're wondering about the right-shift: That's an implicit sub-register extraction (%ax -> %ah) which is handled later on by the instruction selector. Fixes PR6679. Patch by Christoph Erhardt! llvm-svn: 155704	2012-04-27 12:07:43 +00:00
Craig Topper	b86fa404d3	Merge vpermps/vpermd and vpermpd/vpermq SD nodes. llvm-svn: 154782	2012-04-16 00:41:45 +00:00
Elena Demikhovsky	779a72b49e	Added VPERM optimization for AVX2 shuffles llvm-svn: 154761	2012-04-15 11:18:59 +00:00
Richard Smith	3e8f1f6aea	Fix X86 codegen for 'atomicrmw nand' to generate x = ~(x & y), not x = ~x & y. llvm-svn: 154705	2012-04-13 22:47:00 +00:00
Nadav Rotem	9bc178ac5c	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Eric Christopher	65ada95b84	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
Nadav Rotem	f934f91709	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Evan Cheng	f8bad08001	Fix a long standing tail call optimization bug. When a libcall is emitted legalizer always use the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Nadav Rotem	b801ca3976	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00
Rafael Espindola	ba0a6cabb8	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011	2012-04-04 12:51:34 +00:00
Evan Cheng	65f9d19c4f	Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call. llvm-svn: 151645	2012-02-28 18:51:51 +00:00
Daniel Dunbar	ee7b899343	Revert r151623 "Some ARM implementaions, e.g. A-series, does return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part. llvm-svn: 151630	2012-02-28 15:36:07 +00:00
Evan Cheng	87c7b09d8d	Some ARM implementaions, e.g. A-series, does return stack prediction. That is, the processor keeps a return addresses stack (RAS) which stores the address and the instruction execution state of the instruction after a function-call type branch instruction. Calling a "noreturn" function with normal call instructions (e.g. bl) can corrupt RAS and causes 100% return misprediction so LLVM should use a unconditional branch instead. i.e. mov lr, pc b _foo The "mov lr, pc" is issued in order to get proper backtrace. rdar://8979299 llvm-svn: 151623	2012-02-28 06:42:03 +00:00
NAKAMURA Takumi	bdf94879df	Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff. [Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems: Clang raised a warning, and X86 LowerOperation would assert out for fptoui f64 to i32 because it improperly lowered to an illegal BUILD_PAIR. Here's a patch that addresses these issues. Let me know if any other changes are necessary. Thanks. llvm-svn: 151432	2012-02-25 03:37:25 +00:00
Michael J. Spencer	248d65e78b	Add WIN_FTOL_* psudo-instructions to model the unique calling convention used by the Win32 _ftol2 runtime function. Patch by Joe Groff! llvm-svn: 151382	2012-02-24 19:01:22 +00:00
Craig Topper	760b134ffa	Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified. llvm-svn: 151134	2012-02-22 05:59:10 +00:00
Craig Topper	3e5c04e432	Make a bunch of X86ISelLowering shuffle functions static now that they are no longer needed by isel. llvm-svn: 150908	2012-02-19 02:53:47 +00:00
Craig Topper	1d471e31ba	Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies. llvm-svn: 149807	2012-02-05 03:14:49 +00:00
Elena Demikhovsky	fb44980b41	Optimization for SIGN_EXTEND operation on AVX. Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32 extensions. llvm-svn: 149600	2012-02-02 09:10:43 +00:00
Elena Demikhovsky	0e48c70ba7	Optimization for "truncate" operation on AVX. Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles. llvm-svn: 149485	2012-02-01 07:56:44 +00:00
Craig Topper	ca29bcfc10	Move some XOP patterns into instruction definition. Replae VPCMOV intrinsic patterns with custom lowering to a target specific nodes. llvm-svn: 149216	2012-01-30 01:10:15 +00:00
Craig Topper	0b7ad76bd0	Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching. llvm-svn: 148670	2012-01-22 23:36:02 +00:00
Craig Topper	bd4884371b	Merge PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ and PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ X86 ISD node types into only two node types. Simplifying opcode selection and pattern matching. llvm-svn: 148667	2012-01-22 22:42:16 +00:00
Craig Topper	094626414d	Add target specific ISD node types for SSE/AVX vector shuffle instructions and change all the code that used to create intrinsic nodes to create the new nodes instead. llvm-svn: 148664	2012-01-22 19:15:14 +00:00
Craig Topper	cb3433cd58	Remove unused X86 ISD node type defines. llvm-svn: 148644	2012-01-22 01:15:56 +00:00
Craig Topper	80576e8d1f	Merge 128-bit and 256-bit SHUFPS/SHUFPD handling. llvm-svn: 148466	2012-01-19 08:19:12 +00:00
Victor Umansky	540651cf59	Reverted commit #147601 upon Evan's request. llvm-svn: 147748	2012-01-08 17:20:33 +00:00
Victor Umansky	9255b6d9fe	Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX. Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX) Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov llvm-svn: 147601	2012-01-05 08:46:19 +00:00
Craig Topper	6e54ba7eee	Merge X86 SHUFPS and SHUFPD node types. llvm-svn: 147394	2011-12-31 23:50:21 +00:00
Chandler Carruth	7e9453e916	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a lot* to be improve on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Craig Topper	a913dde0ef	Remove an unused X86ISD node type. llvm-svn: 146833	2011-12-17 19:16:44 +00:00
Craig Topper	a4d411cb1b	Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes. llvm-svn: 146726	2011-12-16 08:06:31 +00:00
Craig Topper	1fdfec63a4	Remove some remants of the old palign pattern fragment that were still hanging around. Also remove a cast from inside getShuffleVPERM2X128Immediate and getShuffleVPERMILPImmediate since the only caller already had done the cast. llvm-svn: 146344	2011-12-11 19:12:35 +00:00
Craig Topper	8d4ba198d6	Merge floating point and integer UNPCK X86ISD node types. llvm-svn: 145926	2011-12-06 08:21:25 +00:00
Craig Topper	0a672eaf9e	Merge VPERM2F128/VPERM2I128 ISD node types. llvm-svn: 145485	2011-11-30 07:47:51 +00:00
Craig Topper	bafd224c8b	Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128. llvm-svn: 145483	2011-11-30 06:25:25 +00:00
Craig Topper	818a983e93	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. llvm-svn: 145238	2011-11-28 10:14:51 +00:00
Craig Topper	51280d565b	Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to canonalizing that occurs when shuffle nodes are created. llvm-svn: 145153	2011-11-26 22:55:48 +00:00
Craig Topper	7704bd7ac3	Collapse X86ISD node types for PUNPCKH, PUNPCKL, UNPCKLP, and UNPCKHP to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type. llvm-svn: 145148	2011-11-26 20:47:44 +00:00
Craig Topper	d65a444478	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. llvm-svn: 145126	2011-11-24 22:57:10 +00:00
Craig Topper	d26466748b	Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish. llvm-svn: 145125	2011-11-24 22:20:08 +00:00
Craig Topper	6270d072c5	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. llvm-svn: 145028	2011-11-21 08:26:50 +00:00
Craig Topper	669199ca94	Add support for lowering 256-bit shuffles to VPUNPCKL/H for i16, i32, i64 if AVX2 is enabled. llvm-svn: 145026	2011-11-21 06:57:39 +00:00
Craig Topper	f984efbfce	Synthesize SSSE3/AVX 128-bit horizontal integer add/sub instructions from add/sub of appropriate shuffle vectors. llvm-svn: 144989	2011-11-19 09:02:40 +00:00
Craig Topper	81390be00f	Collapse X86 PSIGNB/PSIGNW/PSIGND node types. llvm-svn: 144988	2011-11-19 07:33:10 +00:00
Lang Hames	58dba012b6	Rename NonScalarIntSafe to something more appropriate. llvm-svn: 143080	2011-10-26 23:50:43 +00:00
Craig Topper	039a79067a	Remove intrinsics for X86 BLSI, BLSMSK, and BLSR intrinsics and replace with custom isel lowering code. llvm-svn: 142642	2011-10-21 06:55:01 +00:00
Craig Topper	965de2c197	Add X86 ANDN instruction. Including instruction selection. llvm-svn: 141947	2011-10-14 07:06:56 +00:00
Duncan Sands	0e4fcb8e3b	Synthesize SSE3/AVX 128 bit horizontal add/sub instructions from floating point add/sub of appropriate shuffle vectors. Does not synthesize the 256 bit AVX versions because they work differently. llvm-svn: 140332	2011-09-22 20:15:48 +00:00
Nadav Rotem	b873b18721	CR fixes per Bruno's request. Undo the changes from r139285 which added custom lowering to vselect. Add tablegen lowering for vselect. llvm-svn: 139479	2011-09-11 15:02:23 +00:00
Nadav Rotem	de838daefd	Implement vector-select support for avx256. Refactor the vblend implementation to have tablegen match the instruction by the node type llvm-svn: 139400	2011-09-09 20:29:17 +00:00
Nadav Rotem	2550ba2a27	Add X86-SSE4 codegen support for vector-select. llvm-svn: 139285	2011-09-08 08:11:19 +00:00
Rafael Espindola	9d96c94278	Fix comment. Noticed by Duncan. llvm-svn: 139161	2011-09-06 19:29:31 +00:00
Duncan Sands	f2641e1bc1	Add codegen support for vector select (in the IR this means a select with a vector condition); such selects become VSELECT codegen nodes. This patch also removes VSETCC codegen nodes, unifying them with SETCC nodes (codegen was actually often using SETCC for vector SETCC already). This ensures that various DAG combiner optimizations kick in for vector comparisons. Passes dragonegg bootstrap with no testsuite regressions (nightly testsuite as well as "make check-all"). Patch mostly by Nadav Rotem. llvm-svn: 139159	2011-09-06 19:07:46 +00:00
Duncan Sands	a098436b32	Split the init.trampoline intrinsic, which currently combines GCC's init.trampoline and adjust.trampoline intrinsics, into two intrinsics like in GCC. While having one combined intrinsic is tempting, it is not natural because typically the trampoline initialization needs to be done in one function, and the result of adjust trampoline is needed in a different (nested) function. To get around this llvm-gcc hacks the nested function lowering code to insert an additional parent variable holding the adjust.trampoline result that can be accessed from the child function. Dragonegg doesn't have the luxury of tweaking GCC code, so it stored the result of adjust.trampoline in the memory GCC set aside for the trampoline itself (this is always available in the child function), and set up some new memory (using an alloca) to hold the trampoline. Unfortunately this breaks Go which allocates trampoline memory on the heap and wants to use it even after the parent has exited (!). Rather than doing even more hacks to get Go working, it seemed best to just use two intrinsics like in GCC. Patch mostly by Sanjoy Das. llvm-svn: 139140	2011-09-06 13:37:06 +00:00
Rafael Espindola	94d3253626	Adds support for variable sized allocas. For a variable sized alloca, code is inserted to first check if the current stacklet has enough space. If so, space is allocated by simply decrementing the stack pointer. Otherwise a runtime routine (__morestack_allocate_stack_space in libgcc) is called which allocates the required memory from the heap. Patch by Sanjoy Das. llvm-svn: 138818	2011-08-30 19:47:04 +00:00
Rafael Espindola	3353017668	Adds a SelectionDAG node X86SegAlloca which will be custom lowered from DYNAMIC_STACKALLOC. Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which will match X86SegAlloca (based on word size) are also added. They will be custom emitted to inject the actual stack handling code. Patch by Sanjoy Das. llvm-svn: 138814	2011-08-30 19:43:21 +00:00
Eli Friedman	5e5704277f	Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction. llvm-svn: 138660	2011-08-26 21:21:21 +00:00
Craig Topper	de92622aa5	Break 256-bit vector int add/sub/mul into two 128-bit operations to avoid costly scalarization. Fixes PR10711. llvm-svn: 138427	2011-08-24 06:14:18 +00:00
Bruno Cardoso Lopes	be5e987379	Introduce matching patterns for vbroadcast AVX instruction. The idea is to match splats in the form (splat (scalar_to_vector (load ...))) whenever the load can be folded. All the logic and instruction emission is working but because of PR8156, there are no ways to match loads, cause they can never be folded for splats. Thus, the tests are XFAILed, but I've tested and exercised all the logic using a relaxed version for checking the foldable loads, as if the bug was already fixed. This should work out of the box once PR8156 gets fixed since MayFoldLoad will work as expected. llvm-svn: 137810	2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes	f15dfe5818	The VPERM2F128 is a AVX instruction which permutes between two 256-bit vectors. It operates on 128-bit elements instead of regular scalar types. Recognize shuffles that are suitable for VPERM2F128 and teach the x86 legalizer how to handle them. llvm-svn: 137519	2011-08-12 21:48:26 +00:00
Bruno Cardoso Lopes	6aee388423	Cleanup PALIGNR handling and remove the old palign pattern fragment. Also make PALIGNR masks to don't match 256-bits, which isn't supported It's also a step to solve PR10489 llvm-svn: 136448	2011-07-29 01:30:59 +00:00
Eli Friedman	26a484852e	Code generation for 'fence' instruction. llvm-svn: 136283	2011-07-27 22:21:52 +00:00
Bruno Cardoso Lopes	27a30a7792	The vpermilps and vpermilpd have different behaviour regarding the usage of the shuffle bitmask. Both work in 128-bit lanes without crossing, but in the former the mask of the high part is the same used by the low part while in the later both lanes have independent masks. Handle this properly and and add support for vpermilpd. llvm-svn: 136200	2011-07-27 00:56:34 +00:00
Bruno Cardoso Lopes	f8fe47bd2b	Recognize unpckh* masks and match 256-bit versions. The new versions are different from the previous 128-bit because they work in lanes. Update a few comments and add testcases llvm-svn: 136157	2011-07-26 22:03:40 +00:00
Bruno Cardoso Lopes	d77b383199	More movsldup/movshdup cleanup. Rewrite the mask matching function and add support for 256-bit versions (but no instruction selection yet, coming next). llvm-svn: 136050	2011-07-26 02:39:28 +00:00
Bruno Cardoso Lopes	612e56174b	-Inspected a AVX code block added by someone in early Feb. This was never used and was actually very wrong, fix it and make it simpler. Also remove the ConcatVectors function, which is unused now. - Fix a introduction of useless nodes in r126664 and r126264. The VUNPCKL* should never be introduced cause we don't want duplicate nodes for 128 AVX and non-AVX modes, the actual instruction difference only exists during isel, but not for target specific DAG nodes. We only introduce V* target nodes when there is no 128-bit version already there. - Fix a fragile test and make it more useful. llvm-svn: 135729	2011-07-22 00:15:07 +00:00
Bruno Cardoso Lopes	b878caa5e2	Add support for 256-bit versions of VPERMIL instruction. This is a new instruction introduced in AVX, which can operate on 128 and 256-bit vectors. It considers a 256-bit vector as two independent 128-bit lanes. It can permute any 32 or 64 elements inside a lane, and restricts the second lane to have the same permutation of the first one. With the improved splat support introduced early today, adding codegen for this instruction enable more efficient 256-bit code: Instead of: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vextractf128 $1, %ymm0, %xmm1 shufps $1, %xmm1, %xmm1 movss %xmm1, 28(%rsp) movss %xmm1, 24(%rsp) movss %xmm1, 20(%rsp) movss %xmm1, 16(%rsp) vextractf128 $0, %ymm0, %xmm0 shufps $1, %xmm0, %xmm0 movss %xmm0, 12(%rsp) movss %xmm0, 8(%rsp) movss %xmm0, 4(%rsp) movss %xmm0, (%rsp) vmovaps (%rsp), %ymm0 We get: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vpermilps $85, %ymm0, %ymm0 llvm-svn: 135662	2011-07-21 01:55:47 +00:00
Chris Lattner	229907cd11	land David Blaikie's patch to de-constify Type, with a few tweaks. llvm-svn: 135375	2011-07-18 04:54:35 +00:00
Nadav Rotem	771f29677f	[VECTOR-SELECT] During type legalization we often use the SIGN_EXTEND_INREG SDNode. When this SDNode is legalized during the LegalizeVector phase, it is scalarized because non-simple types are automatically marked to be expanded. In this patch we add support for lowering SIGN_EXTEND_INREG manually. This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements' flag. llvm-svn: 135144	2011-07-14 11:11:14 +00:00
Bruno Cardoso Lopes	7ba479d22f	The target specific node PANDN name is misleading. That happens because it's later selected to a ANDNPD/ANDNPS instruction instead of the PANDN instruction. Rename it. llvm-svn: 135087	2011-07-13 21:36:47 +00:00
Eric Christopher	7e5f2350d3	Use getRegForInlineAsmConstraint instead of custom defining regclasses via vectors. Part of rdar://9643582 llvm-svn: 134079	2011-06-29 17:23:50 +00:00
Evan Cheng	3a0c5e52ff	Remove TargetOptions.h dependency from X86Subtarget. llvm-svn: 133726	2011-06-23 17:54:54 +00:00
Eric Christopher	0713a9d8fc	Add a parameter to CCState so that it can access the MachineFunction. No functional change. Part of PR6965 llvm-svn: 132763	2011-06-08 23:55:35 +00:00
Stuart Hastings	e0d3426e1a	Followup to 132458, omit unnecessary stack copy when x87 input is a load. rdar://problem/6373334 llvm-svn: 132696	2011-06-06 23:15:58 +00:00
Stuart Hastings	be605494ac	Reapply 132424 with fixes. This fixes PR10068. rdar://problem/5993888 llvm-svn: 132606	2011-06-03 23:53:54 +00:00
Eric Christopher	de9399bf76	Have LowerOperandForConstraint handle multiple character constraints. Part of rdar://9119939 llvm-svn: 132510	2011-06-02 23:16:42 +00:00
Rafael Espindola	aa318ae495	Revert 132424 to fix PR10068. llvm-svn: 132479	2011-06-02 19:57:47 +00:00
Stuart Hastings	7adc95f69e	Recommit 132404 with fixes. rdar://problem/5993888 llvm-svn: 132424	2011-06-01 21:33:14 +00:00
Stuart Hastings	aab130d995	Revert 132404 to appease a buildbot. rdar://problem/5993888 llvm-svn: 132419	2011-06-01 19:52:20 +00:00
Stuart Hastings	7b7c102f2c	Add support for x86 CMPEQSS and friends. These instructions do a floating-point comparison, generate a mask of 0s or 1s, and generally DTRT with NaNs. Only profitable when the user wants a materialized 0 or 1 at runtime. rdar://problem/5993888 llvm-svn: 132404	2011-06-01 17:17:45 +00:00
Stuart Hastings	9f20804216	FGETSIGN support for x86, using movmskps/pd. Will be enabled with a patch to TargetLowering.cpp. rdar://problem/5660695 llvm-svn: 132388	2011-06-01 04:39:42 +00:00
Stuart Hastings	493a12bf5e	Reverting 132105: it broke some LLVM-GCC DejaGNU tests. llvm-svn: 132108	2011-05-26 04:09:49 +00:00
Stuart Hastings	276f231c2f	Correctly handle a one-word struct passed byval on x86_64. rdar://problem/6920088 llvm-svn: 132105	2011-05-26 02:44:56 +00:00
Eli Friedman	d000a2c26e	Clean up the mess created by r131467+r131469. llvm-svn: 131471	2011-05-17 18:02:22 +00:00
Stuart Hastings	c65d8eda7b	Revert 131467 due to buildbot complaint. llvm-svn: 131469	2011-05-17 16:59:46 +00:00
Stuart Hastings	3cf5308890	Fix an obscure issue in X86_64 parameter passing: if a tiny byval is passed as the fifth parameter, insure it's passed correctly (in R9). rdar://problem/6920088 llvm-svn: 131467	2011-05-17 16:45:55 +00:00
Nadav Rotem	8f971c27fb	Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector. llvm-svn: 131179	2011-05-11 08:12:09 +00:00
Eli Friedman	2518f8376d	Make the logic for determining function alignment more explicit. No functionality change. llvm-svn: 131012	2011-05-06 20:34:06 +00:00
Evan Cheng	0663f23bd8	Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified. llvm-svn: 127981	2011-03-21 01:19:09 +00:00
Daniel Dunbar	327cd36f74	Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR", it broke a lot of things. llvm-svn: 127954	2011-03-19 21:47:14 +00:00
Evan Cheng	824a711305	SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR to have single return block (at least getting there) for optimizations. This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this: int f1(void); int f2(void); int f3(void); int f4(void); int f5(void); int f6(void); int foo(int x) { switch(x) { case 1: return f1(); case 2: return f2(); case 3: return f3(); case 4: return f4(); case 5: return f5(); case 6: return f6(); } } => LBB0_2: ## %sw.bb callq _f1 popq %rbp ret LBB0_3: ## %sw.bb1 callq _f2 popq %rbp ret LBB0_4: ## %sw.bb3 callq _f3 popq %rbp ret This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch: sw.bb7: ; preds = %entry %call8 = tail call i32 @f5() nounwind br label %return sw.bb9: ; preds = %entry %call10 = tail call i32 @f6() nounwind br label %return return: %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ] ret i32 %retval.0 This allows codegen to generate better code like this: LBB0_2: ## %sw.bb jmp _f1 ## TAILCALL LBB0_3: ## %sw.bb1 jmp _f2 ## TAILCALL LBB0_4: ## %sw.bb3 jmp _f3 ## TAILCALL rdar://9147433 llvm-svn: 127953	2011-03-19 17:17:39 +00:00
Cameron Zwarich	2ef0c69df1	Move more logic into getTypeForExtArgOrReturn. llvm-svn: 127809	2011-03-17 14:53:37 +00:00
Cameron Zwarich	34e7b3f77e	Rename getTypeForExtendedInteger() to getTypeForExtArgOrReturn(). llvm-svn: 127807	2011-03-17 14:21:56 +00:00
Cameron Zwarich	ac106273d4	The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte rather than an int. Thankfully, this only causes LLVM to miss optimizations, not generate incorrect code. This just fixes the zext at the return. We still insert an i32 ZextAssert when reading a function's arguments, but it is followed by a truncate and another i8 ZextAssert so it is not optimized. llvm-svn: 127766	2011-03-16 22:20:18 +00:00
Cameron Zwarich	df61694417	Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo. llvm-svn: 127175	2011-03-07 21:56:36 +00:00
Owen Anderson	b2c80da4ae	Allow targets to specify a the type of the RHS of a shift parameterized on the type of the LHS. llvm-svn: 126518	2011-02-25 21:41:48 +00:00
David Greene	9a6040dc86	[AVX] General VUNPCKL codegen support. llvm-svn: 126264	2011-02-22 23:31:46 +00:00
David Greene	653f1eed2d	[AVX] Support VSINSERTF128 with more patterns and appropriate infrastructure. This makes lowering 256-bit vectors to 128-bit vectors simple when 256-bit vector support is not available. llvm-svn: 124868	2011-02-04 16:08:29 +00:00
David Greene	c4da110fd2	[AVX] VEXTRACTF128 support. This commit includes patterns for matching EXTRACT_SUBVECTOR to VEXTRACTF128 along with support routines to examine and translate index values. VINSERTF128 comes next. With these two in place we can begin supporting more AVX operations as INSERT/EXTRACT can be used as a fallback when 256-bit support is not available. llvm-svn: 124797	2011-02-03 15:50:00 +00:00
David Greene	bab5e6ed0e	[AVX] Add INSERT_SUBVECTOR and support it on x86. This provides a default implementation for x86, going through the stack in a similr fashion to how the codegen implements BUILD_VECTOR. Eventually this will get matched to VINSERTF128 if AVX is available. llvm-svn: 124307	2011-01-26 19:13:22 +00:00
David Greene	b6f1611928	[AVX] Support EXTRACT_SUBVECTOR on x86. This provides a default implementation of EXTRACT_SUBVECTOR for x86, going through the stack in a similr fashion to how the codegen implements BUILD_VECTOR. Eventually this will get matched to VEXTRACTF128 if AVX is available. llvm-svn: 124292	2011-01-26 15:38:49 +00:00
Nate Begeman	4b9db07b02	Implement feedback from Bruno on making pblendvb an x86-specific ISD node in addition to being an intrinsic, and convert lowering to use it. Hopefully the pattern fragment is doing the right thing with XMM0, looks correct in testing. llvm-svn: 122277	2010-12-20 22:04:24 +00:00

... 5 6 7 8 9 ...

1025 Commits