llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	63e2cd6caa	[AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when its beneficial. Isel now selects masked move instructions for vselect instead of blendm. But sometimes it beneficial to register allocation to remove the tied register constraint by using blendm instructions. This also picks up cases where the masked move was created due to a masked load intrinsic. Differential Revision: https://reviews.llvm.org/D28454 llvm-svn: 292005	2017-01-14 07:50:52 +00:00
Craig Topper	09b7e0f01d	[AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible. We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX available. Or if its not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if its an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD. This makes it possible for the register allocator to have all 32 registers available to work with. llvm-svn: 292004	2017-01-14 07:29:24 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Craig Topper	1ec84c2a18	[AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table. The unmasked versions read memory from operand 2, but were in the operand 3 table. These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch. llvm-svn: 291885	2017-01-13 07:28:56 +00:00
Craig Topper	46b6ecf41e	[X86] Move some entries in the load folding tables to move appropriate grouping. NFC llvm-svn: 291884	2017-01-13 07:28:53 +00:00
Craig Topper	6393afce97	[AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros. Previously we emitted a VPTERNLOG and a separate masked move. llvm-svn: 291415	2017-01-09 02:44:34 +00:00
Craig Topper	42b848a683	[X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size. llvm-svn: 291338	2017-01-07 06:56:54 +00:00
Craig Topper	e77e901130	[AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables. llvm-svn: 290591	2016-12-27 06:51:09 +00:00
Michael LeMay	20565e2aa1	[TargetInstrInfo] replace redundant expression in getMemOpBaseRegImmOfs Summary: The expression for computing the return value of getMemOpBaseRegImmOfs has only one possible value. The other value would result in a return earlier in the function. This patch replaces the expression with its only possible value. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27437 llvm-svn: 290133	2016-12-19 21:02:41 +00:00
Craig Topper	add9cc697a	[AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available. This can give the register allocator more registers to use. llvm-svn: 290057	2016-12-18 06:23:14 +00:00
Simon Pilgrim	7522f54feb	[X86][SSE] Fix domains for scalar store instructions As discussed on D27692 llvm-svn: 289834	2016-12-15 17:09:24 +00:00
Simon Pilgrim	ba46422694	[X86][AVX512] Moved instruction domain lookups to the right table. NFCI. Avoid duplicating instructions in the int32/int64 domains. llvm-svn: 289830	2016-12-15 16:38:51 +00:00
Simon Pilgrim	d7518896ff	[X86][SSE] Fix domains for VZEXT_LOAD type instructions Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825	2016-12-15 16:05:29 +00:00
Philip Reames	1f1bbac8da	[peephole] Enhance folding logic to work for STATEPOINTs The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are: Support for folding multiple operands at once which reference the same load Support for folding multiple loads into a single instruction Walk all the operands of the instruction for varidic instructions (this is a bug fix!) Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here. Differential Revision: https://reviews.llvm.org/D24103 llvm-svn: 289510	2016-12-13 01:38:41 +00:00
Craig Topper	081c0e2864	[X86] Remove some intrinsic instructions from hasPartialRegUpdate Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded. Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too. Reviewers: spatel, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27611 llvm-svn: 289419	2016-12-12 05:07:17 +00:00
Craig Topper	c4f2b0996d	[X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables. llvm-svn: 289186	2016-12-09 05:20:11 +00:00
Craig Topper	2aeb456425	[AVX-512] Add vpermilps/pd to load folding tables. llvm-svn: 289173	2016-12-09 02:18:11 +00:00
Michael Kuperstein	18092cf2c3	[X86] Do not assume "ri" instructions always have an immediate operand The second operand of an "ri" instruction may be an immediate, but it may also be a globalvariable, so we should make any assumptions. This fixes PR31271. Differential Revision: https://reviews.llvm.org/D27481 llvm-svn: 288964	2016-12-07 19:29:18 +00:00
Craig Topper	6413f8a8f2	[X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently. I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel. I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available. Reviewers: spatel, delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27401 llvm-svn: 288771	2016-12-06 04:58:39 +00:00
Michael Kuperstein	e3036abcf9	[X86] Fix non-intrinsic roundss/roundsd to not read the destination register This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking. This fixes PR31143. Differential Revision: https://reviews.llvm.org/D27323 llvm-svn: 288703	2016-12-05 20:57:37 +00:00
Craig Topper	9d16bfa0f5	[AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table. llvm-svn: 288591	2016-12-03 19:37:39 +00:00
Craig Topper	c210827b53	[AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables. llvm-svn: 288587	2016-12-03 17:19:15 +00:00
Craig Topper	4961fa9bba	[AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables. llvm-svn: 288484	2016-12-02 07:57:11 +00:00
Craig Topper	17ddb521ef	[AVX-512] Add EVEX PSHUFB instructions to load folding tables. llvm-svn: 288482	2016-12-02 07:06:30 +00:00
Craig Topper	f7866fad54	[AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables. llvm-svn: 288481	2016-12-02 06:24:38 +00:00
Matthias Braun	115efcd3d1	MachineScheduler: Export function to construct "default" scheduler. This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations. This also shrinks the list of default DAG mutations: {Load\|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores(). Differential Revision: https://reviews.llvm.org/D26986 llvm-svn: 288057	2016-11-28 20:11:54 +00:00
Craig Topper	ff9d45875a	[X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions. llvm-svn: 288009	2016-11-27 21:37:00 +00:00
Craig Topper	3674f44e40	[X86] Add SHL by 1 to the load folding tables. I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships. llvm-svn: 288007	2016-11-27 21:36:54 +00:00
Craig Topper	4fab487265	[AVX-512] Add integer and fp unpck instructions to load folding tables. llvm-svn: 288004	2016-11-27 19:51:41 +00:00
Craig Topper	7ad961cc70	[X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size. If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist. I probably missed some instructions, but this should be a large portion of them. llvm-svn: 288001	2016-11-27 18:51:13 +00:00
Craig Topper	c3b3926f8b	[AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287995	2016-11-27 08:55:31 +00:00
Craig Topper	fb64a25ba1	[X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction. Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load. llvm-svn: 287991	2016-11-27 01:52:51 +00:00
Craig Topper	10d5eec1a1	[AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287975	2016-11-26 08:21:52 +00:00
Craig Topper	97169ea5f9	[AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables. llvm-svn: 287974	2016-11-26 08:21:48 +00:00
Craig Topper	53b33de1e3	[AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables. llvm-svn: 287972	2016-11-26 07:21:00 +00:00
Craig Topper	39265bb1ce	[AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables. llvm-svn: 287970	2016-11-26 07:20:53 +00:00
Craig Topper	516fd7abfe	[X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding tables for consistency. Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete. llvm-svn: 287960	2016-11-26 02:13:58 +00:00
Craig Topper	a363d42973	[AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. This is consistent with the older sections of the table. NFC llvm-svn: 287956	2016-11-25 23:21:34 +00:00
Craig Topper	1e48829747	[AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables. llvm-svn: 287937	2016-11-25 16:33:53 +00:00
Michael Kuperstein	47eb85a003	[X86] Allow folding of stack reloads when loading a subreg of the spilled reg We did not support subregs in InlineSpiller:foldMemoryOperand() because targets may not deal with them correctly. This adds a target hook to let the spiller know that a target can handle subregs, and actually enables it for x86 for the case of stack slot reloads. This fixes PR30832. Differential Revision: https://reviews.llvm.org/D26521 llvm-svn: 287792	2016-11-23 18:33:49 +00:00
Craig Topper	3dcf45f08d	[X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to the use the i64mem version. I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX. There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the loading folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table. llvm-svn: 287622	2016-11-22 05:31:43 +00:00
Craig Topper	cada9f2275	[AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD). Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities. We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too. Reviewers: igorb, delena, Ayal, Farhana, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25652 llvm-svn: 287621	2016-11-22 04:57:34 +00:00
Michael Zuckerman	8462faeaba	Fixing a small typo (A->U). This seem to fixes PR30992. - HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX + HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX llvm-svn: 287532	2016-11-21 11:52:11 +00:00
Craig Topper	9f2d632ee7	[AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX. llvm-svn: 287523	2016-11-21 07:51:31 +00:00
Sanjay Patel	7f3d51f840	[x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of compilers, but logically equivalent int, float, and double variants of bitwise-logic instructions are reality in x86, and the float variant may be a shorter instruction depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all the time. This is a preliminary step towards solving PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 Differential Revision: https://reviews.llvm.org/D26712 llvm-svn: 287122	2016-11-16 17:42:40 +00:00
Craig Topper	b8596e4d1d	[X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions. -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions. -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax. -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing. This should fix at least some of PR28850. llvm-svn: 286787	2016-11-14 01:53:29 +00:00
Peter Collingbourne	32ab3a817d	Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86. Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand. llvm-svn: 286420	2016-11-09 23:53:43 +00:00
Zvi Rackover	85bc64c734	[X86] Broadcast from memory intructions aren't unfoldable Broadcast from memory instructions should be treated as moves. They can't be unfolded. Fixes pr30693. llvm-svn: 285998	2016-11-04 15:15:19 +00:00
Craig Topper	b7781a95fd	[X86] Use intrinsics table for PMADDUBSW and PMADDWD so that we can use the legacy intrinsics to select EVEX encoded instructions when available. This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type. llvm-svn: 285515	2016-10-30 06:56:16 +00:00
Craig Topper	defe9ffbb5	[X86] Use intrinsics table for VPMULHRSW intrincis so that the legacy intrinsics can select EVEX encoded instructions when available. This requires a minor rename of the instructions due to the use of different tablegen classes and how the names are concatenated. llvm-svn: 285501	2016-10-29 18:41:45 +00:00
Peter Collingbourne	5c924d7117	Target: Remove unused entities. llvm-svn: 283690	2016-10-09 04:38:57 +00:00
Craig Topper	e30cb00dc0	[AVX-512] Add subvector insert and extract to load/store folding tables. llvm-svn: 283689	2016-10-09 03:54:13 +00:00
Craig Topper	4262d53024	[AVX-512] Add the vector down convert instructions to the store folding tables. llvm-svn: 283687	2016-10-09 03:54:05 +00:00
Simon Pilgrim	a5d019ee95	[X86][SSE] Update register class during MOVSD/MOVSS - BLENDPD/BLENDPS commutation MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input, the commutation code wasn't taking this into account leading to verification errors. This patch inserts a vreg copy mi to ensure that the registers are correct. Fix for PR30607 Differential Revision: https://reviews.llvm.org/D25280 llvm-svn: 283539	2016-10-07 11:18:38 +00:00
Hans Wennborg	c26c03d911	Revert r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)" This is suspected to cause a miscompile in Chromium. Reverting while investigating. llvm-svn: 283329	2016-10-05 15:39:27 +00:00
Craig Topper	ee2d995661	[X86] Add MOV8rm_NOREX to switch in isReallyTriviallyReMaterializable to match MOV8rm. llvm-svn: 283184	2016-10-04 03:11:44 +00:00
Craig Topper	4e7b888ea4	[X86] Mark all sizes of (V)MOVUPD as trivially rematerializable. I don't know for sure that we truly needs this, but its the only vector load that isn't rematerializable. Making it consistent allows it to not be a special case in the td files. llvm-svn: 283083	2016-10-03 02:00:29 +00:00
Simon Pilgrim	ccdd1ff49b	[X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements. We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets llvm-svn: 283037	2016-10-01 14:26:11 +00:00
Mehdi Amini	117296c0a0	Use StringRef in Pass/PassManager APIs (NFC) llvm-svn: 283004	2016-10-01 02:56:57 +00:00
Hans Wennborg	b5643b47b6	X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302) We can't use Jcc to leave a Win64 function in general, because that confuses the unwinder. However, for "leaf" functions, that is, functions where the return address is always on top of the stack and which don't have unwind info, it's OK. Differential Revision: https://reviews.llvm.org/D24836 llvm-svn: 282920	2016-09-30 20:07:35 +00:00
Craig Topper	1c01cbe9ee	[AVX-512] Add the special stack spilling pseudos for XMM16-31 and YMM16-31 without VLX to teh isFrameLoadOpcode and isFrameStoreOpcode. llvm-svn: 282842	2016-09-30 05:35:45 +00:00
Craig Topper	d875d6b9b4	[AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available. This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used. Fixes one of the cases from PR29112. llvm-svn: 282690	2016-09-29 06:07:09 +00:00
Craig Topper	e7f2611160	[X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table. llvm-svn: 282687	2016-09-29 05:54:39 +00:00
Craig Topper	816a1d7783	[X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables. llvm-svn: 282684	2016-09-29 05:54:28 +00:00
Craig Topper	789888002a	[X86] Use std::max to calculate alignment instead of assuming RC->getSize() will not return a value greater than 32. I think it theoretically could be 64 for AVX-512. llvm-svn: 282471	2016-09-27 06:44:25 +00:00
Craig Topper	0cc188d979	[AVX-512] Replace get512BitSuperRegister with calls to TargetRegisterInfo::getMatchingSuperReg. llvm-svn: 282359	2016-09-25 16:34:06 +00:00
Craig Topper	3c9faa32c1	[AVX-512] Add rounding versions of instructions to hasUndefRegUpdate. llvm-svn: 282357	2016-09-25 16:33:59 +00:00
Craig Topper	d8b2bd492c	[AVX-512] Add the scalar unsigned integer to fp conversion instructions to hasUndefRegUpdate. llvm-svn: 282356	2016-09-25 16:33:57 +00:00
Craig Topper	ac941b9736	[AVX-512] Remove duplicate instructions for converting integer to scalar floating point. We can use patterns to point to the other instructions instead. llvm-svn: 282355	2016-09-25 16:33:53 +00:00
Craig Topper	202b453a8a	[AVX-512] Add support for commuting VPTERNLOG instructions. VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector. We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value. This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction. llvm-svn: 282132	2016-09-22 03:00:50 +00:00
Craig Topper	67882bd94e	[AVX-512] Teach X86InstrInfo::copyPhysReg to use a 512-bit move if XMM16-XMM31 or YMM16-YMM31 are the source or dest of the copy and VLX is not supported. This can happen with SUBREG_TO_REG of ZMM16-ZMM31. Fixes PR30430. llvm-svn: 281959	2016-09-20 06:49:17 +00:00
Matt Arsenault	1b9fc8ed65	Finish renaming remaining analyzeBranch functions llvm-svn: 281535	2016-09-14 20:43:16 +00:00
Matt Arsenault	e8e0f5cac6	Make analyzeBranch family of instruction names consistent analyzeBranch was renamed to use lowercase first, rename the related set to match. llvm-svn: 281506	2016-09-14 17:24:15 +00:00
Matt Arsenault	a2b036e88b	AArch64: Use TTI branch functions in branch relaxation The main change is to return the code size from InsertBranch/RemoveBranch. Patch mostly by Tim Northover llvm-svn: 281505	2016-09-14 17:23:48 +00:00
Ahmed Bougacha	45bfa8772f	[X86] Copy imp-uses when folding tailcall into conditional branch. r280832 added 32-bit support for emitting conditional tail-calls, but dropped imp-used parameter registers. This went unnoticed until r281113, which added 64-bit support, as this is only exposed with parameter passing via registers. Don't drop the imp-used parameters. llvm-svn: 281223	2016-09-12 16:05:27 +00:00
Duncan P. N. Exon Smith	1872096f1e	CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI Now that MachineBasicBlock::reverse_instr_iterator knows when it's at the end (since r281168 and r281170), implement MachineBasicBlock::reverse_iterator directly on top of an ilist::reverse_iterator by adding an IsReverse template parameter to MachineInstrBundleIterator. This replaces another hard-to-reason-about use of std::reverse_iterator on list iterators, matching the changes for ilist::reverse_iterator from r280032 (see the "out of scope" section at the end of that commit message). MachineBasicBlock::reverse_iterator now has a handle to the current node and has obvious invalidation semantics. r280032 has a more detailed explanation of how list-style reverse iterators (invalidated when the pointed-at node is deleted) are different from vector-style reverse iterators like std::reverse_iterator (invalidated on every operation). A great motivating example is this commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp. Note: If your out-of-tree backend deletes instructions while iterating on a MachineBasicBlock::reverse_iterator or converts between MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator, you'll need to update your code in similar ways to r280032. The following table might help: [Old] ==> [New] delete &RI, RE = end() delete &RI++ RI->erase(), RE = end() RI++->erase() reverse_iterator(I) std::prev(I).getReverse() reverse_iterator(I) ++I.getReverse() --reverse_iterator(I) I.getReverse() reverse_iterator(std::next(I)) I.getReverse() RI.base() std::prev(RI).getReverse() RI.base() ++RI.getReverse() --RI.base() RI.getReverse() std::next(RI).base() RI.getReverse() (For more details, have a look at r280032.) llvm-svn: 281172	2016-09-11 18:51:28 +00:00
Craig Topper	fb4564cf21	[AVX-512] Add VPTERNLOG to load folding tables. llvm-svn: 281156	2016-09-11 05:33:40 +00:00
Craig Topper	69be1bd352	[X86] Make a helper method into a static function local to the cpp file. llvm-svn: 281154	2016-09-11 05:33:35 +00:00
Justin Lebar	adbf09e8cf	[CodeGen] Split out the notions of MI invariance and MI dereferenceability. Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23371 llvm-svn: 281151	2016-09-11 01:38:58 +00:00
Justin Lebar	d98cf00c95	[CodeGen] Rename MachineInstr::isInvariantLoad to isDereferenceableInvariantLoad. NFC Summary: I want to separate out the notions of invariance and dereferenceability at the MI level, so that they correspond to the equivalent concepts at the IR level. (Currently an MI load is MI-invariant iff it's IR-invariant and IR-dereferenceable.) First step is renaming this function. Reviewers: chandlerc Subscribers: MatzeB, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D23370 llvm-svn: 281125	2016-09-10 01:03:20 +00:00
Hans Wennborg	6ecf619be9	X86: Fold tail calls into conditional branches also for 64-bit (PR26302) This extends the optimization in r280832 to also work for 64-bit. The only quirk is that we can't do this for 64-bit Windows (yet). Differential Revision: https://reviews.llvm.org/D24423 llvm-svn: 281113	2016-09-09 22:37:27 +00:00
Craig Topper	149e6bdc16	[AVX-512] Add VPCMP instructions to the load folding tables and make them commutable. llvm-svn: 281013	2016-09-09 01:36:10 +00:00
Hans Wennborg	c39ef776fc	Win64: Don't use REX prefix for direct tail calls The REX prefix should be used on indirect jmps, but not direct ones. For direct jumps, the unwinder looks at the offset to determine if it's inside the current function. Differential Revision: https://reviews.llvm.org/D24359 llvm-svn: 281003	2016-09-08 23:35:10 +00:00
Hans Wennborg	75e25f6812	X86: Fold tail calls into conditional branches where possible (PR26302) When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte. Example: define void @f(i32 %x, i32 %y) { entry: %p = icmp eq i32 %x, %y br i1 %p, label %bb1, label %bb2 bb1: tail call void @foo() ret void bb2: tail call void @bar() ret void } before: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne .LBB0_2 jmp foo .LBB0_2: jmp bar after: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne bar .LBB0_1: jmp foo I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter. This patch only does 32-bit, but 64-bit would work similarly. Differential Revision: https://reviews.llvm.org/D24108 llvm-svn: 280832	2016-09-07 17:52:14 +00:00
Craig Topper	b880ad3a71	[AVX-512] Add support for commuting masked instructions in findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input. llvm-svn: 280781	2016-09-07 04:46:11 +00:00
Craig Topper	93f7b5699b	[AVX-512] Integrate mask register copying more completely into X86InstrInfo::copyPhysReg and simplify. No functional change intended. The code is now written in terms of source and dest classes with feature checks inside each type of copy instead of having separate functions for each feature set. llvm-svn: 280673	2016-09-05 20:34:50 +00:00
Craig Topper	d9ca3d97ef	[AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space. Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available. llvm-svn: 280648	2016-09-05 06:43:06 +00:00
Craig Topper	e3807febd8	[X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. Due to their placement in the td file they had lower precedence than (V)MOVSS/SD and could almost never be selected. The only way to select them was in AVX512 mode because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512 you could get a VEX encoded VMOVAPS/VMOVAPD. I tried to search back through history and it seems like these instructions were probably unselectable for at least 5 years, at least to the time the VEX versions were added. But I can't prove they ever were. llvm-svn: 280644	2016-09-05 02:20:49 +00:00
Craig Topper	040b10784e	[AVX-512] Add EVEX encoded scalar FMA intrinsic instructions to isNonFoldablePartialRegisterLoad. llvm-svn: 280636	2016-09-04 19:33:47 +00:00
Craig Topper	907b580d72	[AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test. llvm-svn: 280593	2016-09-03 17:20:07 +00:00
Craig Topper	892ce56901	[AVX-512] Add EVEX encoded VPCMPEQ and VPCMPGT to the load folding tables. llvm-svn: 280581	2016-09-03 04:37:50 +00:00
Craig Topper	00aecd97bf	[AVX-512] Add execution domain fixing for logical operations with broadcast loads. This builds on the handling of masked ops since we need to keep element size the same. llvm-svn: 280464	2016-09-02 05:29:09 +00:00
Andrey Turetskiy	cde38b6a99	[X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions. According to spec cvtdq2pd and cvtps2pd instructions don't require memory operand to be aligned to 16 bytes. This patch removes this requirement from the memory folding table. Differential Revision: https://reviews.llvm.org/D23919 llvm-svn: 280402	2016-09-01 18:50:02 +00:00
Dean Michael Berris	40e6ba16a1	[XRay][NFC] Promote isTailCall() as virtual in TargetInstrInfo. This change is broken out from D23986, where XRay detects tail call exits. llvm-svn: 280331	2016-09-01 01:03:22 +00:00
Craig Topper	8877a026e4	[X86] Rename PABSB/D/W instructions to be consistent with SSE/AVX instructions instead of ending 128/256. NFC llvm-svn: 279927	2016-08-28 06:06:21 +00:00
Craig Topper	225da2cb84	[AVX-512] Allow EVEX encoding unordered/ordered/equal/notequal VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts. llvm-svn: 279914	2016-08-27 05:22:15 +00:00
Craig Topper	144fdef66b	[X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted. llvm-svn: 279913	2016-08-27 05:22:12 +00:00
Craig Topper	4891c724aa	[AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd. llvm-svn: 279912	2016-08-27 05:22:08 +00:00
Craig Topper	8f27f51192	[X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support. llvm-svn: 279806	2016-08-26 07:08:00 +00:00
Craig Topper	969e56a2cc	[X86] Fix indentation per coding standards. NFC llvm-svn: 279719	2016-08-25 04:16:08 +00:00

1 2 3 4 5 ...

1133 Commits