llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	4ca81a9b99	[X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload. But only when the load isn't volatile. This improves load folding during isel where we only have vzload and scalar_to_vector+load patterns. We can't have full vector load isel patterns for the same volatile load issue. Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns. llvm-svn: 364728	2019-07-01 07:09:31 +00:00
Craig Topper	fc233c9108	[X86] Add some additional load folding tests to vec_int_to_fp.ll/vec_int_to_fp-widen.ll and disable the peephole pass. Also copy some missing test cases from vec_int_to_fp.ll to vec_int_to_fp-widen.ll llvm-svn: 364727	2019-07-01 07:09:26 +00:00
Simon Pilgrim	cb7e4e8193	[SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205) We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction Differential Revision: https://reviews.llvm.org/D62807 llvm-svn: 362397	2019-06-03 13:02:07 +00:00
Craig Topper	31d00d80a2	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64. These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203	2019-05-31 07:38:26 +00:00
Craig Topper	67d43e0744	[X86] Add test cases for a volatile load shrinking bug involving cvtdq2pd. NFC Similar to PR42079 llvm-svn: 362201	2019-05-31 07:38:18 +00:00
Craig Topper	d10a200ceb	[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when its unneeded. llvm-svn: 360085	2019-05-06 21:39:51 +00:00
Simon Pilgrim	ccb71b2985	Revert rL356864 : [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. ........ Causes PR41249 llvm-svn: 357057	2019-03-27 10:25:02 +00:00
Simon Pilgrim	87d4ab8b92	[X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. llvm-svn: 356864	2019-03-24 19:06:35 +00:00
Sanjay Patel	a9e289174a	[x86] allow narrowing of vector UINT_TO_FP As discussed in: D56864 D58197 Always use the narrow (128-bit) instruction when possible. We already had the signed int version of this transform. llvm-svn: 354675	2019-02-22 15:47:45 +00:00
Sanjay Patel	234a5e8ea4	[x86] vectorize more cast ops in lowering to avoid register file transfers This is a follow-up to D56864. If we're extracting from a non-zero index before casting to FP, then shuffle the vector and optionally narrow the vector before doing the cast: cast (extelt V, C) --> extelt (cast (extract_subv (shuffle V, [C...]))), 0 This might be enough to close PR39974: https://bugs.llvm.org/show_bug.cgi?id=39974 Differential Revision: https://reviews.llvm.org/D58197 llvm-svn: 354619	2019-02-21 20:40:39 +00:00
Sanjay Patel	e84fbb67a1	[x86] vectorize cast ops in lowering to avoid register file transfers The proposal in D56796 may cross the line because we're trying to avoid vectorization transforms in generic DAG combining. So this is an alternate, later, x86-specific translation of that patch. There are several potential follow-ups to enhance this: 1. Allow extraction from non-zero element index. 2. Peek through extends of smaller width integers. 3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P Differential Revision: https://reviews.llvm.org/D56864 llvm-svn: 353302	2019-02-06 14:59:39 +00:00
Sanjay Patel	997b2aba58	[x86] add tests for extract+sitofp; NFC llvm-svn: 353249	2019-02-06 00:19:56 +00:00
Sanjay Patel	546c9f6d8f	[x86] add tests for extracted scalar casts (PR39974); NFC https://bugs.llvm.org/show_bug.cgi?id=39974 llvm-svn: 351354	2019-01-16 16:11:30 +00:00
Craig Topper	a32e353afa	[X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1. This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better. Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it. llvm-svn: 350159	2018-12-30 03:05:07 +00:00
Craig Topper	24b346da42	[X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets. Previously we emitted to separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine. Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll llvm-svn: 347348	2018-11-20 21:21:52 +00:00
Craig Topper	17fa42a69b	[X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef. Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output. As near as I can tell this makes v16i8 behavior consistent with every other VT now. This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types. llvm-svn: 347296	2018-11-20 09:04:01 +00:00
Simon Pilgrim	7f92efa5a9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. llvm-svn: 347177	2018-11-18 22:13:31 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Craig Topper	9a7e19b8f2	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530	2018-11-09 18:04:34 +00:00
Craig Topper	f7108aef14	[X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies. llvm-svn: 346050	2018-11-02 22:48:02 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Sanjay Patel	8b207defea	[DAGCombiner] narrow vector binops when extraction is cheap Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602	2018-10-30 14:14:34 +00:00
Craig Topper	aa5eb2fbaa	[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488	2018-10-29 04:52:04 +00:00
Simon Pilgrim	838eb24014	[TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226) As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well. Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets. The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it. Differential Revision: https://reviews.llvm.org/D53649 llvm-svn: 345256	2018-10-25 11:15:57 +00:00
Simon Pilgrim	0dcf1cea03	[X86][SSE] Add SSE41 vector int2fp tests llvm-svn: 343925	2018-10-06 20:24:27 +00:00
Simon Pilgrim	ad23f270db	[X86] Standardize floating point assembly comments Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well. Differential Revision: https://reviews.llvm.org/D52702 llvm-svn: 343562	2018-10-02 09:08:51 +00:00
Craig Topper	fe0b973fbf	[X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64. Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back? Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52134 llvm-svn: 342327	2018-09-15 16:23:35 +00:00
Simon Pilgrim	1d181bc992	[X86][AVX] Use extract_subvector to reduce vector op widths (PR36761) We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types. This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types. Differential Revision: https://reviews.llvm.org/D49556 llvm-svn: 337500	2018-07-19 21:52:06 +00:00
Craig Topper	07a1787501	[X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics. This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147	2018-07-16 06:56:09 +00:00
Vedant Kumar	cc7b2a55c2	[DAGCombiner] Change the SDLoc on split extloads (2/N) In DAGCombiner, we try to simplify this pattern: ([s\|z]ext (load ...)) Conceptually, a new extload which is created while splitting the load should have the same debug location as the load. Making this change affects the IROrder of the new load, causing some test case churn. In practice, the new location is never different from the location of the [s\|z]ext, at least not during check-llvm or a stage2 build. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46156 llvm-svn: 331301	2018-05-01 19:29:15 +00:00
Simon Pilgrim	ca38c762e4	[TargetLowering] Add vector BITCAST support to SimplifyDemandedVectorElts Notably helps cleanup after legalization of vector types Differential Revision: https://reviews.llvm.org/D43674 llvm-svn: 326838	2018-03-06 22:32:01 +00:00
Geoff Berry	a2b9011290	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Re-enable commit r323991 now that r325931 has been committed to make MachineOperand::isRenamable() check more conservative w.r.t. code changes and opt-in on a per-target basis. llvm-svn: 326208	2018-02-27 16:59:10 +00:00
Quentin Colombet	48abac82b8	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r323991. This commit breaks target that don't model all the register constraints in TableGen. So far the workaround was to set the hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the cases. For instance, when mutating an instruction (like in the lowering of COPYs) the isRenamable flag is not properly updated. The same problem will happen when attaching machine operand from one instruction to another. Geoff Berry is working on a fix in https://reviews.llvm.org/D43042. llvm-svn: 325421	2018-02-17 03:05:33 +00:00
Geoff Berry	94503c7bc3	[MachineCopyPropagation] Extend pass to do COPY source forwarding Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers is such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991	2018-02-01 18:54:01 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Craig Topper	198f7d78d3	[X86] Regenerate a test with broadcast comments. NFC llvm-svn: 318640	2017-11-20 08:15:04 +00:00
Craig Topper	a5af4a64d0	[AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. Summary: This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes. There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8. Reviewers: RKSimon, zvi, delena Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38714 llvm-svn: 315860	2017-10-15 16:41:17 +00:00
Geoff Berry	fabedbad11	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This reverts commit r314729. Another bug has been encountered in an out-of-tree target reported by Quentin. llvm-svn: 314814	2017-10-03 16:59:13 +00:00
Geoff Berry	bfc5fb4571	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 314729	2017-10-02 22:01:37 +00:00
Simon Pilgrim	ec528b2cb6	Regenerate test (missing broadcast constant comments). NFCI. Still avoiding the floating point comments to prevent linux/windows discrepancies. llvm-svn: 314681	2017-10-02 15:22:35 +00:00
Craig Topper	a6054328e8	[X86] Teach the execution domain fixing tables to use movlhps inplace of unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509	2017-09-18 04:40:58 +00:00
Craig Topper	87f7381edf	[X86] Teach execution domain fixing to convert between FP and int unpack instructions. llvm-svn: 313508	2017-09-18 03:29:54 +00:00
Elena Demikhovsky	6cab129464	[X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions. llvm-svn: 313121	2017-09-13 06:40:26 +00:00
Sam McCall	f71bb198ed	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. llvm-svn: 312490	2017-09-04 15:47:00 +00:00
Geoff Berry	65528f2991	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312328	2017-09-01 14:27:20 +00:00
Hans Wennborg	24775a0a6c	Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. llvm-svn: 312178	2017-08-30 22:11:37 +00:00
Geoff Berry	feffb0c8af	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312154	2017-08-30 18:41:07 +00:00

1 2 3

117 Commits