Commit Graph

117 Commits

Author SHA1 Message Date
Craig Topper 4ca81a9b99 [X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload.
But only when the load isn't volatile.

This improves load folding during isel where we only have vzload
and scalar_to_vector+load patterns. We can't have full vector load
isel patterns for the same volatile load issue.

Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.

llvm-svn: 364728
2019-07-01 07:09:31 +00:00
Craig Topper fc233c9108 [X86] Add some additional load folding tests to vec_int_to_fp.ll/vec_int_to_fp-widen.ll and disable the peephole pass.
Also copy some missing test cases from vec_int_to_fp.ll to vec_int_to_fp-widen.ll

llvm-svn: 364727
2019-07-01 07:09:26 +00:00
Simon Pilgrim cb7e4e8193 [SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205)
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction

Differential Revision: https://reviews.llvm.org/D62807

llvm-svn: 362397
2019-06-03 13:02:07 +00:00
Craig Topper 31d00d80a2 [X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.

Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.

This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?

llvm-svn: 362203
2019-05-31 07:38:26 +00:00
Craig Topper 67d43e0744 [X86] Add test cases for a volatile load shrinking bug involving cvtdq2pd. NFC
Similar to PR42079

llvm-svn: 362201
2019-05-31 07:38:18 +00:00
Craig Topper d10a200ceb [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.

Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.

After this patch we should support the d/q for parsing, but not print it when its unneeded.

llvm-svn: 360085
2019-05-06 21:39:51 +00:00
Simon Pilgrim ccb71b2985 Revert rL356864 : [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.

This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.
........
Causes PR41249

llvm-svn: 357057
2019-03-27 10:25:02 +00:00
Simon Pilgrim 87d4ab8b92 [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.

This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.

llvm-svn: 356864
2019-03-24 19:06:35 +00:00
Sanjay Patel a9e289174a [x86] allow narrowing of vector UINT_TO_FP
As discussed in:
D56864
D58197

Always use the narrow (128-bit) instruction when possible.
We already had the signed int version of this transform.

llvm-svn: 354675
2019-02-22 15:47:45 +00:00
Sanjay Patel 234a5e8ea4 [x86] vectorize more cast ops in lowering to avoid register file transfers
This is a follow-up to D56864.

If we're extracting from a non-zero index before casting to FP,
then shuffle the vector and optionally narrow the vector before doing the cast:

cast (extelt V, C) --> extelt (cast (extract_subv (shuffle V, [C...]))), 0

This might be enough to close PR39974:
https://bugs.llvm.org/show_bug.cgi?id=39974

Differential Revision: https://reviews.llvm.org/D58197

llvm-svn: 354619
2019-02-21 20:40:39 +00:00
Sanjay Patel e84fbb67a1 [x86] vectorize cast ops in lowering to avoid register file transfers
The proposal in D56796 may cross the line because we're trying to avoid vectorization 
transforms in generic DAG combining. So this is an alternate, later, x86-specific 
translation of that patch.

There are several potential follow-ups to enhance this:
1. Allow extraction from non-zero element index.
2. Peek through extends of smaller width integers.
3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P

Differential Revision: https://reviews.llvm.org/D56864

llvm-svn: 353302
2019-02-06 14:59:39 +00:00
Sanjay Patel 997b2aba58 [x86] add tests for extract+sitofp; NFC
llvm-svn: 353249
2019-02-06 00:19:56 +00:00
Sanjay Patel 546c9f6d8f [x86] add tests for extracted scalar casts (PR39974); NFC
https://bugs.llvm.org/show_bug.cgi?id=39974

llvm-svn: 351354
2019-01-16 16:11:30 +00:00
Craig Topper a32e353afa [X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1.
This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.

Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.

llvm-svn: 350159
2018-12-30 03:05:07 +00:00
Craig Topper 24b346da42 [X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets.
Previously we emitted to separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine.

Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll

llvm-svn: 347348
2018-11-20 21:21:52 +00:00
Craig Topper 17fa42a69b [X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef.
Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.

As near as I can tell this makes v16i8 behavior consistent with every other VT now.

This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.

llvm-svn: 347296
2018-11-20 09:04:01 +00:00
Simon Pilgrim 7f92efa5a9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.
llvm-svn: 347177
2018-11-18 22:13:31 +00:00
Sanjay Patel 0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00
Craig Topper 9a7e19b8f2 [DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars
It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input.

I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used.

Differential Revision: https://reviews.llvm.org/D54283

llvm-svn: 346530
2018-11-09 18:04:34 +00:00
Craig Topper f7108aef14 [X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL
The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies.

llvm-svn: 346050
2018-11-02 22:48:02 +00:00
Craig Topper 60c202a494 [X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1
We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible.

I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here.

I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.

For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.

Differential Revision: https://reviews.llvm.org/D54024

llvm-svn: 346043
2018-11-02 21:09:49 +00:00
Sanjay Patel 8b207defea [DAGCombiner] narrow vector binops when extraction is cheap
Narrowing vector binops came up in the demanded bits discussion in D52912.

I don't think we're going to be able to do this transform in IR as a canonicalization 
because of the risk of creating unsupported widths for vector ops, but we already have 
a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is 
currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression 
test diffs).

This is artificially limited to not look through bitcasts because there are so many 
test diffs already, but that's marked with a TODO and is a small follow-up.

Differential Revision: https://reviews.llvm.org/D53784

llvm-svn: 345602
2018-10-30 14:14:34 +00:00
Craig Topper aa5eb2fbaa [X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers.
When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction.

Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.

llvm-svn: 345488
2018-10-29 04:52:04 +00:00
Simon Pilgrim 838eb24014 [TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226)
As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well.

Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets.

The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it.

Differential Revision: https://reviews.llvm.org/D53649

llvm-svn: 345256
2018-10-25 11:15:57 +00:00
Simon Pilgrim 0dcf1cea03 [X86][SSE] Add SSE41 vector int2fp tests
llvm-svn: 343925
2018-10-06 20:24:27 +00:00
Simon Pilgrim ad23f270db [X86] Standardize floating point assembly comments
Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well.

Differential Revision: https://reviews.llvm.org/D52702

llvm-svn: 343562
2018-10-02 09:08:51 +00:00
Craig Topper fe0b973fbf [X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64.
Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back?

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52134

llvm-svn: 342327
2018-09-15 16:23:35 +00:00
Simon Pilgrim 1d181bc992 [X86][AVX] Use extract_subvector to reduce vector op widths (PR36761)
We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types.

This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types.

Differential Revision: https://reviews.llvm.org/D49556

llvm-svn: 337500
2018-07-19 21:52:06 +00:00
Craig Topper 07a1787501 [X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics.
This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid.

The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well.

llvm-svn: 337147
2018-07-16 06:56:09 +00:00
Vedant Kumar cc7b2a55c2 [DAGCombiner] Change the SDLoc on split extloads (2/N)
In DAGCombiner, we try to simplify this pattern:

  ([s|z]ext (load ...))

Conceptually, a new extload which is created while splitting the load
should have the same debug location as the load.

Making this change affects the IROrder of the new load, causing some
test case churn.

In practice, the new location is never different from the location of
the [s|z]ext, at least not during check-llvm or a stage2 build.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46156

llvm-svn: 331301
2018-05-01 19:29:15 +00:00
Simon Pilgrim ca38c762e4 [TargetLowering] Add vector BITCAST support to SimplifyDemandedVectorElts
Notably helps cleanup after legalization of vector types

Differential Revision: https://reviews.llvm.org/D43674

llvm-svn: 326838
2018-03-06 22:32:01 +00:00
Geoff Berry a2b9011290 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Re-enable commit r323991 now that r325931 has been committed to make
MachineOperand::isRenamable() check more conservative w.r.t. code
changes and opt-in on a per-target basis.

llvm-svn: 326208
2018-02-27 16:59:10 +00:00
Quentin Colombet 48abac82b8 Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
This reverts commit r323991.

This commit breaks target that don't model all the register constraints
in TableGen. So far the workaround was to set the
hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the
cases.
For instance, when mutating an instruction (like in the lowering of
COPYs) the isRenamable flag is not properly updated. The same problem
will happen when attaching machine operand from one instruction to
another.

Geoff Berry is working on a fix in https://reviews.llvm.org/D43042.

llvm-svn: 325421
2018-02-17 03:05:33 +00:00
Geoff Berry 94503c7bc3 [MachineCopyPropagation] Extend pass to do COPY source forwarding
Summary:
This change extends MachineCopyPropagation to do COPY source forwarding
and adds an additional run of the pass to the default pass pipeline just
after register allocation.

This version of this patch uses the newly added
MachineOperand::isRenamable bit to avoid forwarding registers is such a
way as to violate constraints that aren't captured in the
Machine IR (e.g. ABI or ISA constraints).

This change is a continuation of the work started in D30751.

Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar

Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits

Differential Revision: https://reviews.llvm.org/D41835

llvm-svn: 323991
2018-02-01 18:54:01 +00:00
Puyan Lotfi 43e94b15ea Followup on Proposal to move MIR physical register namespace to '$' sigil.
Discussed here:

http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html

In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.

llvm-svn: 323922
2018-01-31 22:04:26 +00:00
Francis Visoiu Mistrih a8a83d150f [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.

For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.

Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).

https://reviews.llvm.org/D40836

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'

llvm-svn: 320022
2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih 25528d6de7 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih 9d7bb0cb40 [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Craig Topper 198f7d78d3 [X86] Regenerate a test with broadcast comments. NFC
llvm-svn: 318640
2017-11-20 08:15:04 +00:00
Craig Topper a5af4a64d0 [AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering.
Summary:
This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.

There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.

Reviewers: RKSimon, zvi, delena

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38714

llvm-svn: 315860
2017-10-15 16:41:17 +00:00
Geoff Berry fabedbad11 Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This reverts commit r314729.

Another bug has been encountered in an out-of-tree target reported by Quentin.

llvm-svn: 314814
2017-10-03 16:59:13 +00:00
Geoff Berry bfc5fb4571 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
  in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
  explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 314729
2017-10-02 22:01:37 +00:00
Simon Pilgrim ec528b2cb6 Regenerate test (missing broadcast constant comments). NFCI.
Still avoiding the floating point comments to prevent linux/windows discrepancies.

llvm-svn: 314681
2017-10-02 15:22:35 +00:00
Craig Topper a6054328e8 [X86] Teach the execution domain fixing tables to use movlhps inplace of unpcklpd for the packed single domain.
MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter.

llvm-svn: 313509
2017-09-18 04:40:58 +00:00
Craig Topper 87f7381edf [X86] Teach execution domain fixing to convert between FP and int unpack instructions.
llvm-svn: 313508
2017-09-18 03:29:54 +00:00
Elena Demikhovsky 6cab129464 [X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector
Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions.

llvm-svn: 313121
2017-09-13 06:40:26 +00:00
Sam McCall f71bb198ed Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This crashes on boringSSL on PPC (will send reduced testcase)

This reverts commit r312328.

llvm-svn: 312490
2017-09-04 15:47:00 +00:00
Geoff Berry 65528f2991 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 312328
2017-09-01 14:27:20 +00:00
Hans Wennborg 24775a0a6c Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")

> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
>   doing so can break code that implicitly relies on the physical
>   register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
>   can break the machine verifier by creating LiveRanges that don't
>   end on a use (since the undef operand is not considered a use).
>
>   [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
>   This change extends MachineCopyPropagation to do COPY source forwarding.
>
>   This change also extends the MachineCopyPropagation pass to be able to
>   be run during register allocation, after physical registers have been
>   assigned, but before the virtual registers have been re-written, which
>   allows it to remove virtual register COPY LiveIntervals that become dead
>   through the forwarding of all of their uses.

llvm-svn: 312178
2017-08-30 22:11:37 +00:00
Geoff Berry feffb0c8af Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 312154
2017-08-30 18:41:07 +00:00