Commit Graph

23 Commits

Author SHA1 Message Date
Chandler Carruth 41ed4034dd [x86] Revert the X86FoldTablesEmitter due to more miscompiles.
In testing, we've found yet another miscompile caused by the new tables.
And this one is even less clear how to fix (we could teach it to fold
a 16-bit load instead of the 32-bit load it wants, or block folding
entirely).

Also, the approach to excluding instructions seems increasingly to not
scale well.

I have left a more detailed analysis on the review log for the original
patch (https://reviews.llvm.org/D32684) along with suggested path
forward. I will land an additional test case that I wrote which covers
the code that was miscompiling (folding into the output of `pextrw`) in
a subsequent commit to keep this a pure revert.

For each commit reverted here, I've restricted the revert to the
non-test code touching the x86 fold table emission until the last commit
where I did revert the test updates. This means the *new* test cases
added for `insertps` and `xchg` remain untouched (and continue to pass).

Reverted commits:
r304540: [X86] Don't fold into memory operands into insertps in the ...
r304347: [TableGen] Adapt more places to getValueAsString now ...
r304163: [X86] Don't fold away the memory operand of an xchg.
r304123: Don't capture a temporary std::string in a StringRef.
r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..."

Original commit was in r304088, and after a string of fixes was reverted
previously in r304121 to fix build bots, and then re-landed in r304122.

llvm-svn: 304762
2017-06-06 02:15:31 +00:00
Benjamin Kramer 19092d783c [X86] Don't fold into memory operands into insertps in the generated folding tables.
insertps behaves differently, the register form selects from an input
register based on the immediate operand while the memory form just loads
the given address. We have custom code to change the immediate in cases
where that's legal, so completely remove insertps from the generated
tables.

llvm-svn: 304540
2017-06-02 10:50:22 +00:00
Zachary Turner df1832cf86 Resubmit "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This was reverted due to buildbot breakages and I was not familiar
with this code to investigate it.  But while trying to get a
useful backtrace for the author, it turns out the fix was very
obvious.  Resubmitting this patch as is, and will submit the
fix in a followup so that the fix is not hidden in the larger
CL.

llvm-svn: 304122
2017-05-29 02:19:37 +00:00
Zachary Turner 5b199be769 Revert "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well
as all subsequent followups.  llvm-tblgen currently segfaults with
this change, and it seems it has been broken on the bots all
day with no fixes in preparation.  See, for example:

http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/

llvm-svn: 304121
2017-05-29 01:48:53 +00:00
Ayman Musa d9f1fe43a8 [X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables.
X86 backend holds huge tables in order to map between the register and memory forms of each instruction.
This TableGen Backend automatically generated all these tables with the appropriate flags for each entry.

Differential Revision: https://reviews.llvm.org/D32684

llvm-svn: 304088
2017-05-28 12:55:36 +00:00
Ayman Musa 4b2c968c43 [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).

llvm-svn: 295970
2017-02-23 13:15:44 +00:00
Simon Pilgrim 4b1ebf97fc [X86][SSE] Force execution domain of 32-bit extractps/pextrd in the stack folding tests
llvm-svn: 288910
2016-12-07 15:06:14 +00:00
Craig Topper 5fc7bc91f9 [X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined.
The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands.

I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded.

llvm-svn: 288779
2016-12-06 08:07:58 +00:00
Simon Pilgrim 54c32ddf55 [X86][SSE] Fix memory folding of (v)roundsd / (v)roundss
We only had partial memory folding support for the intrinsic definitions, and (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier.

This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481.

Differential Revision: https://reviews.llvm.org/D23276

llvm-svn: 278106
2016-08-09 09:32:34 +00:00
Craig Topper 49841c3812 [X86] Add commutable floating point max/min instructions to the load folding tables.
llvm-svn: 277949
2016-08-07 05:39:51 +00:00
Craig Topper 83613bb436 [X86] Fix test checks to include leading 'v' on avx mnemonic names.
llvm-svn: 275774
2016-07-18 06:49:29 +00:00
Simon Pilgrim 8cfcf586bb [X86][SSE] Added cvtdq2pd/cvtps2pd generic IR tests
Added D20528 implementations as well as existing x86 intrinsics versions

llvm-svn: 270494
2016-05-23 21:45:02 +00:00
Simon Pilgrim 7e6606f4f1 [X86][SSE] Add general memory folding for (V)INSERTPS instruction
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.

The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.

This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.

It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.

Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll

Differential Revision: http://reviews.llvm.org/D13988

llvm-svn: 252074
2015-11-04 20:48:09 +00:00
Simon Pilgrim b0d860a394 [X86][AVX] Tweaked shuffle stack folding tests
To avoid alternative lowerings.

llvm-svn: 251986
2015-11-03 21:58:35 +00:00
Simon Pilgrim 5e79ea8281 [X86] Stack folding tests - just use mtriple - no need for mcpu in these tests
llvm-svn: 251229
2015-10-25 11:42:46 +00:00
Stephen Canon 8216d88511 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior.  This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
Simon Pilgrim 752de5dff2 [X86][SSE] Added (V)ROUNDSD + (V)ROUNDSS stack folding support
llvm-svn: 241671
2015-07-08 08:07:57 +00:00
Simon Pilgrim e68c8606ed [X86][SSE] Force execution domain on float/double unpack shuffle tests.
llvm-svn: 235259
2015-04-18 18:50:55 +00:00
Simon Pilgrim 0238b96c06 [X86] Force fp stack folding tests to keep to specific domain.
General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default).

llvm-svn: 228495
2015-02-07 16:14:55 +00:00
Craig Topper 0271d10d35 [x86] Change u8imm operands to always print as unsigned. This makes shuffle masks and the like make way more sense.
llvm-svn: 226902
2015-01-23 08:00:59 +00:00
Simon Pilgrim 7e6d573e87 [X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests.
Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing.

Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.

llvm-svn: 226873
2015-01-22 22:39:59 +00:00
Simon Pilgrim f5dcc1cbe6 [X86][AVX] Simplified diff between AVX1 and SSE42 fp stack folding tests. NFC.
Changed the AVX1 tests register spill tail call to return a xmm like the SSE42 version - makes doing diffs between them a lot easier without affecting the spills themselves.

llvm-svn: 226623
2015-01-21 00:02:13 +00:00
Simon Pilgrim 0d68d98bd5 [X86][AVX] Renamed AVX1 fp stack folding tests. NFC.
The SSE42 version of the AVX1 float stack folding tests will be added shortly, this renames the AVX1 file so that the files will be near each other in a directory listing to help ensure they are kept in sync.

llvm-svn: 226620
2015-01-20 23:45:50 +00:00