See D66309 for context.
This is the first sweep of x86 target specific code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time.
Sorry for the lack of tests. As discussed in the review, most of these are vector tests (for which atomicity is not well defined) and I couldn't figure out how to exercise the anyextend cases, which aren't vector specific.
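As a hedged sketch (the helper is hypothetical; isAtomic() and isVolatile() are real MemSDNode predicates), the bailouts have roughly this shape:

  #include "llvm/CodeGen/SelectionDAGNodes.h"

  using namespace llvm;

  // Splitting, widening, or otherwise rewriting an atomic load would
  // break its single-copy atomicity, so transforms that are only safe
  // for plain loads must refuse it.
  static bool canRewriteLoad(const LoadSDNode *Ld) {
    if (Ld->isAtomic())
      return false;
    return !Ld->isVolatile();
  }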
Differential Revision: https://reviews.llvm.org/D66322
llvm-svn: 371547
As reported in post-commit review of r370327,
there is some case where the code crashes.
As discussed with Craig Topper, the problem is that getConstant()
internally calls getSplatBuildVector(), so we don't insert
the constant itself.
If we do that manually we're good.
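A hedged sketch of the shape of the fix (helper name hypothetical; getConstant() and getSplatBuildVector() are the real SelectionDAG APIs):

  #include "llvm/CodeGen/SelectionDAG.h"
  #include <cstdint>

  using namespace llvm;

  // Build the scalar constant explicitly and splat it ourselves, rather
  // than asking getConstant() for a vector constant (which splats
  // internally and hides the scalar node from us).
  static SDValue buildSplatConstant(SelectionDAG &DAG, const SDLoc &DL,
                                    EVT VecVT, uint64_t Val) {
    SDValue Scalar = DAG.getConstant(Val, DL, VecVT.getVectorElementType());
    return DAG.getSplatBuildVector(VecVT, DL, Scalar);
  }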
llvm-svn: 371346
We can use a MOVSX16 here then rely on FixupBWInst to change to
MOVSX32 if the upper bits are dead. With a special case to
not promote if it could be turned into CBW.
Then we can rely on X86MCInstLower to turn the MOVSX into CBW
very late if register allocation worked out.
Using MOVSX gives an opportunity to use the MOVSX as both a copy
and a sign extend, since the input and output registers aren't
tied together.
Differential Revision: https://reviews.llvm.org/D67192
llvm-svn: 371243
We can rely on X86FixupBWInsts to turn these into MOVZX32. This
simplifies a follow up commit to use MOVSX for i8 sdivrem with
a late optimization to use CBW when register allocation works out.
llvm-svn: 371242
Summary:
We were previously doing it in DAGCombine.
But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine.
So if we had `sub %x, -1`, we'll transform it to `add %x, 1`,
which `combineIncDecVector()` will immediately transform back into `sub %x, -1`,
and here we go again...
I've marked this as NFC since not a single test changes,
but since that 'changes' DAGCombine, probably this isn't fully NFC.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62327
llvm-svn: 370327
LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.
In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid flag conflict.
As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.
Differential Revision: https://reviews.llvm.org/D64707
llvm-svn: 366431
Summary:
We attempt to prevent folding immediates with multiple users under optsize. But we only do this from store nodes and X86ISD::ADD/SUB/XOR/OR/AND patterns. We don't do it for ISD::ADD/SUB/XOR/OR/AND even though we count them as users when deciding whether to fold into other nodes. This leads to situations where we block folding to a compare for example, but still fold into an AND or OR as seen in PR27202.
Unfortunately touching the isel patterns in tablegen for the ISD::ADD/SUB/XOR/OR/AND opcodes will cause the patterns to be unusable for fast isel. And we don't have a way to make a fast isel only pattern.
To work around this, this patch adds custom isel in front of the isel table that will select the non-immediate forms if the immediate has additional users. This may create some issues for ANDN and NOT matching, and there's room for improvement with unsigned 32-bit immediates on 64-bit AND.
This patch needs more thorough test cases, but I wanted to get feedback on the direction. Please send me any other test cases you've seen in the wild.
I think we probably have the same issue with the immediate matching when we fold RMW from X86ISD::ADD/SUB/XOR/OR/AND, and with our TEST immediate shrinking logic. Our cost modeling for immediates that can fit in a sign-extended 8-bit immediate on a 16/32/64-bit operation is completely wrong.
I also wonder if we should update the ConstantHoisting cost model and block folding for "opaque" constants. But of course constants can still be created by DAG combine and lowering optimizations.
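A hedged sketch of the intended policy (helper name and signature hypothetical):

  #include "llvm/CodeGen/SelectionDAGNodes.h"

  using namespace llvm;

  // Folding a constant as an immediate into several users duplicates its
  // encoding in each instruction. Under optsize, if the constant has any
  // other users, prefer the register form so it is materialized once.
  static bool shouldFoldConstantAsImmediate(SDValue Imm, bool OptForSize) {
    if (!OptForSize)
      return true;
    return Imm.getNode()->hasOneUse();
  }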
Fixes PR27202
Reviewers: spatel, RKSimon, andreadb
Reviewed By: RKSimon
Subscribers: jsji, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59909
llvm-svn: 365163
Summary:
The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic.
Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality
we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones".
And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself.
Last pattern D does not seem to exhibit any of these truncation issues.
Although it has the opposite problem - if we extract low bits (no shift) from i64,
and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction.
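To make the width point concrete, a standalone recheck of the arithmetic (plain C++, not the in-tree code):

  #include <cassert>
  #include <cstdint>

  int main() {
    // For `-1 >> (C - z)` to mean "low z bits are all ones", C must be
    // the bit width of the shifted all-ones value itself.
    unsigned z = 5;
    uint32_t mask32 = UINT32_MAX >> (32 - z); // C = 32 here, not 64
    assert(mask32 == 0x1Fu);
    // Extracting from a wider x still only keeps the low z bits:
    uint64_t x = 0x123456789ABCDEF0ULL;
    assert((static_cast<uint32_t>(x) & mask32) == (x & 0x1FULL));
    return 0;
  }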
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62806
llvm-svn: 364419
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how to deal with the last pattern c.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62793
llvm-svn: 364418
Summary:
Finally tying up loose ends here.
The problem is quite simple:
If we have pattern `(x >> start) & (1 << nbits) - 1`,
and then truncate the result, that truncation will be propagated upwards,
into the `and`. And that isn't currently handled.
I'm only fixing pattern `a` here,
the same fix will be needed for patterns `b`/`c` too.
I *think* this isn't missing any extra legality checks,
since we only look past truncations. Similarly, I don't think
we can get any truncation there other than i64->i32.
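A standalone recheck of the equivalence being matched (plain C++, not the in-tree code):

  #include <cassert>
  #include <cstdint>

  int main() {
    // Pattern a extracts `nbits` bits of x starting at `start`:
    //   (x >> start) & ((1 << nbits) - 1)
    // When the i64 result is truncated to i32 and nbits <= 32, the same
    // extraction done at 32 bits gives the identical value.
    uint64_t x = 0xDEADBEEFCAFEBABEULL;
    unsigned start = 8, nbits = 12;
    uint64_t wide = (x >> start) & ((1ULL << nbits) - 1);
    uint32_t narrow =
        static_cast<uint32_t>(x >> start) & ((1u << nbits) - 1);
    assert(static_cast<uint32_t>(wide) == narrow);
    return 0;
  }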
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62786
llvm-svn: 364417
First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.
In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean we can keep better track of undef elements that would otherwise become zeros we think we have to keep.
Differential Revision: https://reviews.llvm.org/D63326
llvm-svn: 363655
I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Craig Topper, Kevin P. Neal
Approved by: Craig Topper
Differential Revision: https://reviews.llvm.org/D63271
llvm-svn: 363417
Previously we did the equivalent operation in isel patterns with
COPY_TO_REGCLASS operations to transition. By inserting
scalar_to_vectors and extract_vector_elts before isel we can
allow each piece to be selected individually and accomplish the
same final result.
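A hedged sketch of the wrapping (heavily simplified; helper name hypothetical, opcodes real):

  #include "llvm/CodeGen/SelectionDAG.h"

  using namespace llvm;

  // Wrap a scalar value in a vector so isel can select vector
  // instructions for each piece independently; the real code builds the
  // conversion on the vector value between these two nodes.
  static SDValue wrapScalarInVector(SelectionDAG &DAG, const SDLoc &DL,
                                    SDValue Scalar) {
    EVT VecVT =
        EVT::getVectorVT(*DAG.getContext(), Scalar.getValueType(), 2);
    SDValue Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, Scalar);
    // ... the vector operation on Vec would go here ...
    return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, Scalar.getValueType(),
                       Vec, DAG.getIntPtrConstant(0, DL));
  }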
Ideally we'd use vector operations earlier, in lowering or combine,
but that looks to be more difficult.
The scalar-fp-to-i64.ll changes are because we have a pattern for
using movlpd for store+extract_vector_elt, while an f64 store
uses movsd. The encoding sizes are the same.
llvm-svn: 362914
We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For each of these 5 patterns we have to support 3 vector widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table.
This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel table by somewhere between 9K and 10K.
There is one complication to this: the STRICT versions of these nodes are currently mutated to their non-strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future, which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need otherwise.
We can probably do something similar for scalar, but I haven't looked at it yet.
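A hedged sketch of the opcode-to-immediate mapping behind the morph (helper name hypothetical; the immediates are the standard ROUND*/VRNDSCALE encodings, and the surrounding node rewrite is omitted):

  #include "llvm/CodeGen/ISDOpcodes.h"
  #include "llvm/Support/ErrorHandling.h"

  using namespace llvm;

  // Map the generic FP rounding opcodes onto their rounding immediate.
  static unsigned rndscaleImmFor(unsigned Opcode) {
    switch (Opcode) {
    case ISD::FCEIL:      return 0xA; // toward +inf, suppress exceptions
    case ISD::FFLOOR:     return 0x9; // toward -inf, suppress exceptions
    case ISD::FTRUNC:     return 0xB; // toward zero, suppress exceptions
    case ISD::FNEARBYINT: return 0xC; // current mode, suppress exceptions
    case ISD::FRINT:      return 0x4; // current mode, report exceptions
    default: llvm_unreachable("unexpected rounding opcode");
    }
  }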
Differential Revision: https://reviews.llvm.org/D62757
llvm-svn: 362535
If we're trying to match an LEA, it's possible the LEA match will be deemed unprofitable, in which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG, though I don't have an example of that. It just seems like bad form to have dangling nodes in isel.
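A hedged sketch of the cleanup (helper name hypothetical; RemoveDeadNode is the real SelectionDAG API):

  #include "llvm/CodeGen/SelectionDAG.h"

  using namespace llvm;

  // If the negation we speculatively created for the LEA match ended up
  // unused because the match was abandoned, delete it rather than leave
  // it dangling and inflating use counts.
  static void cleanupDanglingNeg(SelectionDAG &DAG, SDValue Neg) {
    if (Neg.getNode()->use_empty())
      DAG.RemoveDeadNode(Neg.getNode());
  }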
Differential Revision: https://reviews.llvm.org/D61047
llvm-svn: 360823
The reordering can leave at least a dead TokenFactor in the graph. This causes the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today, so I can't say for sure that any of the other failures in that bug were caused by this issue.
This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.
Differential Revision: https://reviews.llvm.org/D61164
llvm-svn: 359582
The IndexReg will always be non-null at this point. Earlier in the function, if
IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it
non-null.
llvm-svn: 359170
ReplaceAllUsesWith doesn't remove the node that was replaced. So it's left around in the graph messing up use counts on other nodes.
One thing to note is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive, because the isel table walking code would still have a reference to it that it's going to try to match next. I don't think that's the case here though, because the nodes being deleted here should be "and", "srl", and "zero_extend", none of which can be the root node of an LEA match.
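A hedged sketch of the idiom (helper name hypothetical), subject to the root-node caveat above:

  #include "llvm/CodeGen/SelectionDAG.h"

  using namespace llvm;

  // ReplaceAllUsesWith only rewires the users; the replaced node stays
  // in the graph until it is explicitly deleted. Per the caveat above,
  // this is only safe when Old cannot be the root of a match the isel
  // table walker is still holding on to.
  static void replaceAndErase(SelectionDAG &DAG, SDValue Old, SDValue New) {
    DAG.ReplaceAllUsesWith(Old, New);
    DAG.RemoveDeadNode(Old.getNode());
  }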
Differential Revision: https://reviews.llvm.org/D61048
llvm-svn: 359121
The MOVZX doesn't require an immediate to be encoded at all. Though it does use
a 2 byte opcode, so it's the same size as a 1 byte immediate. But it has a
separate source and dest register, so it can help avoid copies.
llvm-svn: 358805
There's one slight regression in here because we don't check that the immediate
already allowed movzx before the shift. I'll fix that next.
llvm-svn: 358804
Summary:
There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code.
This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract and onto the second subtract. The second subtract called matchAddressRecursively on its LHS, which caused that subtract to CSE. We ultimately failed the match and ended up in the default code, but N was still pointing at the old node that had been deleted; the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg, requiring the check for deleted nodes. With this patch we now use the CSE result as the base reg instead.
matchAdd has been broken since sometime in 2015 when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference so the update didn't go anywhere.
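A hedged sketch of the handle copy-back (names hypothetical; HandleSDNode is the real class):

  #include "llvm/CodeGen/SelectionDAG.h"

  using namespace llvm;

  // HandleSDNode keeps a reference that CSE updates, so copying it back
  // on every exit path ensures N never points at a deleted node.
  static bool matchSomething(SDValue &N) {
    HandleSDNode Handle(N);
    // ... recursive matching that may CSE away the node N refers to ...
    bool Matched = false; // placeholder for the real match result
    N = Handle.getValue(); // pick up whatever N became, even on failure
    return Matched;
  }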
Reviewers: niravd, spatel, RKSimon, bkramer
Reviewed By: niravd
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60843
llvm-svn: 358735
The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ
%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0
...so if we can fold the shift op into LEA in the 1st pattern, then we
should be able to do the same in the 2nd pattern (unnecessary 'movzbl'
is a separate bug I think).
We don't want to do this any sooner though because that would conflict
with generic transforms that try to narrow the width of the shift.
Differential Revision: https://reviews.llvm.org/D60789
llvm-svn: 358622
We had many tablegen patterns for these instructions. And due to the
commutability of the patterns, tablegen expands them to even more patterns. All
together, VPTESTMD patterns accounted for more than 50K of the 610K isel table.
This had gotten bad when we stopped canonicalizing AND to vXi64. This required
a pattern for every combination of bitcast input type.
This change moves the matching to custom code where it is easier to look through
the bitcasts without being concerned with the specific types.
The test changes are because we are now stricter with one-use checks, as that's
required to make load folding legal. We now require the AND and any BITCAST to
only have a single use. This prevents forming VPTESTM and a VPAND with the same
inputs.
We now support broadcast loads for 128/256 patterns without VLX. We'll widen to
512 bits as usual and still fold the broadcast, since the amount of memory read
doesn't change.
There are a few tests that got slightly longer because we now prefer
load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were
able to share the XOR with multiple VPTESTM instructions.
llvm-svn: 358359
Summary:
The Linux kernel uses PC-relative mode, so allow that when the code model is
"kernel".
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits, kees, nickdesaulniers
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60643
llvm-svn: 358343
We know all our values are limited to 64 bits here, so we don't need an APInt.
This should save some generated code by avoiding APInt's checks for small vs. large storage.
llvm-svn: 358338
foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work.
This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part.
This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded.
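A hedged sketch of the look-through (helper name hypothetical):

  #include "llvm/CodeGen/SelectionDAG.h"
  #include "llvm/Support/MathExtras.h"
  #include <cstdint>

  using namespace llvm;

  // If the shift was any_extended from i32 to i64 and the mask only uses
  // bits of the narrow value, the reordering can work on the narrow shift.
  static SDValue peekThroughAnyExtend(SDValue V, uint64_t Mask) {
    if (V.getOpcode() == ISD::ANY_EXTEND &&
        V.getOperand(0).getValueType() == MVT::i32 && isUInt<32>(Mask))
      return V.getOperand(0);
    return V;
  }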
Differential Revision: https://reviews.llvm.org/D60532
llvm-svn: 358139
These ifs were ensuring we don't have to handle types larger than 64 bits, probably because we use getZExtValue in several places below them.
None of the callers of this code pass types larger than 64-bits so we can just assert instead of branching in release code.
I've also moved them earlier since we're just looking through operations that don't affect bit width.
This is prep work for some refactoring I plan to do to the (and (shl)) handling code.
llvm-svn: 358123
This function reorders AND and SHL to enable the SHL to fold into an LEA. The
upper bits of the AND will be shifted out by the SHL so it doesn't matter what
mask value we use for these bits. By using sign bits from the original mask in
these upper bits we might enable a shorter immediate encoding to be used.
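A standalone recheck of why the shifted-out mask bits are free (plain C++, not the in-tree code):

  #include <cassert>
  #include <cstdint>

  int main() {
    // Reordering (and (shl x, c), m) -> (shl (and x, m'), c): the top c
    // bits of m' get shifted out again, so their values are free. Deriving
    // m' with an arithmetic shift fills them with m's sign bit, which can
    // turn m' into a sign-extended 8-bit immediate.
    uint64_t x = 0x8877665544332211ULL;
    uint64_t m = 0xFFFFFFFFFFFFFF00ULL; // mask applied after the shift
    unsigned c = 3;                     // an LEA-foldable scale
    uint64_t logicalMask = m >> c; // 0x1FFFFFFFFFFFFFE0: needs a wide imm
    uint64_t arithMask =
        static_cast<uint64_t>(static_cast<int64_t>(m) >> c); // -32: imm8
    assert(((x << c) & m) == ((x & logicalMask) << c));
    assert(((x << c) & m) == ((x & arithMask) << c));
    return 0;
  }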
llvm-svn: 357846
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon
Reviewed By: RKSimon
Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60228
llvm-svn: 357802
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri
Reviewed By: andreadb
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60138
llvm-svn: 357801
Summary:
Reorder the condition code enum to match the encodings. Move it to the MC layer so it can be used by the scheduler models.
This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.
Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.
This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.
I plan to make similar changes for SETcc and Jcc.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60041
llvm-svn: 357800