Commit Graph

986 Commits

Author SHA1 Message Date
Craig Topper d26f11108b [X86] Split X86ISD::CMP into an integer and FP opcode. 2020-02-16 10:10:19 -08:00
Craig Topper e5b3ae4b34 [X86] Merge two switches together to simplify some code. NFC 2020-02-15 12:55:51 -08:00
Craig Topper 3f7649799b [X86] Move combineIncDecVector logic from Select to PreprocessISelDAG.
This allows it to work properly with masked inc/dec for avx512. Those
would have a vselect as the root node so didn't get a chance to call
combineIncDecVector.

This also simplifies the logic because we don't have to manage
the topological ordering.
2020-02-15 09:59:12 -08:00
Fangrui Song 0dce409cee [AsmPrinter] De-capitalize Emit{Function,BasicBlock]* and Emit{Start,End}OfAsmFile 2020-02-13 13:22:49 -08:00
Craig Topper 656d66f5fc [X86] Use custom isel for (X86sbb_flag 0, 0) so we can use 32-bit SBB for i8/i16.
We were using MOV32r0 and an extract_subreg as an input. By using
custom isel we can move the extract_subreg to after the SBB instead
of on the input.
2020-02-09 13:19:35 -08:00
Craig Topper e1cbfecdb8 [X86] Add flag result VT to a MOV32r0 created in X86DAGToDAGISel::Select
The flag isn't used, but I believe this matches the MOV32r0 that
would be created by the table emitter. This should allow this node
to be CSEed with any others created by the table.
2020-02-09 13:19:21 -08:00
Craig Topper dd262222b4 [X86] Use MVT::i32 for the type of a MOV32r0 created in X86DAGToDAGISel::Select.
Not sure if this really matters. The VT isn't really used after
this point. At best it might affect CSE.
2020-02-09 11:57:42 -08:00
Craig Topper dbcc1392b3 [X86] Remove isel patterns that include a vselect/X86selects and a strict FP node.
A vselect+strictfp node is not equivalent to a masked operation.
The exceptions of the strictfp node are not masked by a vselect
after it so we can't match it to a masked operation.

We already had a hack in IsLegalToFold to prevent these patterns from
matching. This patch removes that hack and removes the patterns.
2020-02-09 11:45:54 -08:00
Craig Topper ae4e49868a [X86] Turn vXi1 any_extends into sign_extends in PreprocessISelDAG and remove some isel patterns.
Similar to what we do for other vector any_extends, but instead
of zero_extend we need to use sign_extend.
2020-02-06 21:32:53 -08:00
Craig Topper 4175d7e22e [X86] Custom isel floating point X86ISD::CMP on pre-CMOV targets. Eliminate ConvertCmpIfNecessary
If we don't have cmov, X87 compares write to FPSW and we need to
move the bits to EFLAGS to use as JCC/SETCC/CMOV conditions.

Previously this was done by calling ConvertCmpIfNecessary in
multiple places which would emit the extra code for the FNSTSW,
a shift, a truncate, and a SAHF instructions. Isel would then
select trunc+X86ISD::CMP to a FUCOM instruction that produces FPSW.

This patch centralizes all of the handling into a single custom
isel handler. This allows us to remove ConvertCmpIfNecessary and
a couple target specific ISD opcodes.

Differential Revision: https://reviews.llvm.org/D73863
2020-02-06 10:43:06 -08:00
Craig Topper 600f2e1c4d [X86] Remove SETB_C8r/SETB_C16r pseudo instructions. Use SETB_C32r and EXTRACT_SUBREG instead.
Only 32 and 64 bit SBB are dependency breaking instructons on some
CPUs. The 8 and 16 bit forms have to preserve upper bits of the GPR.

This patch removes the smaller forms and selects the wider form
instead. I had to do this with custom code as the tblgen generated
code glued the eflags copytoreg to the extract_subreg instead of
to the SETB pseudo.

Longer term I think we can remove X86ISD::SETCC_CARRY and use
(X86ISD::SBB zero, zero). We'll want to keep the pseudo and select
(X86ISD::SBB zero, zero) to either a MOV32r0+SBB for targets where
there is no dependency break and SETB_C32/SETB_C64 for targets
that have a dependency break. May want some way to avoid the MOV32r0
if the instruction that produced the carry flag happened to def a
register that we can use for the dependency.

I think the flag copy lowering should be using NEG instead of SUB to
handle SETB. That would avoid the MOV32r0 there. Or maybe it should
use a ADC with -1 to recreate the carry flag and keep the SETB?
That would avoid a MOVZX on the input of the SUB.

Differential Revision: https://reviews.llvm.org/D74024
2020-02-06 10:22:24 -08:00
Craig Topper d975910c50 [X86] Don't exit from foldOffsetIntoAddress if the Offset is 0, but AM.Disp is non-zero.
This is an alternate fix for the issue D73606 was trying to
solve.

The main issue here is that we bailed out of
foldOffsetIntoAddress if Offset is 0. But if we just found a
symbolic displacement and AM.Disp became non-zero
earlier, we still need to validate that AM.Disp with the symbolic
displacement.

This is my second attempt at committing this after failing
build bots previously. One thing I realized about the previous
attempt is that its possible that AM.Disp is already non-zero
and the new Offset changes it back to zero. In that case my
previous attempt failed to update AM.Disp to zero. So this patch
removes the early out for 0 and appropriately handle the 0 case
in each check so we still update AM.Disp at the end.
2020-02-01 11:26:17 -08:00
Craig Topper 007a6a155c Revert "[X86] Don't exit from foldOffsetIntoAddress if the Offset is 0, but AM.Disp is non-zero."
Possibly causing build bot failures.
2020-01-29 22:59:05 -08:00
Craig Topper 1ef8e8b414 [X86] Don't exit from foldOffsetIntoAddress if the Offset is 0, but AM.Disp is non-zero.
This is an alternate fix for the issue D73606 was trying to
solve.

The main issue here is that we bailed out of
foldOffsetIntoAddress if Offset is 0. But if we just found a
symbolic displacement and AM.Disp became non-zero
earlier, we still need to validate that AM.Disp with the symbolic
displacement.

This passes fold-add-pcrel.ll.

Differential Revision: https://reviews.llvm.org/D73608
2020-01-29 21:32:16 -08:00
Fangrui Song bc15bf66dc [X86] matchAdd: don't fold a large offset into a %rip relative address
For `ret i64 add (i64 ptrtoint (i32* @foo to i64), i64 1701208431)`,

```
X86DAGToDAGISel::matchAdd
  ...
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
  if (!matchAddressRecursively(N.getOperand(0), AM, Depth+1) &&
// Try folding offset but fail; there is a symbolic displacement, so offset cannot be too large
      !matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1))
    return false;
  ...
  // Try again after commuting the operands.
// AM.Disp = Val; foldOffsetIntoAddress() does not know there will be a symbolic displacement
  if (!matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1) &&
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
      !matchAddressRecursively(Handle.getValue().getOperand(0), AM, Depth+1))
// Succeeded! Produced leaq sym+disp(%rip),...
    return false;
```

`foldOffsetIntoAddress()` currently does not know there is a symbolic
displacement and can fold a large offset.

The produced `leaq sym+disp(%rip), %rax` instruction is relocated by
an R_X86_64_PC32. If disp is large and sym+disp-rip>=2**31, there
will be a relocation overflow.

This approach is still not elegant. Unfortunately the isRIPRelative
interface is a bit clumsy. I tried several solutions and eventually
picked this one.

Differential Revision: https://reviews.llvm.org/D73606
2020-01-28 22:30:52 -08:00
Craig Topper 9cc9120969 [X86] Turn FP_ROUND/STRICT_FP_ROUND into X86ISD::VFPROUND/STRICT_VFPROUND during PreprocessISelDAG to remove some duplicate isel patterns. 2020-01-11 11:06:52 -08:00
Craig Topper 81a3d987ce [X86] Remove dead code from X86DAGToDAGISel::Select that is no longer needed now that we don't mutate strict fp nodes. NFC 2020-01-11 00:27:14 -08:00
Craig Topper c2ddfa876f [X86] Simplify code by removing an unreachable condition. NFCI
For X87<->SSE conversions, the SSE type is always smaller than
the X87 type. So we can always use the smallest type for the
memory type.
2020-01-10 23:41:06 -08:00
Craig Topper 5fe5c0a60f [X86] Preserve fpexcept property when turning strict_fp_extend and strict_fp_round into stack operations.
We use the stack for X87 fp_round and for moving from SSE f32/f64 to
X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64.

Note for the SSE<->X87 conversions the conversion always happens in the
X87 domain. The load/store ops in the X87 instructions are able
to signal exceptions.
2020-01-10 23:41:06 -08:00
Craig Topper 69806808b9 [X86] Use ReplaceAllUsesWith instead of ReplaceAllUsesOfValueWith to simplify some code. NFCI 2020-01-10 20:31:21 -08:00
Amara Emerson df3f4e0d77 [X86] Fix an 8 bit testb being selected when folding a volatile i32 load pattern.
Differential Revision: https://reviews.llvm.org/D71581
2020-01-06 11:46:42 -08:00
Liu, Chen3 8af492ade1 add strict float for round operation
Differential Revision: https://reviews.llvm.org/D72026
2020-01-01 20:42:12 +08:00
Fangrui Song 5edb40c022 [SelectionDAG] Disallow indirect "i" constraint
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.

They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
2019-12-29 16:50:42 -08:00
Craig Topper a21beccea2 [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.
Differential Revision: https://reviews.llvm.org/D71850
2019-12-24 10:07:04 -08:00
Ulrich Weigand 0d3f782e41 [FPEnv][X86] More strict int <-> FP conversion fixes
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:

- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
  compare can raise exceptions and therefore needs to be a strict compare.
  I've made it signaling (even though quiet would also be correct) as
  signaling is the more usual default for an LT. This code exists both
  in common code and in the X86 target.

- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
  it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
  of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
  that ends up not chosen. I've fixed the algorithm to use only a single
  STRICT_SINT_TO_FP instead.

- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
  the wrong thing because it calls getOperationAction using the result VT.
  But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
  be called using the operand VT.

- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
  STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D71840
2019-12-23 21:11:45 +01:00
Liu, Chen3 2f932b5729 Enable STRICT_FP_TO_SINT/UINT on X86 backend
This patch is mainly for custom lowering the vector operation.

Differential Revision: https://reviews.llvm.org/D71592
2019-12-19 14:49:13 +08:00
Craig Topper f0df4218b6 [X86] Add a simple hack to IsProfitableToFold to prevent vselect+strict fp operations from being folded into masked instructions.
We really need to update the isel patterns to prevent this, but
that requires some tablegen de-tangling. So this hack will work
for correctness in the short term.
2019-12-18 14:42:56 -08:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
Liu, Chen3 bbf7860b93 add support for strict operation fpextend/fpround/fsqrt on X86 backend
Differential Revision: https://reviews.llvm.org/D71184
2019-12-10 09:04:28 +08:00
Liu, Chen3 3041434450 Add strict fp support for instructions fadd/fsub/fmul/fdiv
Differential Revision: https://reviews.llvm.org/D68757
2019-12-06 09:44:33 +08:00
Amy Huang 9e978bb01c Add support for lowering 32-bit/64-bit pointers
Summary:
This follows a previous patch that changes the X86 datalayout to represent
mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces
(https://reviews.llvm.org/D64931)

This patch implements the address space cast lowering to the corresponding
sign extension, zero extension, or truncate instructions.

Related to https://bugs.llvm.org/show_bug.cgi?id=42359

Reviewers: rnk, craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69639
2019-12-04 11:39:03 -08:00
Craig Topper cfce8f2cfb [X86] Add strict fp support for operations of X87 instructions
This is the following patch of D68854.

This patch adds basic operations of X87 instructions, including +, -, *, / , fp extensions and fp truncations.

Patch by Chen Liu(LiuChen3)

Differential Revision: https://reviews.llvm.org/D68857
2019-11-26 10:59:41 -08:00
Craig Topper 95f44cf44a [X86] Mark vector STRICT_FADD/STRICT_FSUB as Legal and add mutation to X86ISelDAGToDAG
The prevents LegalizeVectorOps from scalarizing them. We'll need
to remove the X86 mutation code when we add isel patterns.
2019-11-21 16:19:18 -08:00
Hiroshi Yamauchi 52e377497d [PGO][PGSO] DAG.shouldOptForSize part.
Summary:
(Split of off D67120)

SelectionDAG::shouldOptForSize changes for profile guided size optimization.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70095
2019-11-21 14:16:00 -08:00
Craig Topper c9e8e808cf [SelectionDAG][X86] Mutate strictFP nodes to non-strict in DoInstructionSelection when the node is marked Expand rather than when it is not Legal.
This allows operations that are marked Custom, but have some type
combinations that are legal to get past this code.

Add custom mutation code to X86's Select function for the nodes
that don't have isel patterns yet.
2019-11-20 10:36:02 -08:00
Craig Topper fffcd3e48e [X86] Add a 'break;' to the end of the last case in a switch to avoid surprising the next person to add a case after this one. NFC 2019-11-18 12:18:24 -08:00
Craig Topper 0da163a2cf Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops."
This seems to be causing some performance regresions that I'm
trying to investigate.

One thing that stands out is that this transform can increase
the live range of the operands of the earlier logic op. This
can be bad for register allocation. If there are two logic
op inputs we should really combine the one that is closest, but
SelectionDAG doesn't have a good way to do that. Maybe we need
to do this as a basic block transform in Machine IR.

llvm-svn: 373401
2019-10-01 22:40:03 +00:00
Craig Topper 105e82edde [X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector.
Summary:
This adds the ISD opcode and a DAG combine to create it. There are
probably some places where we can directly create it, but I'll
leave that for future work.

This updates all of the isel patterns to look for this new node.
I had to add a few additional isel patterns for aligned extloads
which we should probably fix with a DAG combine or something. This
does mean that the broadcast load folding for avx512 can no
longer match a broadcasted aligned extload.

There's still some work to do here for combining a broadcast of
a broadcast_load. We also need to improve extractelement or
demanded vector elements of a broadcast_load. I'll try to get
those done before I submit this patch.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68198

llvm-svn: 373349
2019-10-01 16:28:20 +00:00
Craig Topper 0e3f659137 [X86] Add custom isel logic to match VPTERNLOG from 2 logic ops.
There's room from improvement here, but this is a decent
starting point.

There are a few minor regressions in the vector-rotate tests,
where we are now forming a vpternlog from an and before we get
a chance to form it for a bitselect that we were matching
previously. This results in an AND and an ANDN feeding the
vpternlog where previously we just had an AND after the
vpternlog. I think we can probably DAG combine the AND with
the bitselect to get back to similar codegen.

llvm-svn: 373172
2019-09-29 18:43:08 +00:00
Craig Topper b6a2207ba2 [X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cpp
This allows us to reduce the use count on the condition node before
the match. This enables load folding for that operand without
relying on the peephole pass. This will be improved on for
broadcast load folding in a subsequent commit.

This still requires a bunch of isel patterns for vXi16/vXi8 types
though.

llvm-svn: 373156
2019-09-29 01:24:29 +00:00
Craig Topper 03b5a13ee3 [X86] Canonicalize all zeroes vector to RHS in X86DAGToDAGISel::tryVPTESTM.
llvm-svn: 372544
2019-09-23 05:35:23 +00:00
Craig Topper 5e26064c40 [X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop.
The attached test case would previous infinite loop after
r365711.

I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc
to match VPTEST in 32-bit mode in a follow up commit.

llvm-svn: 372543
2019-09-23 05:35:20 +00:00
Roman Lebedev 7c3d6f5a1b [X86] X86DAGToDAGISel::matchBEXTRFromAndImm(): if can't use BEXTR, fallback to BZHI is profitable (PR43381)
Summary:
PR43381 notes that while we are good at matching `(X >> C1) & C2` as BEXTR/BEXTRI,
we only do that if we either have BEXTRI (TBM),
or if BEXTR is marked as being fast (`-mattr=+fast-bextr`).
In all other cases we don't match.

But that is mainly only true for AMD CPU's.
However, for all the CPU's for which we have sched models,
the BZHI is always fast (or the sched models are all bad.)

So if we decide that it's unprofitable to emit BEXTR/BEXTRI,
we should consider falling-back to BZHI if it is available,
and follow-up with the shift.

While it's really tempting to do something because it's cool
it is wise to first think whether it actually makes sense to do.
We shouldn't just use BZHI because we can, but only it it is beneficial.
In particular, it isn't really worth it if the input is a register,
mask is small, or we can fold a load.
But it is worth it if the mask does not fit into 32-bits.

(careful, i don't know much about intel cpu's, my choice of `-mcpu` may be bad here)
Thus we manage to fold a load:
https://godbolt.org/z/Er0OQz
Or if we'd end up using BZHI anyways because the mask is large:
https://godbolt.org/z/dBJ_5h
But this isn'r actually profitable in general case,
e.g. here we'd increase microop count
(the register renaming is free, mca does not model that there it seems)
https://godbolt.org/z/k6wFoz
Likewise, not worth it if we just get load folding:
https://godbolt.org/z/1M1deG

https://bugs.llvm.org/show_bug.cgi?id=43381

Reviewers: RKSimon, craig.topper, davezarzycki, spatel

Reviewed By: craig.topper, davezarzycki

Subscribers: andreadb, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67875

llvm-svn: 372532
2019-09-22 22:04:29 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Simon Pilgrim f12a3da5a7 [X86] X86DAGToDAGISel::tryFoldLoad - assert root/parent pointers are non-null. NFCI.
Silences a static analyzer warning.

llvm-svn: 372118
2019-09-17 13:27:02 +00:00
Philip Reames a9beacbac8 [X86] Updated target specific selection dag code to conservatively check for isAtomic in addition to isVolatile
See D66309 for context.

This is the first sweep of x86 target specific code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time.

Sorry for the lack of tests.  As discussed in the review, most of these are vector tests (for which atomicity is not well defined) and I couldn't figure out to exercise the anyextend cases which aren't vector specific.

Differential Revision: https://reviews.llvm.org/D66322

llvm-svn: 371547
2019-09-10 18:43:15 +00:00
Roman Lebedev 94db67f0e1 [X86] X86DAGToDAGISel::combineIncDecVector(): call getSplatBuildVector() manually
As reported in post-commit review of r370327,
there is some case where the code crashes.

As discussed with Craig Topper, the problem is that getConstant()
internally calls getSplatBuildVector(), so we don't insert
the constant itself.

If we do that manually we're good.

llvm-svn: 371346
2019-09-08 19:36:13 +00:00
Craig Topper 7bb433c87b [X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem.
We can use a MOVSX16 here then rely on FixupBWInst to change to
MOVSX32 if the upper bits are dead. With a special case to
not promote if it could be turned into CBW.

Then we can rely on X86MCInstLower to turn the MOVSX into CBW
very late if register allocation worked out.

Using MOVSX gives an opportunity to use the MOVSX as a both a
copy and a sign extend since the input and output register aren't
tied together.

Differential Revision: https://reviews.llvm.org/D67192

llvm-svn: 371243
2019-09-06 19:17:02 +00:00
Craig Topper 22b35c4291 [X86] Use MOVZX16rr8/MOVZXrm8 when extending input for i8 udivrem.
We can rely on X86FixupBWInsts to turn these into MOVZX32. This
simplifies a follow up commit to use MOVSX for i8 sdivrem with
a late optimization to use CBW when register allocation works out.

llvm-svn: 371242
2019-09-06 19:15:04 +00:00
Roman Lebedev cc7495a355 [X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel
Summary:
We were previously doing it in DAGCombine.
But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine.
So if we had `sub %x, -1`, we'll transform it to `add %x, 1`,
which `combineIncDecVector()` will immediately transform back into `sub %x, -1`,
and here we go again...

I've marked this as NFC since not a single test changes,
but since that 'changes' DAGCombine, probably this isn't fully NFC.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62327

llvm-svn: 370327
2019-08-29 10:50:09 +00:00
Craig Topper 1abe162a9a [X86] Teach -Os immediate sharing code to not count constant uses that will become INC/DEC.
INC/DEC don't use an immediate so we don't need to count it. We
also shouldn't use the custom isel for it.

Fixes PR42998.

llvm-svn: 369863
2019-08-25 05:22:40 +00:00
Craig Topper 120cffccf8 [X86] Manually reimplement getTargetInsertSubreg in X86DAGToDAGISel::matchBitExtract so we can call insertDAGNode on the target constant.
This is needed to maintain the topological sort order.

Fixes PR42992.

llvm-svn: 369084
2019-08-16 04:47:44 +00:00
Craig Topper 8c545168ee [X86] Add llvm_unreachable to a switch that covers all expected values.
llvm-svn: 368857
2019-08-14 14:51:19 +00:00
Sanjay Patel e654785912 [x86] try harder to form LEA from ADD to avoid flag conflicts (PR40483)
LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.

In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid flag conflict.

As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.

Differential Revision: https://reviews.llvm.org/D64707

llvm-svn: 366431
2019-07-18 12:48:01 +00:00
Craig Topper 171732aeb3 [X86] Add custom isel to select ADD/SUB/OR/XOR/AND to their non-immediate forms under optsize when the immediate has additional users.
Summary:
We attempt to prevent folding immediates with multiple users under optsize. But we only do this from store nodes and X86ISD::ADD/SUB/XOR/OR/AND patterns. We don't do it for ISD::ADD/SUB/XOR/OR/AND even though we count them as users when deciding whether to fold into other nodes. This leads to situations where we block folding to a compare for example, but still fold into an AND or OR as seen in PR27202.

Unfortunately touching the isel patterns in tablegen for the ISD::ADD/SUB/XOR/OR/AND opcodes will cause the patterns to be unusable for fast isel. And we don't have a way to make a fast isel only pattern.

To workaround this, this patch adds custom isel in front of the isel table that will select the non-immediate forms if the immediate has additional users. This may create some issues for ANDN and NOT matching. And there's room for improvement with unsigned 32 immediates on 64-bit AND.

This patch needs more thorough test cases, but I wanted to get feedback on the direction. Please send me any other test cases you've seen in the wild.

I think we probably have the same issue with the immediate matching when we fold RMW from X86ISD::ADD/SUB/XOR/OR/AND. And our TEST immedaite shrinking logic. Our cost modeling for immediates that can fit in a sign extended 8-bit immediate on a 16/32/64 bit operation is completely wrong.

I also wonder if we should update the ConstantHoisting cost model and block folding for "opaque" constants. But of course constants can still be created by DAG combine and lowering optimizations.

Fixes PR27202

Reviewers: spatel, RKSimon, andreadb

Reviewed By: RKSimon

Subscribers: jsji, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59909

llvm-svn: 365163
2019-07-04 22:53:57 +00:00
Craig Topper 2d306b2d57 [X86] Add PreprocessISelDAG support for turning ISD::FP_TO_SINT/UINT into X86ISD::CVTTP2SI/CVTTP2UI and to reduce the number of isel patterns.
llvm-svn: 364887
2019-07-02 05:53:37 +00:00
Craig Topper 9153501f07 [X86] Remove (vzext_movl (scalar_to_vector (load))) matching code from selectScalarSSELoad.
I think this will be turning into vzext_load during DAG combine.

llvm-svn: 364499
2019-06-27 05:52:00 +00:00
Craig Topper 9ea5a32251 [X86] Teach selectScalarSSELoad to not narrow volatile loads.
llvm-svn: 364498
2019-06-27 05:51:56 +00:00
Roman Lebedev 13889145f0 [X86][Codegen] X86DAGToDAGISel::matchBitExtract(): consistently capture lambdas by value
llvm-svn: 364420
2019-06-26 12:19:52 +00:00
Roman Lebedev fbb2e40d5c [X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness
Summary:
The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic.
Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality
we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones".
And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself.

Last pattern D does not seem to exhibit any of these truncation issues.
Although it has the opposite problem - if we extract low bits (no shift) from i64,
and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62806

llvm-svn: 364419
2019-06-26 12:19:47 +00:00
Roman Lebedev b0ecc1cc6b [X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how do deal with the last pattern c.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62793

llvm-svn: 364418
2019-06-26 12:19:39 +00:00
Roman Lebedev 8b9a03973a [X86] X86DAGToDAGISel::matchBitExtract(): pattern a: truncation awareness
Summary:
Finally tying up loose ends here.

The problem is quite simple:
If we have pattern `(x >> start) &  (1 << nbits) - 1`,
and then truncate the result, that truncation will be propagated upwards,
into the `and`. And that isn't currently handled.

I'm only fixing pattern `a` here,
the same fix will be needed for patterns `b`/`c` too.

I *think* this isn't missing any extra legality checks,
since we only look past truncations. Similary, i don't think
we can get any other truncation there other than i64->i32.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62786

llvm-svn: 364417
2019-06-26 12:19:11 +00:00
Simon Pilgrim 7dd529e54d [X86] Replace any_extend* vector extensions with zero_extend* equivalents
First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.

In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep......

Differential Revision: https://reviews.llvm.org/D63326

llvm-svn: 363655
2019-06-18 09:50:13 +00:00
Craig Topper f4284f8a9d [X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper function. NFCI
Preliminary step for D59909

llvm-svn: 363645
2019-06-18 04:23:58 +00:00
Kevin P. Neal fece7c6c83 [FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86.
I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.5

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Craig Topper, Kevin P. Neal
Approved by:	Craig Topper
Differential Revision:	https://reviews.llvm.org/D63271

llvm-svn: 363417
2019-06-14 16:28:55 +00:00
Craig Topper f7ba8b808a [X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns.
Previously we did the equivalent operation in isel patterns with
COPY_TO_REGCLASS operations to transition. By inserting
scalar_to_vetors and extract_vector_elts before isel we can
allow each piece to be selected individually and accomplish the
same final result.

I ideally we'd use vector operations earlier in lowering/combine,
but that looks to be more difficult.

The scalar-fp-to-i64.ll changes are because we have a pattern for
using movlpd for store+extract_vector_elt. While an f64 store
uses movsd. The encoding sizes are the same.

llvm-svn: 362914
2019-06-10 00:41:07 +00:00
Craig Topper 7d8494c41c [X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns.
Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table.

llvm-svn: 362894
2019-06-08 23:53:31 +00:00
Craig Topper 137de38009 [X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations
We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table.

This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K.

There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need.

We can probably do something similar for scalar, but I haven't looked at it yet.

Differential Revision: https://reviews.llvm.org/D62757

llvm-svn: 362535
2019-06-04 18:03:07 +00:00
Pengfei Wang 1f67d94279 [X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Patch by Tianqing Wang (tianqing)

Differential Revision: https://reviews.llvm.org/D62281

llvm-svn: 362053
2019-05-30 03:59:16 +00:00
Craig Topper e43bdf144c [X86] Delay creating index register negations during address matching until after we know for sure the match will succeed
If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel.

Differential Revision: https://reviews.llvm.org/D61047

llvm-svn: 360823
2019-05-15 21:59:53 +00:00
Simon Pilgrim 2a0ef0530b [X86] Fix uninitialized members in constructor warnings. NFCI.
Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning.

llvm-svn: 360047
2019-05-06 14:48:02 +00:00
Simon Pilgrim d672d0e246 X86DAGToDAGISel::tryVPTESTM - fix uninitialized variable warning. NFCI.
findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this.

llvm-svn: 360037
2019-05-06 11:52:16 +00:00
Craig Topper 3958719dda [X86] If PreprocessISelDAG reorders a load before a call, make sure we remove dead nodes from the graph
The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue.

This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.

Differential Revision: https://reviews.llvm.org/D61164

llvm-svn: 359582
2019-04-30 17:56:47 +00:00
Craig Topper 354247c08d [X86] Sink NoRegister creation for unused Base/Index registers into getAddressOperands. NFCI
llvm-svn: 359318
2019-04-26 16:39:38 +00:00
Craig Topper ad662cf4c1 [X86] Segment registers should have i16 type not i32.
Probably doesn't really matter, but was inconsistent with the rest of the code.

llvm-svn: 359317
2019-04-26 16:39:35 +00:00
Craig Topper 013503c78d [X86] Remove part of an if condition that should always be true.
The IndexReg will always be non-null at this point. Earlier in the function, if
IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it
non-null.

llvm-svn: 359170
2019-04-25 06:08:02 +00:00
Craig Topper 6932abee2c [X86] Attempt to fix use-after-poison from r359121.
llvm-svn: 359143
2019-04-24 21:48:24 +00:00
Craig Topper af194e9380 [X86] Prevent folding a load into an AND if that AND is really a ZEXT_INREG that should use movzx.
This can save a 32-bit immediate move.

We would shrink the load and fold it if it was non-volatile, but that's trickier to check for.

llvm-svn: 359129
2019-04-24 19:28:38 +00:00
Craig Topper 882ca6d484 [X86] Remove dead nodes left after ReplaceAllUsesWith calls during address matching
ReplaceAllUsesWith doesn't remove the node that was replaced. So its left around in the graph messing up use counts on other nodes.

One thing to note, is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive because the isel table walking code would still have a reference to it that its going to try to match next. I don't think that's the case here though because the nodes being deleted here should be "and", "srl", and "zero_extend" none of which can be the root node of an LEA match.

Differential Revision: https://reviews.llvm.org/D61048

llvm-svn: 359121
2019-04-24 18:02:07 +00:00
Craig Topper 4d4b5d952e [X86] Don't turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the original AND can represented by MOVZX.
The MOVZX doesn't require an immediate to be encoded at all. Though it does use
a 2 byte opcode so its the same size as a 1 byte immediate. But it has a
separate source and dest register so can help avoid copies.

llvm-svn: 358805
2019-04-20 04:38:53 +00:00
Craig Topper 8b8264828c [X86] Turn (and (anyextend (shl X, C1), C2)) into (shl (and (anyextend X), (C1 >> C2), C2) if the AND could match a movzx.
There's one slight regression in here because we don't check that the immediate
already allowed movzx before the shift. I'll fix that next.

llvm-svn: 358804
2019-04-20 04:38:49 +00:00
Craig Topper bb769a2946 [X86] Turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the AND could match a movzx.
Could get further improvements by recognizing (i64 and (anyext (i32 shl))).

llvm-svn: 358737
2019-04-19 05:48:13 +00:00
Craig Topper f73caae956 [X86] Make sure we copy the HandleSDNode back to N before executing the default code after the switch in matchAddressRecursively
Summary:
There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code.

This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract, and onto the second subtract. The second subtract called matchAddressRecursively on its LHS which caused that subtract to CSE. We ultimately failed the match and ended up in the default code. But N was pointing at the old node that had been deleted, but the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg requiring the check for deleted node. With this patch we now use the CSE result as the base reg instead.

matchAdd has been broken since sometime in 2015 when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference so the update didn't go anywhere.

Reviewers: niravd, spatel, RKSimon, bkramer

Reviewed By: niravd

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60843

llvm-svn: 358735
2019-04-19 04:52:21 +00:00
Sanjay Patel fb363a778f [x86] try to widen 'shl' as part of LEA formation
The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ

%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0

...so if we can fold the shift op into LEA in the 1st pattern, then we
should be able to do the same in the 2nd pattern (unnecessary 'movzbl'
is a separate bug I think).

We don't want to do this any sooner though because that would conflict
with generic transforms that try to narrow the width of the shift.

Differential Revision: https://reviews.llvm.org/D60789

llvm-svn: 358622
2019-04-17 22:38:51 +00:00
Craig Topper 3c57976447 [X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG.
We had many tablegen patterns for these instructions. And due to the
commutability of the patterns, tablegen expands them to even more patterns. All
together VPTESTMD patterns accounted for more the 50K of the 610K isel table.
This had gotten bad when we stopped canonicalizing AND to vXi64. This required
a pattern for every combination of bitcast input type.

This change moves the matching to custom code where it is easier to look through
the bitcasts without being concerned with the specific types.

The test changes are because we are now stricter with one use checks as its
required to make load folding legal. We now require the AND and any BITCAST to
only have a single use. This prevents forming VPTESTM and a VPAND with the same
inputs.

We now support broadcast loads for 128/256 patterns without VLX. We'll widen to
512-bit like and still fold the broadcast since the amount of memory read
doesn't change.

There are a few tests that got slightly longer because are now prefering
load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were
able to share the XOR with multiple VPTESTM instructions.

llvm-svn: 358359
2019-04-14 18:26:11 +00:00
Bill Wendling 191f1487b6 [X86] Use PC-relative mode for the kernel code model
Summary:
The Linux kernel uses PC-relative mode, so allow that when the code model is
"kernel".

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits, kees, nickdesaulniers

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60643

llvm-svn: 358343
2019-04-13 21:39:28 +00:00
Craig Topper 55b0d987fd [X86] Use int64_t and isInt<N> instead of APInt operations in foldLoadStoreIntoMemOperand. NFC
We know all our values are limited to 64 bits here so we don't need an APInt.

This should save some generated code checking between large and small size.

llvm-svn: 358338
2019-04-13 18:57:41 +00:00
Craig Topper 61f31cbcb2 [X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl
foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work.

This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part.

This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded.

Differential Revision: https://reviews.llvm.org/D60532

llvm-svn: 358139
2019-04-10 21:42:08 +00:00
Craig Topper 87a8f9761e [X86] Replace some if statements in isel address matching that should never be true with asserts. And move them earlier before we looked through operands that don't change size. NFC
These ifs were ensuring we don't have to handle types larger than 64 bits probably because we use getZExtValue in several places below them.

None of the callers of this code pass types larger than 64-bits so we can just assert instead of branching in release code.

I've also moved them earlier since we're just looking through operations that don't effect bit width.

This is prep work for some refactoring I plan to do to the (and (shl)) handling code.

llvm-svn: 358123
2019-04-10 19:08:59 +00:00
Craig Topper 399102b464 [X86] When converting (x << C1) AND C2 to (x AND (C2>>C1)) << C1 during isel, try using andl over andq by favoring 32-bit unsigned immediates.
llvm-svn: 357848
2019-04-06 19:00:11 +00:00
Craig Topper f9b9f8d2e4 [X86] Use a signed mask in foldMaskedShiftToScaledMask to enable a shorter immediate encoding.
This function reorders AND and SHL to enable the SHL to fold into an LEA. The
upper bits of the AND will be shifted out by the SHL so it doesn't matter what
mask value we use for these bits. By using sign bits from the original mask in
these upper bits we might enable a shorter immediate encoding to be used.

llvm-svn: 357846
2019-04-06 18:00:50 +00:00
Craig Topper 80aa2290fb [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

Reviewed By: RKSimon

Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60228

llvm-svn: 357802
2019-04-05 19:28:09 +00:00
Craig Topper 7323c2bf85 [X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri

Reviewed By: andreadb

Subscribers: hiraditya, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60138

llvm-svn: 357801
2019-04-05 19:27:49 +00:00
Craig Topper e0bfeb5f24 [X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate.
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.

This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.

Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.

This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.

I plan to make similar changes for SETcc and Jcc.

Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60041

llvm-svn: 357800
2019-04-05 19:27:41 +00:00
Fangrui Song 2c5c12c041 Change some dyn_cast to more apropriate isa. NFC
llvm-svn: 357773
2019-04-05 16:16:23 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Craig Topper 3649c20884 [X86] Use INSERT_SUBREG rather than SUBREG_TO_REG when creating LEA64_32 during isel.
SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are
zero. But that isn't the case here. We've done no analysis of the inputs.

llvm-svn: 357673
2019-04-04 05:00:18 +00:00