Commit Graph

1495 Commits

Author SHA1 Message Date
Craig Topper 216bf7f03b [X86] Allow 8-bit INC/DEC to be converted to LEA.
We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too.

Differential Revision: https://reviews.llvm.org/D58869

llvm-svn: 355424
2019-03-05 18:37:37 +00:00
Craig Topper 572e94ca02 [X86] Enable 8-bit OR with disjoint bits to convert to LEA
We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64.

Differential Revision: https://reviews.llvm.org/D58863

llvm-svn: 355423
2019-03-05 18:37:33 +00:00
Craig Topper 240315aa64 [X86] Use X86::LAST_VALID_COND instead of assuming X86::COND_S is the last encoding. NFC
llvm-svn: 355059
2019-02-28 01:00:31 +00:00
Simon Pilgrim 398d0b9e96 Fix MSVC constant truncation warnings. NFCI.
llvm-svn: 354731
2019-02-23 18:49:02 +00:00
Craig Topper 75afc0105c [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.

This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.

The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.

llvm-svn: 354724
2019-02-23 08:34:10 +00:00
Craig Topper 61da80584d [X86] Don't prevent load folding for cvtsi2ss/cvtsi2sd based on hasPartialRegUpdate.
Preventing the load fold won't fix the partial register update since the
input we can fold is a GPR. So it will do nothing to prevent a false dependency
on an XMM register.

llvm-svn: 354193
2019-02-16 03:34:54 +00:00
Matt Arsenault 2a5488b877 X86: Replace isSafeToClobberEFLAGS implementation
Also use modifiesRegister instead of looping over operands.

llvm-svn: 354098
2019-02-15 04:01:39 +00:00
Craig Topper 41a1792b15 [X86] Remove isReMaterializable from X87 floating point constant loads and constant pool loads.
Summary: These instructions update FPSW so they aren't generically safe to rematerialize into any location if FPSW is live for a comparison result. They also use FPCW for exception masking control. Though the only exception they can generate is stack overflow and we manage the stack ourselves so that's not really going to occur.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57934

llvm-svn: 353536
2019-02-08 17:07:54 +00:00
Simon Pilgrim e95550f508 [X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mapping
Noticed in D57514.

Differential Revision: https://reviews.llvm.org/D57519

llvm-svn: 352922
2019-02-01 21:41:30 +00:00
Simon Pilgrim 9e2c2cfcd9 Fix "comparison of unsigned expression >= 0 is always true" warning. NFCI.
llvm-svn: 351816
2019-01-22 13:18:26 +00:00
Simon Pilgrim 180fcff5a7 [X86][SSE] Add selective commutation support for insertps (PR40340)
When we are inserting 1 "inline" element, and zeroing 2 of the other elements then we can safely commute the insertps source inputs to improve memory folding.

Differential Revision: https://reviews.llvm.org/D56843

llvm-svn: 351807
2019-01-22 12:17:48 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Craig Topper 62ec024d3b [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used.
The BEXTR instruction documents the SF bit as undefined.

The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed.

Fixes PR40060

Differential Revision: https://reviews.llvm.org/D55807

llvm-svn: 349956
2018-12-21 21:16:26 +00:00
Craig Topper 18a9d545e1 [X86] Add BSR to isUseDefConvertible.
We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there.

This addresses one issue from PR40090.

llvm-svn: 349531
2018-12-18 20:03:54 +00:00
Craig Topper 1ff7356f96 [X86] Const correct some helper functions X86InstrInfo.cpp. NFC
llvm-svn: 349440
2018-12-18 04:58:05 +00:00
Craig Topper 8c9d772991 [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr.
These seem to have been missed when the other TBM instructions were added.

llvm-svn: 349404
2018-12-17 21:50:06 +00:00
Sanjay Patel 44eaa492b8 [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. 
That allows us to combine add ops with register moves and save some instructions. This is 
another step towards allowing add truncation in generic DAGCombiner (see D54640).

Differential Revision: https://reviews.llvm.org/D55494

llvm-svn: 348946
2018-12-12 17:58:27 +00:00
Sanjay Patel 05e36982dd [x86] clean up code for converting 16-bit ops to LEA; NFC
As discussed in D55494, we want to extend this to handle 8-bit
ops too, but that could be extended further to enable this on
32-bit systems too.

llvm-svn: 348851
2018-12-11 15:29:40 +00:00
Sanjay Patel 9765ba5f86 [x86] remove dead code for 16-bit LEA formation; NFC
As discussed in:
D55494
...this code has been disabled/dead for a long time (the code references
Athlon and Pentium 4), and there's almost no chance that it will be used 
given the last decade of uarch evolution. Also, in SDAG we promote 16-bit 
ops to 32-bit, so there's almost no way to test this code any more.

llvm-svn: 348845
2018-12-11 14:05:03 +00:00
Sanjay Patel 19bc850220 [x86] don't try to convert add with undef operands to LEA
The existing code tries to handle an undef operand while transforming an add to an LEA, 
but it's incomplete because we will crash on the i16 test with the debug output shown below. 
It's better to just give up instead. Really, GlobalIsel should have folded these before we 
could get into trouble.

# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected

bb.0 (%ir-block.0):
  liveins: $edi
  %1:gr32 = COPY killed $edi
  %0:gr16 = COPY %1.sub_16bit:gr32
  %5:gr64_nosp = IMPLICIT_DEF
  %5.sub_16bit:gr64_nosp = COPY %0:gr16
  %6:gr64_nosp = IMPLICIT_DEF
  %6.sub_16bit:gr64_nosp = COPY %2:gr16
  %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
  %3:gr16 = COPY killed %4.sub_16bit:gr32
  $ax = COPY killed %3:gr16
  RET 0, implicit killed $ax

# End machine code for function add_undef_i16.

*** Bad machine code: Reading virtual register without a def ***
- function:    add_undef_i16
- basic block: %bb.0  (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1:   %2:gr16
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D54710

llvm-svn: 348722
2018-12-09 14:40:37 +00:00
Francis Visoiu Mistrih d7eebd6d83 [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

llvm-svn: 347746
2018-11-28 12:00:20 +00:00
Evandro Menezes 9ef79c884a [TableGen] Refactor macro names (NFC)
Make the names for the macros for `TargetInstrInfo` uniform.

llvm-svn: 347706
2018-11-27 20:58:27 +00:00
Matthias Braun c6613879ce LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysRegs. The later is only 16bits and
saves us a bit of memory.

llvm-svn: 346254
2018-11-06 19:00:11 +00:00
Craig Topper 6c3f1692c8 Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction"
Google is reporting regressions on some benchmarks.

llvm-svn: 345785
2018-10-31 21:53:24 +00:00
Andrea Di Biagio 3d2b7176fc [tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands.
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:

  Foo(MI->getOperand(0).getImm()) == ExpectedVal;

Here, the immediate operand value at index zero is passed in input to function
Foo, and ExpectedVal is compared against the value returned by function Foo.

While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.

With this patch, we allow users to specify if a register/immediate operand value
needs to be passed in input to a function as part of the predicate check. Now,
register/immediate operand checks all derive from base class CheckOperandBase.

This patch also changes where TIIPredicate definitions are expanded by the
instructon info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).

This patch is a non functional change for the existing scheduling models.
In future, we will be able to use this richer syntax to better describe complex
scheduling predicates, and expose them to llvm-mca.

Differential Revision: https://reviews.llvm.org/D53880

llvm-svn: 345714
2018-10-31 12:28:05 +00:00
Reid Kleckner 49a24278ba [ELF] Fix large code model MIR verifier errors
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.

Fixes some X86 verifier errors tracked in PR27481.

llvm-svn: 345219
2018-10-24 22:57:28 +00:00
Craig Topper 2417273255 [X86] Bring back the MOV64r0 pseudo instruction
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.

My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.

There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.

Differential Revision: https://reviews.llvm.org/D52757

llvm-svn: 345165
2018-10-24 17:32:09 +00:00
Matthias Braun a0beeffeed X86: Do not optimize branches with undef eflags inputs
analyzeBranch()/insertBranch() etc. do not properly deal with an undef
flag on the eflags input and used to produce invalid MIR.  I don't see
this ever affecting real world inputs (I don't think it is possible to
produce undef flags with llvm IR), so I simply changed the code to bail
out in this case.

rdar://42122367

llvm-svn: 344970
2018-10-22 22:52:23 +00:00
Matthias Braun 81578e9f77 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895
2018-10-05 22:00:13 +00:00
Matt Morehouse 4b1ec17fb0 Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions"
This reverts r343520 due to breakage of HWASan tests on Android.

llvm-svn: 343616
2018-10-02 18:35:44 +00:00
Matthias Braun 3e081703c3 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343520
2018-10-01 18:56:39 +00:00
Craig Topper 1d1dca6a6f [X86] Change an llvm_unreachable to a report_fatal_error so the optimizer will stop making us reach the other report_fatal_error in this function.
There's a conditional report_fatal_error just above this llvm_unreachable. The optimizer when seeing the unreachable removes the conditional and just makes any other error trigger the existing report_fatal_error.

llvm-svn: 343428
2018-09-30 23:43:30 +00:00
Sander de Smalen c91b27d9ee Remove FrameAccess struct from hasLoadFromStackSlot
This removes the FrameAccess struct that was added to the interface
in D51537, since the PseudoValue from the MachineMemoryOperand
can be safely casted to a FixedStackPseudoSourceValue.

Reviewers: MatzeB, thegameg, javed.absar

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51617

llvm-svn: 341454
2018-09-05 08:59:50 +00:00
Sander de Smalen 6cab60fa06 Extend hasStoreToStackSlot with list of FI accesses.
For instructions that spill/fill to and from multiple frame-indices
in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot
should return an array of accesses, rather than just the first encounter
of such an access.

This better describes FI accesses for AArch64 (paired) LDP/STP
instructions.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D51537

llvm-svn: 341301
2018-09-03 09:15:58 +00:00
Alexander Ivchenko af96112ec6 Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions
..Move all target-dependent checks into new isCopyInstrImpl method.

This change allows us to treat MoveReg-type instructions and generic
COPY instruction in the same way

Differential Revision: https://reviews.llvm.org/D49913

llvm-svn: 341072
2018-08-30 14:32:47 +00:00
Martin Storsjo 489993db94 [MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll
Variables declared with the dllimport attribute are accessed via a
stub variable named __imp_<var>. In MinGW configurations, variables that
aren't declared with a dllimport attribute might still end up imported
from another DLL with runtime pseudo relocs.

For x86_64, this avoids the risk that the target is out of range
for a 32 bit PC relative reference, in case the target DLL is loaded
further than 4 GB from the reference. It also avoids having to make the
text section writable at runtime when doing the runtime fixups, which
makes it worthwhile to do for i386 as well.

Add stub variables for all dso local data references where a definition
of the variable isn't visible within the module, since the DLL data
autoimporting might make them imported even though they are marked as
dso local within LLVM.

Don't do this for variables that actually are defined within the same
module, since we then know for sure that it actually is dso local.

Don't do this for references to functions, since there's no need for
runtime pseudo relocations for autoimporting them; if a function from
a different DLL is called without the appropriate dllimport attribute,
the call just gets routed via a thunk instead.

GCC does something similar since 4.9 (when compiling with -mcmodel=medium
or large; from that version, medium is the default code model for x86_64
mingw), but only for x86_64.

Differential Revision: https://reviews.llvm.org/D51288

llvm-svn: 340942
2018-08-29 17:28:34 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Craig Topper 570d47a010 [X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG.
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.

This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test.

llvm-svn: 339496
2018-08-11 05:33:00 +00:00
Craig Topper 0423881820 [X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes
These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference.

Unfortunately, this really only works in cases where the input register is killed by the instruction. If its not killed, the two address isntruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with tied operand to trick the two address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs.

Differential Revision: https://reviews.llvm.org/D50157

llvm-svn: 338735
2018-08-02 16:48:01 +00:00
Francis Visoiu Mistrih 7d003657de [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction
The machine verifier asserts with:

Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542.

It calls analyzeBranch which tries to call getMBB if the opcode is
JMP_1, but in this case we do:

JMP_1 @OUTLINED_FUNCTION

I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used
with brtarget8.

Differential Revision: https://reviews.llvm.org/D49299

llvm-svn: 338237
2018-07-30 09:59:33 +00:00
Jessica Paquette 69f517df27 [MachineOutliner][NFC] Move target frame info into OutlinedFunction
Just some gardening here.

Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.

Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.

llvm-svn: 337848
2018-07-24 20:13:10 +00:00
Chandler Carruth c9313a9ecb [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr* and fix latent bugs with register class updates.
Summary:
Enabling this fully exposes a latent bug in the instruction folding: we
never update the register constraints for the register operands when
fusing a load into another operation. The fused form could, in theory,
have different register constraints on its operands. And in fact,
TCRETURNm* needs its memory operands to use tailcall compatible
registers.

I've updated the folding code to re-constrain all the registers after
they are mapped onto their new instruction.

However, we still can't enable folding in the general case from
TCRETURNr* to TCRETURNm* because doing so may require more registers to
be available during the tail call. If the call itself uses all but one
register, and the folded load would require both a base and index
register, there will not be enough registers to allocate the tail call.

It would be better, IMO, to teach the register allocator to *unfold*
TCRETURNm* when it runs out of registers (or specifically check the
number of registers available during the TCRETURNr*) but I'm not going
to try and solve that for now. Instead, I've just blocked the forward
folding from r -> m, leaving LLVM free to unfold from m -> r as that
doesn't introduce new register pressure constraints.

The down side is that I don't have anything that will directly exercise
this. Instead, I will be immediately using this it my SLH patch. =/

Still worse, without allowing the TCRETURNr* -> TCRETURNm* fold, I don't
have any tests that demonstrate the failure to update the memory operand
register constraints. This patch still seems correct, but I'm nervous
about the degree of testing due to this.

Suggestions?

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49717

llvm-svn: 337845
2018-07-24 19:04:37 +00:00
Jessica Paquette fca55129b1 [MachineOutliner][NFC] Make Candidates own their call information
Before this, TCI contained all the call information for each Candidate.

This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.

A follow-up patch for the AArch64 outliner will demonstrate this.

llvm-svn: 337840
2018-07-24 17:42:11 +00:00
Reid Kleckner 980c4df037 Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.

llvm-svn: 337740
2018-07-23 21:14:35 +00:00
Craig Topper 92ea7a7b48 [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address.
This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.

llvm-svn: 337357
2018-07-18 07:31:32 +00:00
Craig Topper cdb0ed2910 [X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ.
AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.

Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.

This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.

llvm-svn: 337137
2018-07-15 23:32:36 +00:00
Craig Topper 7426cf6717 [X86] Fix a subtle bug in the custom execution domain fixing for blends.
The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.

Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file.

llvm-svn: 337088
2018-07-14 06:30:30 +00:00
Craig Topper 02867f0fa3 [X86] The TEST instruction is eliminated when BSF/TZCNT is used
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
  %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
  %tobool = icmp eq i32 %v, 0
  %.op = add nuw nsw i32 %cnt, 1
  %add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
  bsfl     %edi, %ecx
  addl     $1, %ecx
  testl    %edi, %edi
  movl     $0, %eax
  cmovnel  %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.

We now produce the following code:
  bsfl     %edi, %ecx
  movl     $-1, %eax
  cmovnel  %ecx, %eax
  addl     $1, %eax

Patch by Ivan Kulagin

Reviewers: davide, craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48765

llvm-svn: 336768
2018-07-11 06:57:42 +00:00
Craig Topper 860ab496d3 [X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it.
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.

Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.

llvm-svn: 336731
2018-07-10 22:02:23 +00:00
Yvan Roux eaececf5e0 [MachineOutliner] Fix typo in getOutliningCandidateInfo function name
getOutlininingCandidateInfo -> getOutliningCandidateInfo

Differential Revision: https://reviews.llvm.org/D48867

llvm-svn: 336285
2018-07-04 15:37:08 +00:00
Craig Topper 0661f67296 [X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search.
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.

Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but its probably not worth it right now.

llvm-svn: 336075
2018-07-02 06:23:39 +00:00
Craig Topper c004aa6c5f [X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl.
findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.

llvm-svn: 336070
2018-07-01 23:27:41 +00:00
Craig Topper 4e78213ae4 [X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic.
Also move the static folding tables, their search functions and the new class into new cpp/h files.

The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables.

By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed.

llvm-svn: 336056
2018-07-01 05:47:49 +00:00
Craig Topper 84199deb17 [X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI
The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use.

This patch moves that static function out of the class and moves it implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic.

llvm-svn: 336055
2018-06-30 22:38:42 +00:00
Craig Topper 7c96f051d2 [X86] Use a std::vector for the memory unfolding table.
Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated.

This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups.

Differential Revision: https://reviews.llvm.org/D48585

llvm-svn: 335994
2018-06-29 17:11:26 +00:00
Jonas Devlieghere b757fc3878 Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.

  LLVM ERROR: unsupported relocation with subtraction expression, symbol
  '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
  expression

llvm-svn: 335894
2018-06-28 17:56:43 +00:00
Benjamin Kramer f9613b2995 Unify sorted asserts to use the existing atomic pattern
These are all benign races and only visible in !NDEBUG. tsan complains
about it, but a simple atomic bool is sufficient to make it happy.

llvm-svn: 335823
2018-06-28 10:03:45 +00:00
Benjamin Kramer e214f046af [X86] Make folding table checking threadsafe
This is a benign race, but tsan likes to complain about it. Just make it
happy.

llvm-svn: 335788
2018-06-27 21:01:53 +00:00
Craig Topper 33aba0eb4c [X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group.
Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.

The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.

llvm-svn: 335694
2018-06-27 00:42:24 +00:00
Craig Topper 614f192471 [X86] Add comment about the sorting of the memory folding tables added in r335501.
llvm-svn: 335517
2018-06-25 20:11:16 +00:00
Reid Kleckner 88fee5fdbc Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335508
2018-06-25 18:16:27 +00:00
Craig Topper 3cc6cb1d35 [X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search.
With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed.

We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it.

Differential Revision: https://reviews.llvm.org/D48527

llvm-svn: 335501
2018-06-25 17:26:56 +00:00
Craig Topper facea6b4a6 [X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands.
We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted.

The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices.

With that abort case pushed earlier, we can remove all the abort checks and replace with asserts.

llvm-svn: 335446
2018-06-25 06:05:37 +00:00
Craig Topper 19772c89c7 [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions.
llvm-svn: 335428
2018-06-24 06:29:50 +00:00
Reid Kleckner 3a2fd1c2f3 Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
MCJIT can't handle R_X86_64_GOT64 yet.

llvm-svn: 335300
2018-06-21 22:19:05 +00:00
Reid Kleckner 3286a6c896 [X86] Commit some comments that weren't in the medium code model patch
llvm-svn: 335298
2018-06-21 21:57:44 +00:00
Reid Kleckner 247fe6aeab [X86] Implement more of x86-64 large and medium PIC code models
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

Reviewers: chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335297
2018-06-21 21:55:08 +00:00
Craig Topper c2696d577b [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel
I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code.

I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.

Differential Revision: https://reviews.llvm.org/D43608

llvm-svn: 335173
2018-06-20 21:05:02 +00:00
Jessica Paquette 32de26d432 [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.

This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

llvm-svn: 335076
2018-06-19 21:14:48 +00:00
Craig Topper 9fe45d846e [X86] Add all the FMA instructions direclty to the load folding table instead of proxying through X86InstrFMA3Info.
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.

llvm-svn: 334915
2018-06-17 18:00:16 +00:00
Craig Topper 29f22d7baa [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

llvm-svn: 334898
2018-06-16 23:25:50 +00:00
Craig Topper d00e375310 [X86] Add more instructions to the hasUndefRegUpdate list.
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future.

llvm-svn: 334867
2018-06-15 22:25:04 +00:00
Craig Topper 1657b7b8d2 [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate.
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.

llvm-svn: 334848
2018-06-15 17:56:17 +00:00
Craig Topper c8a763ed84 Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update."
There's a typo causing the build to fail.

llvm-svn: 334803
2018-06-15 06:15:26 +00:00
Craig Topper 5ec210cc27 [X86] Prevent folding stack reloads with instructions that have an undefined register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

llvm-svn: 334802
2018-06-15 06:11:36 +00:00
Craig Topper 3c4cc01226 [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.

llvm-svn: 334800
2018-06-15 05:49:19 +00:00
Craig Topper f43807dd89 [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency.
llvm-svn: 334785
2018-06-15 04:42:54 +00:00
Craig Topper bfa94d5086 [X86] Fix stale comment in folding tables.
llvm-svn: 334758
2018-06-14 19:28:31 +00:00
Craig Topper 3ffeb41f6b [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide.
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.

llvm-svn: 334728
2018-06-14 15:40:31 +00:00
Craig Topper b0742bf30d [X86] Disable load unfolding for a bunch of instruction where unfolding would increase the size of the load.
Found by an audit of the manual table vs the autogenerated table.

llvm-svn: 334726
2018-06-14 15:40:29 +00:00
Craig Topper f7f663e0a9 [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.

llvm-svn: 334648
2018-06-13 20:03:42 +00:00
Craig Topper ede97c9548 [X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table.
llvm-svn: 334511
2018-06-12 15:48:03 +00:00
Tomasz Krupa f8c7637027 [X86] Block UndefRegUpdate
Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update.

Reviewers: craig.topper

Subscribers: llvm-commits, mike.dvoretsky, ashlykov

Differential Revision: https://reviews.llvm.org/D47621

llvm-svn: 334175
2018-06-07 08:48:45 +00:00
Petar Jovanovic 8cb6a521be Change TII isCopyInstr way of returning arguments(NFC)
Make TII isCopyInstr() return MachineOperands through pointer to pointer
instead via reference.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D47364

llvm-svn: 334105
2018-06-06 16:36:30 +00:00
Jessica Paquette aa087327ce [MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h
This is setting up to fix bug 37573 cleanly.

This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.

Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.

Also, as a side-effect, it makes the outliner a bit cleaner.

https://bugs.llvm.org/show_bug.cgi?id=37573

llvm-svn: 333952
2018-06-04 21:14:16 +00:00
Craig Topper 6b545182fb [X86] Fix typo in comment. NFC
llvm-svn: 333382
2018-05-28 19:33:06 +00:00
Petar Jovanovic c051000b83 [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Eli Friedman d268bf0a4d Fix unused lambda capture.
llvm-svn: 332686
2018-05-18 02:11:25 +00:00
Eli Friedman 4081a57af7 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Shiva Chen 801bf7ebbe [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Andrea Di Biagio d4c58400c5 [X86] Correct spill slot size.
This patch fixes a bug introduced by revision 330778 (originally reviewed at:
https://reviews.llvm.org/D44782), where function isFrameLoadOpcode returned
the wrong number of bytes read for opcodes VMOVSSrm and VMOVSDrm.

This corrects that mistake, and extends the regression test to catch cases where
the dead stores should be removed.

Patch by Jeremy Morse.

Differential Revision: https://reviews.llvm.org/D46256

llvm-svn: 331252
2018-05-01 10:29:38 +00:00
Craig Topper 8a6532ae84 [X86] Rename BNDMOV instructions and hide redundant instruction encoding from the assembler.
Favor the 0x1a encoding for register/register move to match gas.

The instructions used RM and MR in their name along with rr/rm/mr at the end. To make more consistent with other instructions remove the RM/MR and use rr/rm/mr/rr_REV.

Hide the _REV encoding from the assembler but leave it for the disassembler.

llvm-svn: 331101
2018-04-28 06:02:39 +00:00
Craig Topper d656410293 [X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block.
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.

To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.

Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.

I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.

Reviewers: chandlerc, RKSimon, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46202

llvm-svn: 331091
2018-04-27 22:15:33 +00:00
Chandler Carruth eb631ef51e [x86] Allow folding unaligned memory operands into pcmp[ei]str*
instructions.

These have special permission according to the x86 manual to read
unaligned memory, and this folding is done by ICC and GCC as well.

This corrects one of the issues identified in PR37246.

llvm-svn: 330896
2018-04-26 03:17:25 +00:00
Warren Ristow b960d2cb40 [X86] Account for partial stack slot spills (PR30821)
Previously, _any_ store or load instruction was considered to be
operating on a spill if it had a frameindex as an operand, and thus
was fair game for optimisations such as "StackSlotColoring". This
usually works, except on architectures where spills can be partially
restored, for example on X86 where a spilt vector can have a single
component loaded (zeroing the rest of the target register). This can be
mis-interpreted and the zero extension unsoundly eliminated, see
pr30821.

To avoid this, this commit optionally provides the caller to
isLoadFromStackSlot and isStoreToStackSlot with the number of bytes
spilt/loaded by the given instruction. Optimisations can then determine
that a full spill followed by a partial load (or vice versa), for
example, cannot necessarily be commuted.

Patch by Jeremy Morse!

Differential Revision: https://reviews.llvm.org/D44782

llvm-svn: 330778
2018-04-24 22:01:50 +00:00
Chandler Carruth 0ca3bd0729 [x86] Model the direction flag (DF) separately from the rest of EFLAGS.
This cleans up a number of operations that only claimed te use EFLAGS
due to using DF. But no instructions which we think of us setting EFLAGS
actually modify DF (other than things like popf) and so this needlessly
creates uses of EFLAGS that aren't really there.

In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.

I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.

Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.

Differential Revision: https://reviews.llvm.org/D45154

llvm-svn: 329673
2018-04-10 06:40:51 +00:00
Chandler Carruth 19618fc639 [x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues.
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.

However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.

There are a large number of operations that techinaclly observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.

This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI and without require stack adjustments.

Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.

Differential Revision: https://reviews.llvm.org/D45146

llvm-svn: 329657
2018-04-10 01:41:17 +00:00
Jessica Paquette 5fa2a63785 [MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.

Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.

This also adds a new test for the new redzone behaviour.

llvm-svn: 329134
2018-04-03 23:32:41 +00:00
Chandler Carruth 06b343c6ed [x86] Expose more of the condition conversion routines in the public API
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.

llvm-svn: 328944
2018-04-01 21:47:55 +00:00
Craig Topper 40d3b32e12 [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.

This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.

llvm-svn: 328254
2018-03-22 21:55:20 +00:00
Craig Topper ad7c685791 [X86] Rename MOVSX32_NOREXrr8 to MOVSX32rr8_NOREX so that the scheduler model regular expressions will pick it up with the regular version.
Do the same for MOVSX32_NOREXrm8, MOVZX32_NOREXrr8, and MOVZX32_NOREXrm8

llvm-svn: 327948
2018-03-20 05:00:20 +00:00
Jessica Paquette b3e7dc9144 [MachineOutliner] Make KILLs invisible
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.

llvm-svn: 327760
2018-03-16 22:53:34 +00:00
Craig Topper cb7881c649 [X86] Stop passing two arguments by reference. NFC
I think these used to be out parameters, but they haven't been for a while.

llvm-svn: 326417
2018-03-01 06:25:13 +00:00
Craig Topper 688d1eb919 Revert r326225 "[X86] Move the load folding tables to a separate .inc file"
The bots don't seem to like the .inc file. I must be missing some cmake incantation.

llvm-svn: 326228
2018-02-27 19:15:40 +00:00
Craig Topper c0a1291478 [X86] Move the load folding tables to a separate .inc file
These tables add 3000 lines to X86InstrInfo.cpp. And if we ever manage to auto generate them they'll be a separate file anyway.

Differential Revision: https://reviews.llvm.org/D43806

llvm-svn: 326225
2018-02-27 18:46:11 +00:00
Craig Topper a05ed17316 [X86] Make XOP VPCOM instructions commutable to fold loads during isel.
llvm-svn: 325547
2018-02-20 03:58:13 +00:00
Craig Topper 9b64bf54b9 [X86] Make a helper function for commuting AVX512 VPCMP immediates since we do it in two places.
llvm-svn: 325546
2018-02-20 03:58:11 +00:00
Chandler Carruth 0be0cfa65b [x86] Fix nasty bug in the x86 backend that is essentially impossible to
hit from IR but creates a minefield for MI passes.

The x86 backend has fairly powerful logic to try and fold loads that
feed register operands to instructions into a memory operand on the
instruction. This is almost always a good thing, but there are specific
relocated loads that are only allowed to appear in specific
instructions. Notably, R_X86_64_GOTTPOFF is only allowed in `movq` and
`addq`. This patch blocks folding of memory operands using this
relocation unless the target is in fact `addq`.

The particular relocation indicates why we simply don't hit this under
normal circumstances. This relocation is only used for TLS, and it gets
used in very specific ways in conjunction with %fs-relative addressing.
The result is that loads using this relocation are essentially never
eligible for folding into an instruction's memory operands. Unless, of
course, you have an MI pass that inserts usage of such a load. I have
exactly such an MI pass and was greeted by truly mysterious miscompiles
where the linker replaced my instruction with a completely garbage byte
sequence. Go team.

This is the only such relocation I'm aware of in x86, but there may be
others that need to be similarly restricted.

Fixes PR36165.

Differential Revision: https://reviews.llvm.org/D42732

llvm-svn: 324546
2018-02-07 23:59:14 +00:00
Craig Topper 8baa9c77e3 [X86] When doing callee save/restore for k-registers make sure we don't use KMOVQ on non-BWI targets
If we are saving/restoring k-registers, the default behavior of getMinimalRegisterClass will find the VK64 class with a spill size of 64 bits. This will cause the KMOVQ opcode to be used for save/restore. If we don't have have BWI instructions we need to constrain the class returned to give us VK16 with a 16-bit spill size. We can do this by passing the either v16i1 or v64i1 into getMinimalRegisterClass.

Also add asserts to make sure BWI is enabled anytime we use KMOVD/KMOVQ. These are what caught this bug.

Fixes PR36256

Differential Revision: https://reviews.llvm.org/D42989

llvm-svn: 324533
2018-02-07 21:41:50 +00:00
Nirav Dave eedb663221 [X86] Teach DAG unfoldMemoryOperand to reconvert CMPs to tests
Summary:
Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold.
This fixes a regression from upcoming D41293 change.

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42808

llvm-svn: 324261
2018-02-05 18:58:58 +00:00
Amaury Sechet f89f188ddb [X86] Avoid using high register trick for test instruction
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323888
2018-01-31 16:48:54 +00:00
Eric Liu 0b69b5ed85 Revert "[X86] Avoid using high register trick for test instruction"
This reverts commit r323690. This causes crash in llc. See the original commit thread for details.

llvm-svn: 323761
2018-01-30 14:18:33 +00:00
Amaury Sechet 015184b79e [X86] Avoid using high register trick for test instruction
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.

The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 .

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323690
2018-01-29 20:54:33 +00:00
Craig Topper 066e73762d [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFC
llvm-svn: 323403
2018-01-25 04:45:32 +00:00
Craig Topper dbddac0915 [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall consistency. NFC
MMX instrutions all start with MMX_ so the 64 isn't needed for disambigutation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.

llvm-svn: 323402
2018-01-25 04:45:30 +00:00
Craig Topper b85b484fee [X86] Adjust names of PINSRW/PEXTRW intructions between MMX/SSE/AVX/AVX512 for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI
llvm-svn: 323352
2018-01-24 17:58:51 +00:00
Craig Topper a55ac7b790 [X86] Rename 256-bit VFRCZ instructions to have the Y before the rr/rm to match other instructions. NFC
llvm-svn: 323304
2018-01-24 05:14:39 +00:00
Craig Topper 002657731b [X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS and instructions to get them picked up by the scheduler model regexs.
All other intrinsic instructions put the _Int on the end. This make these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.

llvm-svn: 323261
2018-01-23 21:37:51 +00:00
Craig Topper c2df6409c7 [X86] Add missing MOVSX/MOVZX instructions to load folding tables.
I'm not sure there's any way to generate these folding cases especially the movzx ones since even the register form is never emitted by codegen.

I'm just adding them to remove the difference with the autogenerated version of the folding table.

llvm-svn: 323200
2018-01-23 14:09:22 +00:00
Marina Yatsina 811523cc08 Fix bug in commit 323096 exposed by test in test-suite-verify-machineinstrs-x86_64h-O3
Change-Id: I0a4b10d0d6c8de606d989c567ec07944ae283a87
llvm-svn: 323126
2018-01-22 15:31:05 +00:00
Marina Yatsina 77a21dbad4 Break false dependencies for POPCNT, LZCNT, TZCNT
Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

This is the final of multiple patches that fix this bugzilla.
Most of the patches are intended at refactoring the existent code.

Reviews of the refactoring done to enable this change:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333

Differential Revision: https://reviews.llvm.org/D40334

Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
llvm-svn: 323096
2018-01-22 10:07:01 +00:00
Marina Yatsina 6fc2aaae8d Separate ExecutionDepsFix into 4 parts:
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.

This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

Additional refactoring changes will follow.

This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.

This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40330

Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
2018-01-22 10:05:23 +00:00
Simon Pilgrim e5dad1365c Avoid Wparentheses warning.
llvm-svn: 322526
2018-01-15 22:40:06 +00:00
Simon Pilgrim 85bd9141ca [X86][MMX] Add support for MMX zero vector creation
As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a XMM zero register.

This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.

Differential Revision: https://reviews.llvm.org/D41908

llvm-svn: 322525
2018-01-15 22:32:40 +00:00
Simon Pilgrim 940eae3cc1 [X86][SSE] Add custom execution domain fixing for BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873)
Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW.

Differential Revision: https://reviews.llvm.org/D42042

llvm-svn: 322524
2018-01-15 22:18:45 +00:00
Jessica Paquette 3291e7353e [MachineOutliner] AArch64: Handle instrs that use SP and will never need fixups
This commit does two things. Firstly, it adds a collection of flags which can
be passed along to the target to encode information about the MBB that an
instruction lives in to the outliner.

Second, it adds some of those flags to the AArch64 outliner in order to add
more stack instructions to the list of legal instructions that are handled
by the outliner. The two flags added check if

- There are calls in the MachineBasicBlock containing the instruction
- The link register is available in the entire block

If the link register is available and there are no calls, then a stack
instruction can always be outlined without fixups, regardless of what it is,
since in this case, the outliner will never modify the stack to create a
call or outlined frame.

The motivation for doing this was checking which instructions are most often
missed by the outliner. Instructions like, say

%sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy

are very common, but cannot be outlined in the case that the outliner might
modify the stack. This commit allows us to outline instructions like this.
  

llvm-svn: 322048
2018-01-09 00:26:18 +00:00
Craig Topper 03d8e516cf [X86] Add VSHUFF32X4 and similar instructions to load folding tables.
llvm-svn: 321978
2018-01-07 23:30:20 +00:00
Craig Topper a21f551109 [X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables.
llvm-svn: 321958
2018-01-07 06:48:20 +00:00
Craig Topper d0859a03b5 [X86] Correct the load folding flags for xmm fp->mmx conversion instructions.
The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.

llvm-svn: 321956
2018-01-07 06:24:30 +00:00
Craig Topper aa73941176 [X86] Add TB_NO_REVERSE to some scalar intrinsic instructions in the load folding table.
llvm-svn: 321955
2018-01-07 06:24:29 +00:00
Craig Topper 85657d59a9 [X86] Don't put any EVEX_B instructions in the tablegen generated load folding tables.
EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent.

llvm-svn: 321954
2018-01-07 06:24:28 +00:00
Craig Topper 89293a2a94 [X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables.
llvm-svn: 321953
2018-01-07 06:24:27 +00:00
Craig Topper a124ab10ef [X86] Add some 8 and 16-bit instructions to the load folding tables.
llvm-svn: 321952
2018-01-07 06:24:25 +00:00
Craig Topper 11aede13db [X86] Add EVEX vcvtph2ps to the load folding tables.
llvm-svn: 321951
2018-01-07 06:24:24 +00:00
Craig Topper 40cc8338f7 [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex versions of cvtps2ph to the store folding tables.
The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and its get used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.

llvm-svn: 321950
2018-01-07 06:24:23 +00:00
Craig Topper 8fa800b834 [X86] Add CMP8ri8 to load folding tables.
llvm-svn: 321949
2018-01-07 06:24:21 +00:00
Simon Pilgrim b1b30286bf Remove superfluous break after a return. NFCI.
llvm-svn: 320941
2017-12-17 11:01:33 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Craig Topper a0be5a06c1 [X86] Rename some instructions that start with Int_ to have the _Int at the end.
This matches AVX512 version and is more consistent overall. And improves our scheduler models.

In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses.

llvm-svn: 320325
2017-12-10 19:47:56 +00:00
Craig Topper aa904d5ab6 [X86] Fix a few instructions that were named Z512 instead of just Z.
This makes things consistent with our normal instruction naming.

llvm-svn: 320316
2017-12-10 17:42:39 +00:00
Craig Topper c89e282f7d [X86] Rename some instructions so that 'b' is added as a suffix instead of replacing an 'r'
llvm-svn: 320290
2017-12-10 09:14:38 +00:00
Craig Topper da7e78e18c [X86] Rename the rb form of scalar ADD/SUB/MUL/DIV to include _Int since they can only be selected by intrinsics.
llvm-svn: 320283
2017-12-10 04:07:28 +00:00
Francis Visoiu Mistrih a8a83d150f [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.

For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.

Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).

https://reviews.llvm.org/D40836

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'

llvm-svn: 320022
2017-12-07 10:40:31 +00:00
Hans Wennborg 5df9f0878b Re-commit r319490 "XOR the frame pointer with the stack cookie when protecting the stack"
The patch originally broke Chromium (crbug.com/791714) due to its failing to
specify that the new pseudo instructions clobber EFLAGS. This commit fixes
that.

> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622

llvm-svn: 319824
2017-12-05 20:22:20 +00:00
Hans Wennborg 361d4392cf Revert r319490 "XOR the frame pointer with the stack cookie when protecting the stack"
This broke the Chromium build (crbug.com/791714). Reverting while investigating.

> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622
>
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319490 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-svn: 319706
2017-12-04 22:21:15 +00:00
Zachary Turner 8065f0b975 Mark all library options as hidden.
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.

This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.

Differential Revision: https://reviews.llvm.org/D40674

llvm-svn: 319505
2017-12-01 00:53:10 +00:00
Reid Kleckner ba4014e9dc XOR the frame pointer with the stack cookie when protecting the stack
Summary: This strengthens the guard and matches MSVC.

Reviewers: hans, etienneb

Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits

Differential Revision: https://reviews.llvm.org/D40622

llvm-svn: 319490
2017-11-30 22:41:21 +00:00
Francis Visoiu Mistrih 9d7bb0cb40 [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Craig Topper 40a1edc307 [X86] Don't invert NewCC variable while processing the jcc/setcc/cmovcc instructions in optimizeCompareInstr.
The NewCC variable is calculated outside of the loop that processes jcc/setcc/cmovcc instructions. If we invert it during the loop it can cause an incorrect value to be used by a later iteration. Instead only read it during the loop and use a new variable to store the possibly inverted value.

Fixes PR35399.

llvm-svn: 318934
2017-11-23 19:25:45 +00:00
Coby Tayree 7ca5e58736 [x86][icelake]vpclmulqdq introduction
an icelake promotion of pclmulqdq
Differential Revision: https://reviews.llvm.org/D40101

llvm-svn: 318741
2017-11-21 09:30:33 +00:00
Craig Topper 0ccec70ff5 [X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int.
This is consistent with out normal implementation of scalar instructions.

While there disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size which is also our standard practice.

llvm-svn: 317977
2017-11-11 08:24:15 +00:00
Craig Topper ca1aa83cbe [X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate.
This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.

We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.

Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.

Differential Revision: https://reviews.llvm.org/D39402

llvm-svn: 317112
2017-11-01 18:10:06 +00:00
Craig Topper beed653135 [X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. Use 128-bit VLX instruction when VLX is enabled.
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.

Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?

llvm-svn: 316978
2017-10-31 06:01:04 +00:00
Craig Topper 85bcc297c3 [X86] Rearrange code in X86InstrInfo.cpp to put all the foldMemoryOperandImpl methods together without partial/undef register handling in the middle. NFC
I have a future patch that wants to make use of the one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that.

llvm-svn: 316883
2017-10-30 04:39:18 +00:00
Simon Pilgrim ab6dbe2b29 Strip trailing whitespace. NFCI.
llvm-svn: 316277
2017-10-21 20:40:49 +00:00
Simon Pilgrim 3cb024490a [X86][SSE] Add extractps/pextrd equivalence to domain tables
Differential Revision: https://reviews.llvm.org/D39135

llvm-svn: 316274
2017-10-21 20:19:48 +00:00
Craig Topper 61010a85b8 [X86] Add AVX512 versions of VCVTPD2PS to load folding tables.
llvm-svn: 315801
2017-10-14 05:55:43 +00:00
Craig Topper 134241e4af [X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables.
llvm-svn: 315796
2017-10-14 04:18:08 +00:00
Craig Topper 0b64e67b0d [X86] Remove TB_NO_REVERSE from VCVTDQ2PDYrr and VCVTPS2PDYrr in the load folding tables.
I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.

llvm-svn: 315795
2017-10-14 04:18:07 +00:00
Ayman Musa 1170deb9c8 [X86] Add missing entries in 'MemoryFoldTable2Addr' to get complete form of the table.
Get the folding table 'MemoryFoldTable2Addr' to a complete state as part of the process explained in https://reviews.llvm.org/D38028

Differential Revision: https://reviews.llvm.org/D38500

llvm-svn: 315174
2017-10-08 09:46:50 +00:00
Jessica Paquette 13593843f6 [MachineOutliner] Disable outlining from LinkOnceODRs by default
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!

To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.

Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.

Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda

Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic

Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb

Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.

llvm-svn: 315136
2017-10-07 00:16:34 +00:00
Craig Topper 6fb55716e9 [X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.

This should fix PR33079

Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.

Differential Revision: https://reviews.llvm.org/D38449

llvm-svn: 314914
2017-10-04 17:20:12 +00:00
Craig Topper c20b46da2f [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.

For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.

I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995.

Reviewers: aymanmus, RKSimon, zvi

Reviewed By: aymanmus

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38120

llvm-svn: 314639
2017-10-01 23:53:53 +00:00
Jessica Paquette 4cf187b5b4 [MachineOutliner] AArch64: Avoid saving + restoring LR if possible
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.

This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.

This also improves several comments on the outliner's cost model.

https://reviews.llvm.org/D36721

llvm-svn: 314341
2017-09-27 20:47:39 +00:00
Craig Topper c16a472966 Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."""
This caused PR34751

llvm-svn: 314339
2017-09-27 20:34:17 +00:00
Craig Topper e0d8290094 Revert r314248 "[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg."
This contributed to PR34751

llvm-svn: 314338
2017-09-27 20:34:13 +00:00
Craig Topper 7f0eeb428b Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""
The late MOV8rr_NOREX that caused the crash has been removed.

llvm-svn: 314249
2017-09-26 21:35:09 +00:00
Craig Topper ab3c0075b8 [X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg.
This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.

llvm-svn: 314248
2017-09-26 21:35:06 +00:00
Benjamin Kramer 4b2113a303 Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."
Makes llc crash. This reverts commit r314151.

llvm-svn: 314199
2017-09-26 10:25:27 +00:00
Craig Topper d830f276c1 [X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.
llvm-svn: 314151
2017-09-25 21:14:55 +00:00
Craig Topper 23f1830748 [X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands.
llvm-svn: 314080
2017-09-24 17:28:14 +00:00
Craig Topper 675bdd30c6 [X86] Make sure we still mark the full register as implicitly defined when we shrink 256/512 bit zeroing xors to 128-bit.
Not sure if anything really cares, but this seems like the right thing to do.

llvm-svn: 314071
2017-09-24 05:24:51 +00:00
Craig Topper a80949feb5 [X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table.
llvm-svn: 313610
2017-09-19 04:39:55 +00:00
Craig Topper a6054328e8 [X86] Teach the execution domain fixing tables to use movlhps inplace of unpcklpd for the packed single domain.
MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter.

llvm-svn: 313509
2017-09-18 04:40:58 +00:00
Craig Topper 87f7381edf [X86] Teach execution domain fixing to convert between FP and int unpack instructions.
llvm-svn: 313508
2017-09-18 03:29:54 +00:00
Craig Topper d4341920d5 [X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD.
llvm-svn: 313507
2017-09-18 03:29:47 +00:00
Craig Topper fa82efb50a [X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables.
llvm-svn: 312449
2017-09-03 17:52:23 +00:00
Craig Topper 62c47a2aa5 Mark Knights Landing as having slow two memory operand instructions
Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.

Patch by David Zarzycki.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37224

llvm-svn: 311979
2017-08-29 05:14:27 +00:00
Craig Topper 355d8cff49 [X86] Add TBM instructions to X86InstrInfo::isDefConvertible.
This allows us to remove "test" instructions and use the flags from the TBM instructions directly.

llvm-svn: 311747
2017-08-25 01:59:06 +00:00
Craig Topper b049158a55 [X86] Add the rest of the ADC and SBB instructions to isDefConvertible.
I don't know if this really affects anything. Just thought it was weird that we had all of the ADD/SUB/AND/OR/XOR instructions.

llvm-svn: 310447
2017-08-09 06:17:49 +00:00
Dinar Temirbulatov a0beedef1c [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35965

llvm-svn: 309926
2017-08-03 08:50:18 +00:00
Craig Topper 410d252f5b [AVX-512] Add unmasked subvector inserts and extract to the execution domain tables.
llvm-svn: 309632
2017-07-31 22:07:29 +00:00
Jessica Paquette d87f54493d [MachineOutliner] NFC: Change IsTailCall to a call class + frame class
This commit

- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively

This accomplishes a couple things.

Firstly, we don't need the notion of *tail call* in the general outlining algorithm.

Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.

Ultimately, this should get us closer to being able to do something like, say avoid saving the link
register when outlining AArch64 instructions.

llvm-svn: 309475
2017-07-29 02:55:46 +00:00
Jessica Paquette 809d708b8a [MachineOutliner] NFC: Split up getOutliningBenefit
This is some more cleanup in preparation for some actual
functional changes. This splits getOutliningBenefit into
two cost functions: getOutliningCallOverhead and
getOutliningFrameOverhead. These functions return the
number of instructions that would be required to call
a specific function and the number of instructions
that would be required to construct a frame for a
specific funtion. The actual outlining benefit logic
is moved into the outliner, which calls these functions.

The goal of refactoring getOutliningBenefit is to:

- Get us closer to getting rid of the IsTailCall flag

- Further split up "target-specific" things and
"general algorithm" things

llvm-svn: 309356
2017-07-28 03:21:58 +00:00
Dinar Temirbulatov aead31a36f [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35839

llvm-svn: 309298
2017-07-27 17:47:01 +00:00
Hiroshi Inoue ddb34d84c9 fix trivial typos in comments; NFC
llvm-svn: 307004
2017-07-03 06:32:59 +00:00
Craig Topper 8b8767662c [AVX-512] Mark masked VPCMP instructions as commutable.
llvm-svn: 305276
2017-06-13 07:13:50 +00:00
Craig Topper 42d0339257 [X86] Add masked integer compare instructions to load folding tables.
llvm-svn: 305274
2017-06-13 07:13:44 +00:00
Craig Topper 69fead95c7 [AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables.
llvm-svn: 305180
2017-06-12 04:57:31 +00:00
Chandler Carruth 41ed4034dd [x86] Revert the X86FoldTablesEmitter due to more miscompiles.
In testing, we've found yet another miscompile caused by the new tables.
And this one is even less clear how to fix (we could teach it to fold
a 16-bit load instead of the 32-bit load it wants, or block folding
entirely).

Also, the approach to excluding instructions seems increasingly to not
scale well.

I have left a more detailed analysis on the review log for the original
patch (https://reviews.llvm.org/D32684) along with suggested path
forward. I will land an additional test case that I wrote which covers
the code that was miscompiling (folding into the output of `pextrw`) in
a subsequent commit to keep this a pure revert.

For each commit reverted here, I've restricted the revert to the
non-test code touching the x86 fold table emission until the last commit
where I did revert the test updates. This means the *new* test cases
added for `insertps` and `xchg` remain untouched (and continue to pass).

Reverted commits:
r304540: [X86] Don't fold into memory operands into insertps in the ...
r304347: [TableGen] Adapt more places to getValueAsString now ...
r304163: [X86] Don't fold away the memory operand of an xchg.
r304123: Don't capture a temporary std::string in a StringRef.
r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..."

Original commit was in r304088, and after a string of fixes was reverted
previously in r304121 to fix build bots, and then re-landed in r304122.

llvm-svn: 304762
2017-06-06 02:15:31 +00:00
Galina Kistanova c752c4bf56 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304355
2017-05-31 21:50:45 +00:00
Zachary Turner df1832cf86 Resubmit "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This was reverted due to buildbot breakages and I was not familiar
with this code to investigate it.  But while trying to get a
useful backtrace for the author, it turns out the fix was very
obvious.  Resubmitting this patch as is, and will submit the
fix in a followup so that the fix is not hidden in the larger
CL.

llvm-svn: 304122
2017-05-29 02:19:37 +00:00
Zachary Turner 5b199be769 Revert "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well
as all subsequent followups.  llvm-tblgen currently segfaults with
this change, and it seems it has been broken on the bots all
day with no fixes in preparation.  See, for example:

http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/

llvm-svn: 304121
2017-05-29 01:48:53 +00:00
Ayman Musa d9f1fe43a8 [X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables.
X86 backend holds huge tables in order to map between the register and memory forms of each instruction.
This TableGen Backend automatically generated all these tables with the appropriate flags for each entry.

Differential Revision: https://reviews.llvm.org/D32684

llvm-svn: 304088
2017-05-28 12:55:36 +00:00
Matthias Braun ac4307c41e LivePhysRegs: Rework constructor + documentation; NFC
- Take reference instead of pointer to a TRI that cannot be nullptr.
- Improve documentation comments.

llvm-svn: 304038
2017-05-26 21:51:00 +00:00
Oren Ben Simhon 7bf27f03f2 [X86] Adding vpopcntd and vpopcntq instructions
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

llvm-svn: 303858
2017-05-25 13:45:23 +00:00
Simon Pilgrim 9f46d1d479 Strip trailing whitespace. NFCI.
llvm-svn: 303736
2017-05-24 11:02:27 +00:00